Hi,

I've looked at this patch again and did some testing. I don't have any comments on the code (though I see there are two comments from Mark after the last version).
For the testing, I did a fairly simple benchmark: loading either random or compressible data into a bytea column. The tables are defined as unlogged, the values are 1kB, 4kB and 1MB, and the total amount of data is always 1GB. The timings are:
    test          master   patched   delta
    -----------------------------------------
    random_1k      12295     12312    100%
    random_1m      12999     12984    100%
    random_4k      16881     15959     95%
    redundant_1k   12308     12348    100%
    redundant_1m   16632     14072     85%
    redundant_4k   16798     13828     82%

I ran the test on multiple x86_64 machines, and the behavior is almost exactly the same on all of them.
This shows there's no difference for the 1kB values (expected, because they don't exceed the ~2kB TOAST threshold). For random data in general the difference is pretty negligible, although it's a bit strange that the 4kB values take longer than the 1MB ones.
For redundant (highly compressible) values, there's a quite significant speedup of 15-18%. Real-world data are likely somewhere in between, but the speedup is still pretty nice.
Andrey, can you update the patch per Mark's review? I'll do my best to get it committed sometime in this CF.
Attached are the two scripts used for generating the data and running the tests (you'll have to fix some hardcoded paths, but they're simple otherwise).
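For context, the difference between the two payload types can be sketched like this (a hypothetical Python illustration, not the attached generator.sql; the actual benchmark loads such values into bytea columns via SQL):

```python
import os
import zlib

def make_payload(size: int, compressible: bool) -> bytes:
    """Build one value: a single repeated byte (highly compressible)
    or random bytes (essentially incompressible)."""
    return b"x" * size if compressible else os.urandom(size)

redundant = make_payload(4096, compressible=True)
random_val = make_payload(4096, compressible=False)

# Redundant data shrinks to a tiny fraction of its size, while random
# data doesn't shrink at all, so only the redundant_* cases spend
# meaningful time in the compressor (and thus benefit from a faster one).
print(len(zlib.compress(redundant)), len(zlib.compress(random_val)))
```

(zlib is used here just to show the compressibility difference; it is not the compression method under test.)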
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
generator.sql
Description: application/sql
test.sh
Description: application/shellscript