Hi,

I've looked at this patch again and did some testing. I don't have any comments on the code (though I see there are two comments from Mark after the last version).
For the testing, I did a fairly simple benchmark: loading either random or compressible data into a bytea column. The tables are defined as unlogged, the values are 1kB, 4kB and 1MB, and the total amount of data is always 1GB. The timings are:
    test          master   patched   delta
    -----------------------------------------
    random_1k      12295     12312    100%
    random_1m      12999     12984    100%
    random_4k      16881     15959     95%
    redundant_1k   12308     12348    100%
    redundant_1m   16632     14072     85%
    redundant_4k   16798     13828     82%

I ran the test on multiple x86_64 machines, and the behavior is almost exactly the same on all of them.
This shows there's no difference for the 1kB values (expected, because they don't exceed the ~2kB TOAST threshold). For random data in general the difference is pretty negligible, although it's a bit strange that the 4kB values take longer than the 1MB ones.
For redundant (highly compressible) values, there's a quite significant speedup of 15-18%. Real-world data are likely somewhere in between, but the speedup is still pretty nice.
Andrey, can you update the patch per Mark's review? I'll do my best to get it committed sometime in this CF.
Attached are the two scripts used for generating the data and running the tests (you'll have to fix some hardcoded paths, but they're simple otherwise).
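For context, the difference between the two payload types can be sketched like this (a hypothetical Python illustration, not the attached generator.sql; the actual benchmark loads such values into bytea columns via SQL):

```python
import os
import zlib

def make_payload(size: int, compressible: bool) -> bytes:
    """Build one value: a single repeated byte (highly compressible)
    or random bytes (essentially incompressible)."""
    return b"x" * size if compressible else os.urandom(size)

redundant = make_payload(4096, compressible=True)
random_val = make_payload(4096, compressible=False)

# Redundant data shrinks to a tiny fraction of its size, while random
# data doesn't shrink at all, so only the redundant_* cases spend
# meaningful time in the compressor (and thus benefit from a faster one).
print(len(zlib.compress(redundant)), len(zlib.compress(random_val)))
```

(zlib is used here just to show the compressibility difference; it is not the compression method under test.)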
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
generator.sql
Description: application/sql
test.sh
Description: application/shellscript