On Sat, Feb 25, 2023 at 5:22 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> This resolves cfbot warnings: windows and cppcheck.
> And refactors zstd routines.
> And updates docs.
> And includes some fixes for earlier patches that these patches
> conflicts with/depends on.
This'll need a rebase (cfbot took a while to catch up). The patchset
includes basebackup modifications, which are part of a different CF
entry; was that intended?

I tried this on a local, 3.5GB, mostly-text table (from the UK Price
Paid dataset [1]), and the comparison against the other methods was
impressive. (I'm no good at constructing compression benchmarks, so
this is a super naive setup. Client's on the same laptop as the
server.)

$ time ./src/bin/pg_dump/pg_dump -d postgres -t pp_complete -Z zstd > /tmp/zstd.dump

real    1m17.632s
user    0m35.521s
sys     0m2.683s

$ time ./src/bin/pg_dump/pg_dump -d postgres -t pp_complete -Z lz4 > /tmp/lz4.dump

real    1m13.125s
user    0m19.795s
sys     0m3.370s

$ time ./src/bin/pg_dump/pg_dump -d postgres -t pp_complete -Z gzip > /tmp/gzip.dump

real    2m24.523s
user    2m22.114s
sys     0m1.848s

$ ls -l /tmp/*.dump
-rw-rw-r-- 1 jacob jacob 1331493925 Mar  3 09:45 /tmp/gzip.dump
-rw-rw-r-- 1 jacob jacob 2125998939 Mar  3 09:42 /tmp/lz4.dump
-rw-rw-r-- 1 jacob jacob 1215834718 Mar  3 09:40 /tmp/zstd.dump

Default gzip was the only method that bottlenecked on pg_dump rather
than the server (its user time is nearly all of the wall time), and
default zstd outcompressed it at a fraction of the CPU time. So,
naively, this looks really good. (A quick decompression sanity check
is sketched below, after the footnotes.)

With this particular dataset, I don't see much improvement with
zstd:long. (At nearly double the CPU time, I get a <1% improvement in
compressed size.) I assume it's heavily data-dependent, but from the
notes on --long [2], it seems like they expect you to play around with
the window size to further tailor it to your data. Does it make sense
to provide the long option without the windowLog parameter? (A rough
way to probe that with the zstd CLI is also sketched below.)

Thanks,
--Jacob

[1] https://landregistry.data.gov.uk/
[2] https://github.com/facebook/zstd/releases/tag/v1.3.2
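
For anyone reproducing the numbers above: a quick way to sanity-check
that all three dumps decompress to the same plain-text output
(assuming the stock gzip/lz4/zstd command-line tools, and that the
table didn't change between runs):

$ gzip -dc /tmp/gzip.dump | md5sum
$ lz4  -dc /tmp/lz4.dump  | md5sum
$ zstd -dc /tmp/zstd.dump | md5sum

All three checksums should match.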
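
And on the windowLog question: one way to probe long-mode sensitivity
without rebuilding pg_dump is to replay the uncompressed dump through
the zstd CLI at a few window sizes. This is just a sketch (file names
are illustrative); --long=N sets the window log to N, and 27 is the
CLI's default for --long:

$ zstd -dc /tmp/zstd.dump > /tmp/plain.dump
$ for wlog in 24 27 30; do
>     time zstd -q -f --long=$wlog /tmp/plain.dump -o /tmp/wlog$wlog.zst
> done
$ ls -l /tmp/wlog*.zst

If the output sizes barely move across window logs here too, that
would support shipping the long option without a user-visible
windowLog knob.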