Hi Maxim,

Out of interest, will a zstd dictionary (eventually) be utilised as a strategy for further improving compression and decompression speed?
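For reference, the zstd command-line tool already supports this workflow. A minimal sketch (the training corpus below is synthetic and the file names are illustrative; in practice the corpus would be a set of actual manpages):

```shell
# Sketch: train a zstd dictionary on a set of similar small files,
# then compress with and without it and round-trip the result.
set -e
dir=$(mktemp -d)
for i in $(seq 1 100); do
  for j in $(seq 1 40); do
    echo "Section $j: shared man-page boilerplate, sample $i"
  done > "$dir/sample$i.txt"
done

# Train a small dictionary on the corpus.
zstd -q --train "$dir"/sample*.txt --maxdict=16384 -o "$dir/man.dict"

# Compress with and without the dictionary, then decompress again.
zstd -q    -D "$dir/man.dict" "$dir/sample1.txt" -o "$dir/with.zst"
zstd -q                       "$dir/sample1.txt" -o "$dir/without.zst"
zstd -q -d -D "$dir/man.dict" "$dir/with.zst"    -o "$dir/roundtrip.txt"

ls -l "$dir/with.zst" "$dir/without.zst"    # compare the two sizes
cmp "$dir/sample1.txt" "$dir/roundtrip.txt" && echo "round trip OK"
```

Note that anyone decompressing a `-D`-compressed file needs the same dictionary, which is the distribution question raised below.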
```
The compression library Zstandard (also known as "Zstd") has the ability
to create an external "dictionary" from a set of training files which
can be used to more efficiently (in terms of compression and
decompression speed and also in terms of compression ratio) compress
files of the same type as the training files. For example, if a
dictionary is "trained" on an example set of email messages, anyone with
access to the dictionary will be able to more efficiently compress
another email file. The trick is that the commonalities are kept in the
dictionary file, and, therefore, anyone wishing to decompress the email
must have already had that same dictionary sent to them.[2]
```

http://fileformats.archiveteam.org/wiki/Zstandard_dictionary

I appreciate it may complicate your piecemeal benchmarking (certainly at
this stage), but I would assume that creating a dictionary (or
dictionaries, say one per Guix package category, to exploit linguistic
overlap) for manpages would further improve zstd speeds.

HTH,

====================
Jonathan McHugh
indieterminacy@libre.brussels

March 30, 2022 4:49 PM, "Maxim Cournoyer" <maxim.courno...@gmail.com> wrote:

> Hi Ludovic,
>
> Ludovic Courtès <l...@gnu.org> writes:
>
> [...]
>
>> To isolate the problem, you could allocate the 4 MiB buffer outside of
>> the loop and use ‘get-bytevector-n!’, and also remove code that writes
>> to ‘output’.
>
> I've adjusted the benchmark like so:
>
> --8<---------------cut here---------------start------------->8---
> (use-modules (ice-9 binary-ports)
>              (ice-9 match)
>              (rnrs bytevectors)
>              (zstd))
>
> (define MiB (expt 2 20))
> (define block-size (* 4 MiB))
> (define bv (make-bytevector block-size))
> (define input-file "/tmp/chromium-98.0.4758.102.tar.zst")
>
> (define (run)
>   (call-with-input-file input-file
>     (lambda (port)
>       (call-with-zstd-input-port port
>         (lambda (input)
>           (while (not (eof-object?
>                        (get-bytevector-n! input bv 0 block-size)))))))))
>
> (run)
> --8<---------------cut here---------------end--------------->8---
>
> It now runs much faster:
>
> --8<---------------cut here---------------start------------->8---
> $ time+ zstd -cdk /tmp/chromium-98.0.4758.102.tar.zst > /dev/null
> cpu: 98%, mem: 10560 KiB, wall: 0:09.56, sys: 0.37, usr: 9.06
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> $ time+ guile ~/src/guile-zstd/benchmark.scm
> cpu: 100%, mem: 25152 KiB, wall: 0:11.69, sys: 0.38, usr: 11.30
> --8<---------------cut here---------------end--------------->8---
>
> So guile-zstd was about 20% slower; not too far off.
>
> For completeness, here's the same benchmark adjusted for guile-zlib:
>
> --8<---------------cut here---------------start------------->8---
> (use-modules (ice-9 binary-ports)
>              (ice-9 match)
>              (rnrs bytevectors)
>              (zlib))
>
> (define MiB (expt 2 20))
> (define block-size (* 4 MiB))
> (define bv (make-bytevector block-size))
> (define input-file "/tmp/chromium-98.0.4758.102.tar.gz")
>
> (define (run)
>   (call-with-input-file input-file
>     (lambda (port)
>       (call-with-gzip-input-port port
>         (lambda (input)
>           (while (not (eof-object?
>                        (get-bytevector-n! input bv 0 block-size)))))))))
>
> (run)
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> $ time+ guile ~/src/guile-zstd/benchmark-zlib.scm
> cpu: 86%, mem: 14552 KiB, wall: 0:23.50, sys: 1.09, usr: 19.15
> --8<---------------cut here---------------end--------------->8---
>
> --8<---------------cut here---------------start------------->8---
> $ time+ gunzip -ck /tmp/chromium-98.0.4758.102.tar.gz > /dev/null
> cpu: 98%, mem: 2304 KiB, wall: 0:35.99, sys: 0.60, usr: 34.99
> --8<---------------cut here---------------end--------------->8---
>
> Surprisingly, here guile-zlib appears to be faster than the 'gunzip'
> command; guile-zstd is about twice as fast at decompressing this
> roughly 4 GiB archive (compressed with zstd at level 19).
>
> So, it seems the foundation we're building on is sane after all. This
> suggests that compression is not the bottleneck when generating the man
> pages database, probably because it only needs to read the first few
> bytes of each compressed manpage to gather the information it needs,
> and that the rest is more expensive compared to that (such as
> string-tokenize'ing the lines read to parse the data).
>
> To be continued...
>
> Thanks!
>
> Maxim
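As an aside, the fixed-buffer drain loop in the quoted benchmark translates directly to other languages, which can be handy for cross-checking the numbers. A rough Python equivalent of the guile-zlib variant, using only the standard-library gzip module (the function name and path are illustrative):

```python
import gzip

def drain(path, block_size=4 * 2**20):
    """Stream-decompress `path` into a fixed 4 MiB buffer until EOF,
    discarding the data -- the same shape as the quoted Scheme loop.
    Returns the total number of decompressed bytes."""
    buf = bytearray(block_size)
    total = 0
    with gzip.open(path, "rb") as f:
        while True:
            n = f.readinto(buf)  # analogous to get-bytevector-n!
            if n == 0:           # EOF
                return total
            total += n
```

As in the Scheme version, allocating the buffer once and refilling it with `readinto` avoids a fresh allocation per block, which is the point of Ludovic's suggestion above.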