Hi Ludovic,

Ludovic Courtès <l...@gnu.org> writes:

[...]

> To isolate the problem, you could allocate the 4 MiB buffer outside of
> the loop and use ‘get-bytevector-n!’, and also remove code that writes
> to ‘output’.

I've adjusted the benchmark like so:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors)
             (zstd))

(define MiB (expt 2 20))
(define block-size (* 4 MiB))
(define bv (make-bytevector block-size))
(define input-file "/tmp/chromium-98.0.4758.102.tar.zst")

(define (run)
  (call-with-input-file input-file
    (lambda (port)
      (call-with-zstd-input-port port
        (lambda (input)
          (while (not (eof-object?
                       (get-bytevector-n! input bv 0 block-size)))))))))

(run)
--8<---------------cut here---------------end--------------->8---

It now runs much faster:

--8<---------------cut here---------------start------------->8---
$ time+ zstd -cdk /tmp/chromium-98.0.4758.102.tar.zst > /dev/null
cpu: 98%, mem: 10560 KiB, wall: 0:09.56, sys: 0.37, usr: 9.06
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ guile ~/src/guile-zstd/benchmark.scm
cpu: 100%, mem: 25152 KiB, wall: 0:11.69, sys: 0.38, usr: 11.30
--8<---------------cut here---------------end--------------->8---

So guile-zstd was only about 20% slower than the 'zstd' command, which
is not too far behind.

For completeness, here's the same benchmark adjusted for guile-zlib:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (ice-9 match)
             (rnrs bytevectors)
             (zlib))

(define MiB (expt 2 20))
(define block-size (* 4 MiB))
(define bv (make-bytevector block-size))
(define input-file "/tmp/chromium-98.0.4758.102.tar.gz")

(define (run)
  (call-with-input-file input-file
    (lambda (port)
      (call-with-gzip-input-port port
        (lambda (input)
          (while (not (eof-object?
                       (get-bytevector-n! input bv 0 block-size)))))))))

(run)
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ guile ~/src/guile-zstd/benchmark-zlib.scm
cpu: 86%, mem: 14552 KiB, wall: 0:23.50, sys: 1.09, usr: 19.15
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time+ gunzip -ck /tmp/chromium-98.0.4758.102.tar.gz > /dev/null
cpu: 98%, mem: 2304 KiB, wall: 0:35.99, sys: 0.60, usr: 34.99
--8<---------------cut here---------------end--------------->8---

Surprisingly, guile-zlib here appears to be faster than the 'gunzip'
command, and guile-zstd is about twice as fast as guile-zlib at
decompressing this roughly 4 GiB archive (compressed with zstd at
level 19).  So, it seems the foundation we're building on is sane
after all.

This suggests that decompression is not the bottleneck when generating
the man pages database, probably because it only needs to read the
first few bytes of each compressed man page to gather the information
it needs, and the rest of the processing (such as string-tokenize'ing
the lines read to parse the data) is more expensive by comparison.

To be continued...

Thanks!

Maxim
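
P.S. To make the "read only the first few bytes" idea concrete, here is
a minimal sketch reusing the same (zstd) API as the benchmark above.
The 'read-compressed-head' name is made up for illustration; it is not
part of guile-zstd:

--8<---------------cut here---------------start------------->8---
(use-modules (ice-9 binary-ports)
             (zstd))

;; Hypothetical helper: decompress only the first N bytes of FILE,
;; enough to reach e.g. a man page's .TH header line.  Everything past
;; those N bytes is never decompressed.
(define (read-compressed-head file n)
  (call-with-input-file file
    (lambda (port)
      (call-with-zstd-input-port port
        (lambda (input)
          (get-bytevector-n input n))))))

;; e.g. (utf8->string (read-compressed-head "ls.1.zst" 80))
--8<---------------cut here---------------end--------------->8---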