On 2017-02-25, Stefan Bodewig wrote: > While I was at it I thought it would be nice to compare the performance > of the compression formats we support:
> > https://github.com/bodewig/commons-compress-benchmarks/wiki/Comparison-of-Compression-Formats > As expected the performance of the zlib based native gzip/deflate > compressors/decompressors is superior to all others. The performance of > the pure Java LZ77 I've implemented for snappy and lz4 doesn't look too > bad, but I'm surprised to see Snappy is so much faster than LZ4 (they > share most of the code). But not of the configuration space. I've started to run the benchmark for a bigger file (Compress 1.13's uncompressed tar archive) and Snappy degrades to the performance of bzip2 while lz4 gets even slower than lzma. What happens here is that the current code is more or less equivalent to the "maximum compression" case of zlib (without lazy matches[1] which would slow down things even more). It searches backwards for matches until it finds one that is as long as the maximum match (or data is exceeded). For Snappy maximum match is 64 bytes, for LZ4 it is 2^16-1 (for comparison, for DEFLATE it is 258). In addition it stops searching backwards at a maximum distance from the current position wich is 2^15 for Snappy (by default) and 2^16-1 for lz4 (2^15-3 for DEFLATE). So the current lz4 configuration tries a harder to find a lot longer matches than Snappy does (or DEFLATE would do). zlib has two parameters that tune the performance for the no-lazy match case we've currently implemented. * A "nice match length" that says stop searching once you've found a match of at least this length. * a "maximum chain length" that puts a cap on the number of potential matches that are tried. I plan to implement both of these but am a little sure about the API. They'll go into lz77support.Parameter as optional properties. Of course we will want people who know what they do to provide arbitrary values but it would be nices to have something like new Parameters(WINDOW_SIZE).tunedForBestCompression() or new Parameters(WINDOW_SIZE).tunedForBestSpeed() but I'm unsure how many level we need in between (zlib goes from 0 - no compression - to 9 - maximum compression). And I'm also not sure there is a simple way of picking the best possible parameters for arbitrary LZ77 variants. Also, I'm not sure what the default should be for people using CompressorStreamFactory.createCompressorOutputStream. Ideas are more than welcome. Stefan [1] using lazy matches you find the best back-reference for the data starting at the current byte and calculate the best back-reference if you started with the next byte rather than the current byte and pick the longer one. This is something I also want to add before the 1.14 release. --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@commons.apache.org For additional commands, e-mail: dev-h...@commons.apache.org