Why can't LZ4 compress text files?

Petr

On 2/09/2013 2:51 PM, Daniel Flores wrote:
> Hello,
> here is my report for week 11.
>
> As we are approaching the end of GSoC, the code of the added feature seems to have become more stable and less likely to change. So, in this report, as announced in my previous one, I present a performance comparison between the different compression modes. It is unlikely that any new algorithm will be added in the following weeks; most likely, I'll focus on polishing the current code, bug-hunting, and performing some additional tests.
>
> Some changes were made in the code. The most important one is that there is now a single compression function for both compression algorithms, which makes the code more compact. The ZLIB-related code is also much more compact and basically contains only what we need, although some additional cleaning is still needed. Also, the compression is no longer turned off completely when an incompressible file is detected; instead, it is retried every 512KB of the file.
>
> Now let's move on to the performance results. Note that the strength of ZLIB compression can be adjusted; I tested specifically levels 6 (the default) and 9 (best compression ratio). Currently the user can't tune this compression level.
>
> First of all, I wanted to determine the gain from using ZLIB instead of LZ4, so I looked at how many blocks ended up compressed and what the actual size of those blocks was for different kinds of files. These tests were performed on the actual file system, not in a prototype application, to ensure that the results reflect real-life use.
>
> I tested a .jpg image, two .wav files, a couple of text files, a perfectly compressible .tif image and a couple of log files.
>
> For the JPEG image and the .wav files, ZLIB compression was unsuccessful, which is not surprising.
>
> However, ZLIB compression turned out to be very effective on the text files that LZ4 can't compress at all: it managed to reduce every block of those files from 64KB to 32KB. A very nice result. There was no difference between ZLIB level 6 and ZLIB level 9, though.
>
> For the .tif file, both LZ4 and ZLIB managed to compress all the blocks, but many blocks compressed by ZLIB ended up smaller than those compressed by LZ4. For this particular file the results were as follows (57 blocks in total; each cell is the number of blocks that ended up at that compressed size):
>
>   Block size    LZ4  ZLIB level 6  ZLIB level 9
>   1KB             1             2             2
>   2KB             1             1             5
>   4KB             1            13            10
>   8KB             7            36            38
>   16KB           43             5             2
>   32KB            4             0             0
>   Total          57            57            57
>
> As you can see, not only did ZLIB give a better result, but there is also a significant difference between ZLIB level 6 and ZLIB level 9. Something similar seems to happen with the log files, where ZLIB gives better results than LZ4 and there also appears to be a difference between levels 6 and 9, although I can't present exact numbers yet.
>
> Now let's move to the other side of performance: execution time. I measured the total elapsed time of cp (write performance) and diff (read performance) using the time utility and scripts that copied the specified files from HAMMER to HAMMER2, or diff'ed the originals on HAMMER against the copies on HAMMER2. Each script was executed 10 times, and the HAMMER2 partition was remounted between executions.
>
> You can see the results for write performance here [1] and for read performance here [2]. Both LZ4 and ZLIB are usable; however, ZLIB may have a significant impact on performance, and the user must be aware of it.
>
> The difference between LZ4 and ZLIB isn't that big for incompressible files (.jpg and .tar.gz) and small compressible files (.tif and .txt), but it's huge for big compressible files (logs). Interestingly, it does look like the strongest ZLIB compression level (9) is either no different from, or even slightly faster overall than, the default level. Also, reading files compressed with ZLIB seems to be slightly faster than reading files compressed with LZ4.
>
> There is no difference between writing/reading small files without compression and with either type of compression. For bigger files, writing without compression is slightly faster than writing with LZ4, while for reading, compressed files almost always offer a speed advantage, with ZLIB level 9 being the winner.
>
> It should be noted that all these tests were performed on a virtual machine, which means that on real hardware the performance would be better. I must also note that this is not a comparison of the algorithms themselves, but only of their behavior in very specific circumstances, namely the HAMMER2 file system, where they can't perform at their full speed or compression ratio.
>
> Next week I hope to present tests of files that don't fit neatly into the compressible/incompressible categories, as well as a performance test of the zero-checking feature. I also hope to improve the stability of the code and present the results of some stress tests.
>
> You can check out the files used for the timing tests (except the log file, for security reasons; please tell me if you need to take a look at it): 1.jpg [3], every.wav [4], mike.wav [5], book1 [6], frymire.tif [7]. All the current code is in my repository [8], branch “hammer2_compression”.
>
> I'll appreciate all comments, suggestions and criticism.
>
> Daniel
>
> [1] http://leaf.dragonflybsd.org/~iostream/write_performance.html
> [2] http://leaf.dragonflybsd.org/~iostream/read_performance.html
> [3] http://leaf.dragonflybsd.org/~iostream/1.jpg
> [4] http://leaf.dragonflybsd.org/~iostream/every.wav
> [5] http://leaf.dragonflybsd.org/~iostream/mike.wav
> [6] http://leaf.dragonflybsd.org/~iostream/book1
> [7] http://leaf.dragonflybsd.org/~iostream/frymire.tif
> [8] git://leaf.dragonflybsd.org/~iostream/dragonfly.git <http://leaf.dragonflybsd.org/~iostream/dragonfly.git>

-- 
Please use PGP to encrypt your email to ensure our privacy is respected.
