2017-06-20 1:09 GMT+03:00 Timofey Titovets <nefelim...@gmail.com>:
> Hi, for the last several days I have been working on an entropy
> calculation that could be used in the btrfs compression code (to detect
> poorly compressible data).
>
> I've implemented:
>  - avg meaning (has accuracy problems)
>  - Shannon entropy
>  - Shannon entropy with integer-only logic (accuracy within +-0.5% of
>    the floating-point Shannon version)
>
> All written in C with C++ inserts; it can easily be ported to kernel
> code if needed.
> The repo is here:
> https://github.com/Nefelim4ag/Entropy_Calculation
>
> It would be great if someone were interested in profiling and
> performance testing it,
> because my naive tests with ~$ time <binary> on 8MB of test data
> show that lzo at levels 1-6 is the fastest way to detect whether data
> is compressible,
> and that integer Shannon entropy is much faster (about 5 times) than
> any gzip level.
>
> Thanks!
>
> P.S.
> I got this idea from:
> https://btrfs.wiki.kernel.org/index.php/Project_ideas
>  - Compression enhancements
>     - heuristics -- try to learn in a simple way how well the file
>       data compress, or not
>
> --
> Have a nice day,
> Timofey.
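
For readers who want to see what "Shannon entropy with integer-only logic" can look like, here is a minimal sketch. It is not the code from the repo above: the names entropy_q16 and log2_fixed, and the choice of 16 fractional bits, are assumptions made for illustration. It uses the identity H = log2(n) - (1/n) * sum(c_i * log2(c_i)), where c_i are the byte counts and n is the buffer length, with log2 computed in fixed point by repeated squaring.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FRAC_BITS 16 /* fractional bits of the fixed-point result (assumption) */

/* log2(x) for x >= 1, scaled by 2^FRAC_BITS, using only integer operations. */
static uint32_t log2_fixed(uint32_t x)
{
	uint32_t int_part = 0;
	uint32_t result;
	uint64_t y;
	int i;

	assert(x >= 1);

	/* Integer part: position of the highest set bit. */
	while ((x >> int_part) > 1)
		int_part++;
	result = int_part << FRAC_BITS;

	/* Normalize x into [1, 2) as a Q31 value, i.e. [2^31, 2^32). */
	y = (uint64_t)x << (31 - int_part);

	/* Each squaring yields one more fractional bit of log2. */
	for (i = FRAC_BITS - 1; i >= 0; i--) {
		y = (y * y) >> 31;
		if (y >= (1ULL << 32)) {
			y >>= 1;
			result |= 1u << i;
		}
	}
	return result;
}

/*
 * Shannon entropy of buf in bits per byte, scaled by 2^FRAC_BITS.
 * Range: 0 (one repeated byte) to 8 << FRAC_BITS (uniform bytes).
 * Assumes len fits in 32 bits.
 */
static uint32_t entropy_q16(const uint8_t *buf, size_t len)
{
	uint32_t count[256] = { 0 };
	uint64_t sum = 0;
	uint32_t log_n, avg;
	size_t i;

	if (len == 0)
		return 0;

	for (i = 0; i < len; i++)
		count[buf[i]]++;

	/* sum = sum of c_i * log2(c_i) over all byte values that occur */
	for (i = 0; i < 256; i++)
		if (count[i])
			sum += (uint64_t)count[i] * log2_fixed(count[i]);

	log_n = log2_fixed((uint32_t)len);
	avg = (uint32_t)(sum / len);
	return log_n > avg ? log_n - avg : 0;
}
```

A caller could compare the result against a threshold (for example, skip compression when the value is close to 8 << FRAC_BITS); the right threshold is a tuning question this sketch does not answer.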
Update:
I rewrote it in pure C. Timings for 10000 runs on 128KB of data
(64KB random, 64KB zeroes):

--- avg_meaning_entropy          10s
--- shannon_entropy              23s
--- shannon_int_entropy          14s
--- gzip -f -k -3 ./indata.bin   43s
--- lzop -f -k -3 ./indata.bin   18s

Thanks.
--
Have a nice day,
Timofey.