2017-06-20 1:09 GMT+03:00 Timofey Titovets <nefelim...@gmail.com>:
> Hi, for the last several days I've been working on an entropy calculation
> that could be used in the btrfs compression code (to detect poorly
> compressible data).
>
> I've implemented:
> - avg meaning (has accuracy problems)
> - Shannon entropy
> - Shannon entropy using only integer arithmetic (within ±0.5% of the
> floating-point Shannon result)
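For readers wondering how Shannon entropy can be computed with integer arithmetic only (useful in the kernel, where floating point is normally avoided), here is a minimal sketch. It is an assumption, not the code from the repo. It uses the textbook identity H = sum_i c_i * (log2(N) - log2(c_i)) / N over the byte histogram, with a fixed-point log2 (16 fractional bits) computed by repeated squaring. The names log2_q16 and shannon_entropy_q16 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define ENTROPY_FRAC_BITS 16  /* hypothetical: results are Q16 fixed point */

/* log2(v) in Q16 for v > 0, integer-only: the integer part comes from the
 * position of the top set bit, the fractional bits from repeatedly squaring
 * the mantissa and checking whether it reached 2. */
static uint64_t log2_q16(uint32_t v)
{
    uint32_t n = 0;
    while (((uint64_t)v >> (n + 1)) != 0)
        n++;                                   /* n = floor(log2(v)) */
    uint64_t y = ((uint64_t)v << 31) >> n;     /* mantissa v / 2^n in Q31, in [1, 2) */
    uint64_t result = (uint64_t)n << ENTROPY_FRAC_BITS;
    for (int i = ENTROPY_FRAC_BITS - 1; i >= 0; i--) {
        y = (y * y) >> 31;                     /* square the mantissa, stay in Q31 */
        if (y >= (2ULL << 31)) {               /* mantissa >= 2: this bit is 1 */
            y >>= 1;
            result |= 1ULL << i;
        }
    }
    return result;
}

/* Shannon entropy of buf in bits per byte, scaled by 2^16
 * (0 for constant data, 8 << 16 for a perfectly uniform histogram). */
static uint64_t shannon_entropy_q16(const uint8_t *buf, uint32_t len)
{
    uint32_t counts[256] = {0};
    uint64_t sum = 0;
    if (len == 0)
        return 0;
    for (uint32_t i = 0; i < len; i++)
        counts[buf[i]]++;
    uint64_t log2_len = log2_q16(len);
    for (int b = 0; b < 256; b++) {
        if (counts[b] == 0)
            continue;
        uint64_t lc = log2_q16(counts[b]);
        if (lc < log2_len)                     /* guard against unsigned underflow */
            sum += (uint64_t)counts[b] * (log2_len - lc);
    }
    return sum / len;
}
```

To compare the result with a floating-point reference, divide it by 65536.0. Because the log2 is truncated bit by bit, results can come out slightly low. 16 fractional bits is an arbitrary choice here.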
>
> It's all written in C with some C++ inserts and can easily be ported to
> kernel code if needed.
> Repo is here:
> https://github.com/Nefelim4ag/Entropy_Calculation
>
> It would be great if someone is interested in profiling and
> performance-testing it.
>
> My crude tests (~$ time <binary> on 8 MB of test data) show that lzo at
> levels 1-6 is the fastest way to detect whether data is compressible,
> and that integer Shannon entropy is much faster (about 5x) than any
> gzip level.
>
> Thanks!
>
> P.S.
> I got this idea from:
> https://btrfs.wiki.kernel.org/index.php/Project_ideas
>  - Compression enhancements
>     - heuristics -- try to learn in a simple way how well the file
> data compress, or not
>
> --
> Have a nice day,
> Timofey.
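The wiki idea quoted above asks for a simple way to guess whether data will compress. One cheap, log-free way to turn a byte histogram into a yes/no decision is sketched below. It assumes a different measure than Shannon: collision probability, where N^2 / sum(c_i^2) is 2 raised to the Renyi order-2 entropy, an "effective alphabet size" from 1 to 256. The sketch also assumes sampling every SAMPLE_STRIDE-th byte. The stride, the INCOMPRESSIBLE_MIN_ALPHABET cut-off, and the names effective_alphabet and looks_incompressible are all hypothetical, not anything btrfs or the repo defines.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knobs: they would need calibrating on real data. */
#define SAMPLE_STRIDE 4
#define INCOMPRESSIBLE_MIN_ALPHABET 200

/* N^2 / sum(c_i^2) over sampled bytes: 1 for constant data, up to 256 when
 * every byte value is equally frequent. Integer-only; assumes the sample
 * count stays well below 2^32 so n * n cannot overflow. */
static uint64_t effective_alphabet(const uint8_t *buf, size_t len, size_t stride)
{
    uint64_t counts[256] = {0};
    uint64_t n = 0, sumsq = 0;
    for (size_t i = 0; i < len; i += stride) {
        counts[buf[i]]++;
        n++;
    }
    if (n == 0)
        return 0;
    for (int b = 0; b < 256; b++)
        sumsq += counts[b] * counts[b];
    return n * n / sumsq;
}

/* 1 if the sampled byte distribution is close to uniform, i.e. trying to
 * compress is probably a waste of CPU; 0 otherwise. */
static int looks_incompressible(const uint8_t *buf, size_t len)
{
    return effective_alphabet(buf, len, SAMPLE_STRIDE) >= INCOMPRESSIBLE_MIN_ALPHABET;
}
```

Like Shannon entropy, this only looks at the byte histogram and ignores byte order. A buffer repeating 0..255 scores 256 with stride 1, even though LZ-based compressors shrink it easily. A stride can also alias with periodic data: the same buffer sampled with stride 4 scores only 64.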

Update:
I rewrote it in pure C. Timings for 10000 runs on 128 KB of data
(64 KB random, 64 KB zeroes):
Method                        Time (10000 runs)
avg_meaning_entropy           10 s
shannon_entropy               23 s
shannon_int_entropy           14 s
gzip -f -k -3 ./indata.bin    43 s
lzop -f -k -3 ./indata.bin    18 s
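Anyone who wants to reproduce in-process timings like these could start from a minimal harness like the one below. It is a sketch under stated assumptions, not the harness used above. It builds a 128 KB buffer (first half pseudo-random via xorshift32, a stand-in for whatever random source was actually used; second half zeroes) and times repeated calls with clock_gettime(CLOCK_MONOTONIC). The names fill_test_buffer, time_runs and count_distinct_bytes are hypothetical. The timed workload is just the byte histogram that every entropy estimate starts from.

```c
#define _POSIX_C_SOURCE 199309L  /* for clock_gettime / CLOCK_MONOTONIC */
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#define TEST_LEN (128 * 1024)  /* 128 KB, matching the test data above */

/* First half: xorshift32 pseudo-random bytes; second half: zeroes. */
static void fill_test_buffer(uint8_t *buf, size_t len)
{
    uint32_t x = 2463534242u;  /* arbitrary non-zero seed */
    size_t half = len / 2;
    for (size_t i = 0; i < half; i++) {
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        buf[i] = (uint8_t)x;
    }
    memset(buf + half, 0, len - half);
}

typedef uint64_t (*heuristic_fn)(const uint8_t *, size_t);

/* Call fn `runs` times on buf and return elapsed wall-clock seconds. The
 * results are summed into *sink so the calls are not discarded as dead code. */
static double time_runs(heuristic_fn fn, const uint8_t *buf, size_t len,
                        unsigned runs, uint64_t *sink)
{
    struct timespec t0, t1;
    uint64_t acc = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (unsigned r = 0; r < runs; r++)
        acc += fn(buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = acc;
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Example workload: build the byte histogram and count distinct values. */
static uint64_t count_distinct_bytes(const uint8_t *buf, size_t len)
{
    uint32_t counts[256] = {0};
    uint64_t distinct = 0;
    for (size_t i = 0; i < len; i++)
        counts[buf[i]]++;
    for (int b = 0; b < 256; b++)
        distinct += counts[b] != 0;
    return distinct;
}
```

Timing functions in-process like this leaves out process startup and file I/O. Those costs are part of any per-run measurement of the gzip and lzop command-line tools, so the two kinds of numbers are not strictly comparable.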

Thanks.