I've finished the heuristic method,
so here is some performance test output.
(I store the test data in /run/user/$UID/, and the script simply runs the program 20000 times.)
###
# Performance test will measure initialization time
# And remove it from run time of tests
# This may be inaccurate in some cases
# But this allow see speed difference
./performance_test.sh

Test good compressible data: 128k
AVG Initialization time: 17018ms
avg mean:       3240ms
shannon float:  11537ms
shannon integ:  9933ms
heuristic:      2060ms
gzip -ck -3 /run/user/1000/test_data.bin:       19767ms
lzop -ck -3 /run/user/1000/test_data.bin:       12393ms
lzop -ck -1 /run/user/1000/test_data.bin:       13660ms
- - - - -
Test half compressible data: 128k
AVG Initialization time: 15820ms
avg mean:       3302ms
shannon float:  8680ms
shannon integ:  9659ms
heuristic:      3207ms // 128*20000/1024/3 ~ 800 MB/s
gzip -ck -3 /run/user/1000/test_data.bin:       57245ms
lzop -ck -3 /run/user/1000/test_data.bin:       14175ms
lzop -ck -1 /run/user/1000/test_data.bin:       15392ms
- - - - -
Test bad compressible data: 128k
AVG Initialization time: 16878ms
avg mean:       4410ms
shannon float:  7113ms
shannon integ:  6777ms
heuristic:      2162ms // 128*20000/1024/2 ~ 1250 MB/s
gzip -ck -3 /run/user/1000/test_data.bin:       110578ms
lzop -ck -3 /run/user/1000/test_data.bin:       14238ms
lzop -ck -1 /run/user/1000/test_data.bin:       15332ms
- - - - -
Test good compressible data: 8k
AVG Initialization time: 17526ms
avg mean:       1683ms
shannon float:  5762ms
shannon integ:  2427ms
heuristic:      1858ms
gzip -ck -3 /run/user/1000/test_data.bin:       2745ms
lzop -ck -3 /run/user/1000/test_data.bin:       10221ms
lzop -ck -1 /run/user/1000/test_data.bin:       11960ms
- - - - -
Test half compressible data: 8k
AVG Initialization time: 17513ms
avg mean:       1853ms
shannon float:  1933ms
shannon integ:  2572ms
heuristic:      2571ms
gzip -ck -3 /run/user/1000/test_data.bin:       4167ms
lzop -ck -3 /run/user/1000/test_data.bin:       9502ms
lzop -ck -1 /run/user/1000/test_data.bin:       10923ms
- - - - -
Test bad compressible data: 8k
AVG Initialization time: 18164ms
avg mean:       -28ms
shannon float:  312ms
shannon integ:  1891ms
heuristic:      542ms
gzip -ck -3 /run/user/1000/test_data.bin:       6407ms
lzop -ck -3 /run/user/1000/test_data.bin:       9088ms
lzop -ck -1 /run/user/1000/test_data.bin:       10741ms
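
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of the idea described in the header of the output: time N runs of a command, time N runs of a start-up-only baseline, and subtract. The names total_ms and net_ms and the stand-in commands are hypothetical, and I am assuming the baseline is a run that does no real work; the actual performance_test.sh may do this differently. Because the two timings are taken separately, the difference can come out slightly negative for very cheap tests, as in the -28ms line above.

```python
import subprocess
import sys
import time


def total_ms(cmd, runs):
    """Wall-clock time in ms for running `cmd` `runs` times in a row."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, stdout=subprocess.DEVNULL, check=True)
    return (time.perf_counter() - start) * 1000.0


def net_ms(cmd, baseline_cmd, runs):
    """Time for `cmd` minus the time for a start-up-only baseline.

    Both timings are noisy, so the result may be negative
    when `cmd` does very little work.
    """
    return total_ms(cmd, runs) - total_ms(baseline_cmd, runs)


if __name__ == "__main__":
    # Hypothetical stand-ins: a no-op process as the baseline,
    # a small computation as the workload under test.
    baseline = [sys.executable, "-c", "pass"]
    workload = [sys.executable, "-c", "sum(range(100000))"]
    print(f"net: {net_ms(workload, baseline, runs=5):.0f}ms")
```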

So, as you can see, most of the time the heuristic test is about 5 times faster than
direct compression.
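
The ratios can be rechecked from the 128k numbers in the output. This is only a small sketch of the arithmetic; the function names are mine, and the timings are copied from the heuristic and lzop -3 lines. With the exact times instead of times rounded to whole seconds, the throughput comes out somewhat lower than the rounded figures noted in the output.

```python
RUNS = 20000      # number of runs per test, as stated in the mail
BLOCK_KIB = 128   # block size of the 128k tests

# (heuristic ms, lzop -3 ms) for the 128k tests, copied from the output.
results_128k = {
    "good": (2060, 12393),
    "half": (3207, 14175),
    "bad": (2162, 14238),
}


def throughput_mib_s(elapsed_ms, block_kib=BLOCK_KIB, runs=RUNS):
    """Data processed per second, in MiB/s, over all runs."""
    total_mib = block_kib * runs / 1024
    return total_mib / (elapsed_ms / 1000)


def speedup(heuristic_ms, compressor_ms):
    """How many times faster the heuristic is than the compressor."""
    return compressor_ms / heuristic_ms


for name, (heur_ms, lzop_ms) in results_128k.items():
    print(f"{name}: {throughput_mib_s(heur_ms):.0f} MiB/s, "
          f"{speedup(heur_ms, lzop_ms):.1f}x vs lzop -3")
```
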
I did some basic testing on that (trying random pieces of data: blocks
from VM images, photos, texts, /dev/zero, /dev/urandom),
and the heuristic method is pretty accurate; most of the magic happens in [2].
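
To give an idea of what the "shannon" tests measure, here is a minimal sketch of a byte-level Shannon entropy estimate for one block. This is not the code from [2]; the function names and the 7.0 bits/byte threshold are assumptions made only for illustration.

```python
import math
import os
from collections import Counter


def shannon_entropy_bits_per_byte(block: bytes) -> float:
    """Shannon entropy of the byte distribution, from 0.0 to 8.0 bits/byte."""
    if not block:
        return 0.0
    n = len(block)
    counts = Counter(block)
    # H = sum over byte values of p * log2(1/p), with p = count / n
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def looks_compressible(block: bytes, threshold: float = 7.0) -> bool:
    """Hypothetical decision rule: low entropy suggests compression will pay off."""
    return shannon_entropy_bits_per_byte(block) < threshold


if __name__ == "__main__":
    zeros = bytes(128 * 1024)            # like a block read from /dev/zero
    random_block = os.urandom(128 * 1024)  # like a block read from /dev/urandom
    print(looks_compressible(zeros), looks_compressible(random_block))
```

A block of zeros has an entropy of 0 bits/byte, while random data gets close to 8 bits/byte, which is why /dev/zero and /dev/urandom are useful as the two extremes when testing.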


I think that in the near future I will try to put together a kernel patch set for this.
If anyone is interested, you can look at the code [1] and run some tests on your own data.
I'd appreciate any feedback!

Thanks!

P.S.
[1] https://github.com/Nefelim4ag/Entropy_Calculation
[2] https://github.com/Nefelim4ag/Entropy_Calculation/blob/master/heuristic.c

-- 
Have a nice day,
Timofey.