On 2015-12-02 08:53, Tomasz Chmielewski wrote:
On 2015-12-02 22:03, Austin S Hemmelgarn wrote:

From these numbers (124 GB used where data size is 153 GB), it appears
that we save around 20% with zlib compression enabled.
Is 20% a reasonable saving for zlib? Typically text compresses much better
with that algorithm, although I understand that we have several
limitations when applying that on a filesystem level.

This is actually an excellent question.  A couple of things to note
before I share what I've seen:
1. Text compresses better with any compression algorithm.  It is by
nature highly patterned and moderately redundant data, which is what
benefits the most from compression.

It looks like compress=zlib does not compress very well. Following
Duncan's suggestion, I've changed it to compress-force=zlib, and
re-copied the data to make sure the files are compressed.
For future reference, if you run 'btrfs filesystem defrag -r -czlib' on the top level directory, you can achieve the same effect without having to deal with the copy overhead. This has a side effect of breaking reflinks, but copying the files off and back onto the filesystem does so also, and even then, I doubt that you're using reflinks. There probably wouldn't be much difference in the time it takes, but at least you wouldn't be hitting another disk in the process.

Compression ratio is much much better now (on a slightly changed data set):

# df -h
/dev/xvdb       200G   24G  176G  12% /var/log/remote


# du -sh /var/log/remote/
138G    /var/log/remote/


So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same
effect as compress-force=zlib, for 100% text files/logs.

That's better than 80% space savings (it works out to about 82.6%), so I doubt that you'd manage to get anything better than that even with only plain text files. It's interesting that there's such a big discrepancy though; that indicates that BTRFS really needs some work WRT deciding what to compress.
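
For anyone who wants to recheck the percentages in this thread, here is a minimal sketch of the arithmetic. The function name space_savings is made up for illustration, and the inputs are simply the du (apparent size) and df (used space) figures quoted above. Since df's used figure also counts filesystem metadata, treat the result as an approximation.

```python
def space_savings(apparent_gb: float, on_disk_gb: float) -> float:
    """Return the fraction of space saved, from 0.0 (none) to 1.0 (all).

    apparent_gb: total file size as reported by du
    on_disk_gb:  space actually used as reported by df
    """
    if apparent_gb <= 0:
        raise ValueError("apparent size must be positive")
    return 1.0 - on_disk_gb / apparent_gb


# Figures quoted in this thread, in GB.
with_compress = space_savings(153, 124)        # compress=zlib
with_compress_force = space_savings(138, 24)   # compress-force=zlib

print(f"compress=zlib:       {with_compress:.1%}")
print(f"compress-force=zlib: {with_compress_force:.1%}")
```

The two data sets are slightly different, so the comparison is only indicative.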

