On 2015-12-03 01:29, Duncan wrote:
Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as
excerpted:

On 2015-12-02 09:03, Imran Geriskovan wrote:
What are your disk space savings when using btrfs with compression?

[Some] posters have reported that for mostly text, compress didn't
give them the expected compression results and they needed to use
compress-force.

"compress-force" option compresses regardless of the "compressibility"
of the file.

"compress" option makes some inference about the "compressibility" and
decides to compress or not.

I wonder how that inference is done.
Can anyone provide some pseudocode for it?

I'm not certain how BTRFS does it, but my guess would be that it tries
to compress the block, then stores the uncompressed version if the
compressed one is bigger.

No pseudocode as I'm not a dev and wouldn't want to give the wrong
impression, but as I believe I replied recently in another thread, based
on comments the devs have made...

With compress, btrfs does a trial compression (intended to be fast) of
the first 128 KiB block or two and uses the result of that to decide
whether to compress the entire file.

Compress-force simply bypasses that first decision point, processing the
file as if the test always succeeded and compression was chosen.
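
Since pseudocode was asked for: purely as a guess at the behaviour described
above (made-up names, not actual btrfs functions; the 128 KiB trial size is
just taken from the description in this thread), the decision might look
something like this in C:

    #include <stddef.h>

    #define CHUNK_SIZE (128 * 1024)   /* btrfs compresses in 128 KiB chunks */

    /* Stand-in for the real compressor (zlib/lzo); assumed to return the
     * compressed size, or something >= len if the data didn't shrink.
     * Hypothetical, for illustration only. */
    size_t compress_chunk(const unsigned char *in, size_t len,
                          unsigned char *out);

    /* Sketch of the compress vs. compress-force decision: plain "compress"
     * trial-compresses the first chunk and gives up on the whole file if
     * that doesn't shrink it; "compress-force" skips the test entirely. */
    int should_compress_file(const unsigned char *data, size_t file_len,
                             int force, unsigned char *scratch)
    {
        size_t trial_len = file_len < CHUNK_SIZE ? file_len : CHUNK_SIZE;

        if (force)
            return 1;

        return compress_chunk(data, trial_len, scratch) < trial_len;
    }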

If the decision to compress is made, the file is (evidently; again, I'm not
a dev, but filefrag results support this) compressed one 128 KiB block at a
time, with the resulting size compared against the uncompressed version and
the smaller of the two stored.
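
Again only as an illustration of that per-block behaviour (same made-up
helpers as above, not btrfs internals), the write path could be sketched
like this:

    #include <stddef.h>

    #define CHUNK_SIZE (128 * 1024)

    /* Hypothetical compressor and extent writer, standing in for the real
     * zlib/lzo code paths and the btrfs extent allocator. */
    size_t compress_chunk(const unsigned char *in, size_t len,
                          unsigned char *out);
    void   write_extent(const unsigned char *buf, size_t len, int compressed);

    /* Sketch: compress one 128 KiB chunk at a time and store whichever of
     * the compressed or uncompressed versions is smaller. */
    void write_file_extents(const unsigned char *data, size_t file_len,
                            unsigned char *scratch)
    {
        size_t off, len, clen;

        for (off = 0; off < file_len; off += CHUNK_SIZE) {
            len  = file_len - off < CHUNK_SIZE ? file_len - off : CHUNK_SIZE;
            clen = compress_chunk(data + off, len, scratch);

            if (clen < len)
                write_extent(scratch, clen, 1);    /* compressed block */
            else
                write_extent(data + off, len, 0);  /* incompressible: as-is */
        }
    }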

(Filefrag doesn't understand btrfs compression and reports an individual
extent for each 128 KiB compression block, if compressed.  However, for
many files processed with compress-force, filefrag doesn't report the
expected size/128-KiB number of extents, but rather something lower.  If
filefrag -v is used, details of each "extent" are listed, and some show
up as multiples of 128 KiB, indicating runs of incompressible blocks that,
unlike actually compressed blocks, filefrag can and does report correctly
as single extents.  The conclusion is thus as above: btrfs tests the
compression result of each block and doesn't compress it if the
"compression" ends up negative, that is, if the "compressed" size would
be larger than the uncompressed size.)

On a side note, I really wish BTRFS would just add LZ4 support.  It's a
lot more deterministic WRT decompression time than LZO, gets a similar
compression ratio, and runs faster on most processors for both
compression and decompression.

There were patches (at least RFC level, IIRC) floating around years ago
to add lz4... I wonder what happened to them?  My impression was that a
large deployment somewhere may actually be running them as well, making
them well tested (and obviously well beyond preliminary RFC level) by
now, although that impression could well be wrong.

Hmm, I'll have to see if I can find those and rebase them. IIRC, the argument against adding it was 'but we already have a fast compression algorithm!', which in turn tells me the patches weren't sold on their most significant points: lz4 is faster at decompression than LZO (even the lz4hc variant, which takes longer to compress for a usually better ratio, decompresses just as fast as regular lz4), and its timings are a lot more deterministic, which really matters if you're doing real-time work.
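
Just to make the lz4hc point concrete, here's a quick userspace illustration
with liblz4 (not btrfs code; error handling omitted and buffer sizes assumed
to fit): both the plain and HC compressors feed the exact same decompressor,
so the extra compression time buys a better ratio without changing
decompression behaviour.

    /* build: cc lz4demo.c -llz4 */
    #include <lz4.h>
    #include <lz4hc.h>
    #include <stdlib.h>
    #include <string.h>

    int main(void)
    {
        const char *in = "example data, example data, example data";
        int in_len = (int)strlen(in) + 1;
        int bound  = LZ4_compressBound(in_len);
        char *fast = malloc(bound), *hc = malloc(bound), *out = malloc(in_len);

        /* lz4hc takes longer to compress (level 9 here) for a usually
         * better ratio... */
        int fast_len = LZ4_compress_default(in, fast, in_len, bound);
        int hc_len   = LZ4_compress_HC(in, hc, in_len, bound, 9);

        /* ...but both outputs go through the same fast decompressor. */
        LZ4_decompress_safe(fast, out, fast_len, in_len);
        LZ4_decompress_safe(hc,   out, hc_len,   in_len);

        free(fast); free(hc); free(out);
        return 0;
    }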
