Re: compression disk space saving - what are your results?
If you have been waiting for a particular compressor to reach Linux, chances are it already has. And if you are holding off on btrfs, assuming someone will port your favorite compression profile to a btrfs mount option someday, someone has thought of that too, and that has already happened as well.

Add support for LZ4-compressed kernel [LWN.net] - https://lwn.net/Articles/541425/
bzip2/lzma kernel compression [LWN.net] - https://lwn.net/Articles/314295/
Btrfs Picks Up Snappy Compression Support - Phoronix - http://www.phoronix.com/scan.php?page=news_item&px=MTA0MjQ
fusecompress - Transparent compression FUSE filesystem (0.9.x tree) - Google Project Hosting - https://code.google.com/p/fusecompress/

On Mon, Dec 21, 2015 at 7:55 PM, Kai Krakow wrote:
> On Wed, 2 Dec 2015 09:49:05 -0500, Austin S Hemmelgarn wrote:
>>> So, 138 GB files use just 24 GB on disk - nice!
>>>
>>> However, I would still expect that compress=zlib has almost the same effect as compress-force=zlib, for 100% text files/logs.
>>
>> That's better than 80% space savings (it works out to about 83.6%), so I doubt that you'd manage to get anything better than that even with only plain text files. It's interesting that there's such a big discrepancy though, that indicates that BTRFS really needs some work WRT deciding what to compress.
>
> As far as I understood from reading here, btrfs fairly quickly opts out of compressing further extents if it stumbles across the first block with a bad compression ratio for a file.
>
> So, what I do is compress-force=zlib for my backup drive, which holds several months of snapshots; new backups go to a scratch area which is snapshotted after rsync finishes (important: use --no-whole-file and --inplace).
>
> On my system drive I use compress=lzo and hope the heuristics work. From time to time I use find and btrfs defrag to selectively recompress files (using mtime and name filters) and defrag directory nodes (which according to the docs should defrag metadata).
>
> A 3x TB btrfs mraid1 draid0 (1.6TB used) fits onto a 2TB backup drive with a backlog worth around 4 months of daily backups. It looks pretty effective. Forcing zlib manages to compress file additions quite well, although I didn't measure it lately. It was far from 80%, but it was not far below 40-50%.
>
> I wish one could use a per-subvolume compression option already.
>
> --
> Regards,
> Kai
>
> Replies to list-only preferred.
Re: compression disk space saving - what are your results?
On Wed, 2 Dec 2015 09:49:05 -0500, Austin S Hemmelgarn wrote:
> > So, 138 GB files use just 24 GB on disk - nice!
> >
> > However, I would still expect that compress=zlib has almost the same effect as compress-force=zlib, for 100% text files/logs.
>
> That's better than 80% space savings (it works out to about 83.6%), so I doubt that you'd manage to get anything better than that even with only plain text files. It's interesting that there's such a big discrepancy though, that indicates that BTRFS really needs some work WRT deciding what to compress.

As far as I understood from reading here, btrfs fairly quickly opts out of compressing further extents if it stumbles across the first block with a bad compression ratio for a file.

So, what I do is compress-force=zlib for my backup drive, which holds several months of snapshots; new backups go to a scratch area which is snapshotted after rsync finishes (important: use --no-whole-file and --inplace).

On my system drive I use compress=lzo and hope the heuristics work. From time to time I use find and btrfs defrag to selectively recompress files (using mtime and name filters) and defrag directory nodes (which according to the docs should defrag metadata).

A 3x TB btrfs mraid1 draid0 (1.6TB used) fits onto a 2TB backup drive with a backlog worth around 4 months of daily backups. It looks pretty effective. Forcing zlib manages to compress file additions quite well, although I didn't measure it lately. It was far from 80%, but it was not far below 40-50%.

I wish one could use a per-subvolume compression option already.

--
Regards,
Kai

Replies to list-only preferred.
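A rough sketch of the kind of workflow described above, for anyone who wants to try it; the paths, the 30-day mtime cutoff and the '*.log' name filter are made-up examples, not Kai's actual setup:

# Backup flow: rsync into a scratch subvolume, then take a read-only snapshot of it.
rsync -a --no-whole-file --inplace /source/ /backup/scratch/
btrfs subvolume snapshot -r /backup/scratch "/backup/snapshots/$(date +%F)"

# Selective recompression: rewrite matching files with zlib, then defragment
# the directory nodes themselves (without -r, defragmenting a directory only
# touches its metadata, not the files below it).
find /home -type f -name '*.log' -mtime +30 \
    -exec btrfs filesystem defragment -czlib {} +
find /home -type d -exec btrfs filesystem defragment {} +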
Re: compression disk space saving - what are your results?
On Sunday 06 December 2015 04:21:30 Duncan wrote:
> Marc Joliet posted on Sat, 05 Dec 2015 15:11:51 +0100 as excerpted:
>> I do think it's interesting that compression (even with LZO) seems to have offset the extra space wastage caused by autodefrag.
>
> I've seen (I think) you mention that twice now. Perhaps I'm missing something... How does autodefrag trigger space wastage?
>
> What autodefrag does is watch for seriously fragmented files and queue them up for later defrag by a worker thread. How would that waste space?
>
> Unless of course you're talking about breaking reflinks to existing snapshots or other (possibly partial) copies of the file.

That is in fact what I was referring to.

> But I'd call that wasting space due to the snapshots storing old copies, not due to autodefrag keeping the current copy defragmented. And reflinks are saving space by effectively storing parts of two files in the same extent, not autodefrag wasting it, as the default on a normal filesystem would be separate copies, so that's the zero-point base,

Of course, the default on a normal file system is to not have any snapshots between which to reflink ;-). Also, autodefrag is not a default mount option, so the default on BTRFS is to save space via reflinks, which is undone by defragmenting, hence why I see it as autodefrag triggering the waste of space.

> and reflinks save from it, with autodefrag therefore not changing things from the zero-point base. No snapshots, no reflinks, autodefrag no longer "wastes" space, so it's not autodefrag's wastage in the first place, it's the other mechanisms' saving space.

To my mind it is the keeping of snapshots and the breaking of reflinks via autodefrag that together cause space wastage. This is coming from the perspective that snapshots are *useful* and hence by themselves do not constitute wasted space.

> From my viewpoint, anyway. I'd not ordinarily quibble over it one way or the other if that's what you're referring to. But just in case you had something else in mind that I'm not aware of, I'm posting the question.

And the above is my viewpoint :-).

--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup
Re: compression disk space saving - what are your results?
Marc Joliet posted on Sat, 05 Dec 2015 15:11:51 +0100 as excerpted:

> I do think it's interesting that compression (even with LZO) seems to have offset the extra space wastage caused by autodefrag.

I've seen (I think) you mention that twice now. Perhaps I'm missing something... How does autodefrag trigger space wastage?

What autodefrag does is watch for seriously fragmented files and queue them up for later defrag by a worker thread. How would that waste space?

Unless of course you're talking about breaking reflinks to existing snapshots or other (possibly partial) copies of the file. But I'd call that wasting space due to the snapshots storing old copies, not due to autodefrag keeping the current copy defragmented. And reflinks are saving space by effectively storing parts of two files in the same extent, not autodefrag wasting it, as the default on a normal filesystem would be separate copies, so that's the zero-point base, and reflinks save from it, with autodefrag therefore not changing things from the zero-point base. No snapshots, no reflinks, autodefrag no longer "wastes" space, so it's not autodefrag's wastage in the first place, it's the other mechanisms' saving space.

From my viewpoint, anyway. I'd not ordinarily quibble over it one way or the other if that's what you're referring to. But just in case you had something else in mind that I'm not aware of, I'm posting the question.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
RE: compression disk space saving - what are your results?
> Subject: compression disk space saving - what are your results?
>
> What are your disk space savings when using btrfs with compression?

I checked that for some folders when I moved from ext4 to btrfs. I compared du with df** just to get some numbers. I use lzo since the btrfs wiki said it's better for speed.

Percent_saving = (1 - df/du) * 100:

47% (mostly endless text files, source code etc.; total amount of data is about 1TB)
2%-10% (for data which is mostly in the form of large (several hundred MB up to a few GB) binary files; total amount is about 4TB)
23% (for something in between; total amount is 0.4TB)

Results indicate pretty clearly: large binary files are almost not compressed - without understanding much of it, that's what I would intuitively expect (AFAIK lzo is dictionary based and those binary files have little for that).

** du -s on the folder I copied to the btrfs drive; df is the difference between a df reading before and after the copy. Based on casual checking, results were consistent with the space needed on the old ext4 drive.
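For reference, that du-vs-df comparison can be reproduced with something like the following sketch (the mount point and source folder are placeholders, not the poster's actual paths):

# Record filesystem usage before and after copying the data, then compare
# the on-disk delta with the apparent size reported by du.
used_before=$(df -B1 --output=used /mnt/btrfs | tail -1)
cp -a /data/source /mnt/btrfs/
used_after=$(df -B1 --output=used /mnt/btrfs | tail -1)

apparent=$(du -sB1 /mnt/btrfs/source | cut -f1)
on_disk=$((used_after - used_before))
echo "percent saving: $(( 100 - 100 * on_disk / apparent ))"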
Re: compression disk space saving - what are your results?
On Saturday 05 December 2015 14:37:05 Marc Joliet wrote:
> My desktop looks like this:
>
> % df -h
> Filesystem      Size  Used Avail Use% Mounted on
> /dev/sda1       108G   79G   26G  76% /
> [...]
>
> For / I get a total of about 8G or at least 9% space saving:
>
> # du -hsc /mnt/rootfs/*
> 71G     /mnt/rootfs/home
> 14G     /mnt/rootfs/rootfs
> 2,3G    /mnt/rootfs/var
> 87G     total
>
> I write "at least" because this does not include snapshots.

Just to be explicit, in case it was not clear: I of course meant that the *du output* does not account for extra space used by snapshots.

> On my laptop the difference is merely 1 GB (83 vs. 84 GB),

And here I also want to clarify that the df output was 84 GB, and the du output was 83 GB. Again, the du output does not account for snapshots, which go back farther on the laptop: 2 weeks of daily snapshots (with autodefrag!) instead of up to 2 days of bi-hourly snapshots.

I do think it's interesting that compression (even with LZO) seems to have offset the extra space wastage caused by autodefrag.

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup
Re: compression disk space saving - what are your results?
On Wednesday 02 December 2015 18:46:30 Tomasz Chmielewski wrote:
> What are your disk space savings when using btrfs with compression?
>
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores text files (logs), mostly multi-gigabyte files.
>
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
>
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> (...)
> /dev/xvdb       200G  124G   76G  62% /var/log/remote
>
> # du -sh /var/log/remote/
> 153G    /var/log/remote/
>
> From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled.
> Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.
>
> Tomasz Chmielewski
> http://wpkg.org

I have a total of three file systems that use compression, on a desktop and a laptop. / on both uses compress=lzo, and my backup drive uses compress=zlib (my RAID1 FS does not use compression). My desktop looks like this:

% df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       108G   79G   26G  76% /
[...]

For / I get a total of about 8G or at least 9% space saving:

# du -hsc /mnt/rootfs/*
71G     /mnt/rootfs/home
14G     /mnt/rootfs/rootfs
2,3G    /mnt/rootfs/var
87G     total

I write "at least" because this does not include snapshots. On my laptop the difference is merely 1 GB (83 vs. 84 GB), but it was using the autodefrag mount option until yesterday (when I migrated it to an SSD using dd), which probably accounts for a significant amount of wasted space. I'll see how it develops over the next two weeks, but I expect the ratio to become similar to my desktop (probably less, since there is also a lot of music on there).

I would love to answer the question for my backup drive, but du took too long (> 1 h) so I stopped it :-(. I might try it again later, but no promises!

Greetings
--
Marc Joliet
--
"People who think they know everything really annoy those of us who know we don't" - Bjarne Stroustrup
Re: compression disk space saving - what are your results?
On 2015-12-03 01:29, Duncan wrote:

Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as excerpted:

On 2015-12-02 09:03, Imran Geriskovan wrote:

What are your disk space savings when using btrfs with compression?

[Some] posters have reported that for mostly text, compress didn't give them expected compression results and they needed to use compress-force.

"compress-force" option compresses regardless of the "compressibility" of the file. "compress" option makes some inference about the "compressibility" and decides to compress or not. I wonder how that inference is done? Can anyone provide some pseudo code for it?

I'm not certain how BTRFS does it, but my guess would be trying to compress the block, then storing the uncompressed version if the compressed one is bigger.

No pseudocode as I'm not a dev and wouldn't want to give the wrong impression, but as I believe I replied recently in another thread, based on comments the devs have made...

With compress, btrfs does a(n intended to be fast) trial compression of the first 128 KiB block or two and uses the result of that to decide whether to compress the entire file. Compress-force simply bypasses that first decision point, processing the file as if the test always succeeded and compression was chosen.

If the decision to compress is made, the file is (evidently, again, not a dev, but filefrag results support) compressed a 128 KiB block at a time with the resulting size compared against the uncompressed version, with the smaller version stored.

(Filefrag doesn't understand btrfs compression and reports individual extents for each 128 KiB compression block, if compressed. However, for many files processed with compress-force, filefrag doesn't report the expected size/128-KiB extents, but rather something lower. If filefrag -v is used, details of each "extent" are listed, and some show up as multiples of 128 KiB, indicating runs of uncompressable blocks that, unlike actually compressed blocks, filefrag can and does report correctly as single extents. The conclusion is thus as above, that btrfs is testing the compression result of each block, and not compressing if the "compression" ends up being negative, that is, if the "compressed" size is larger than the uncompressed size.)

On a side note, I really wish BTRFS would just add LZ4 support. It's a lot more deterministic WRT decompression time than LZO, gets a similar compression ratio, and runs faster on most processors for both compression and decompression.

There were patches (at least RFC level, IIRC) floating around years ago to add lz4... I wonder what happened to them? My impression was that a large deployment somewhere may actually be running them as well, making them well tested (and obviously well beyond preliminary RFC level) by now, altho that impression could well be wrong.

Hmm, I'll have to see if I can find those and rebase them. IIRC, the argument against adding it was 'but we already have a fast compression algorithm!', which in turn says to me they didn't try to sell it on the most significant parts, namely that it's faster at decompression than LZO (even when you use the lz4hc variant, which takes longer to compress to give a (usually) better compression ratio, but decompresses just as fast as regular lz4), and the timings are a lot more deterministic (which is really important if you're doing real-time stuff).
Re: compression disk space saving - what are your results?
On 2015-12-03 07:09, Imran Geriskovan wrote:

On a side note, I really wish BTRFS would just add LZ4 support. It's a lot more deterministic WRT decompression time than LZO, gets a similar compression ratio, and runs faster on most processors for both compression and decompression.

Relative ratios according to http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

Compressed size:      gzip (1) - lzo (1.4) - lz4 (1.4)
Compression time:     gzip (5) - lzo (1) - lz4 (0.8)
Decompression time:   gzip (9) - lzo (4) - lz4 (1)
Compression memory:   gzip (1) - lzo (2) - lz4 (20)
Decompression memory: gzip (1) - lzo (2) - lz4 (130). Yes, 130! Not a typo.

But there is a note: "Note: lz4 it's the program using this size, the code for internal lz4 use very less memory."

However, I could not find any better apples to apples comparison. If lz4's real memory consumption is in the same order as lzo's, then it looks good.

AFAICT, it's similar memory consumption. I did some tests a while back comparing the options for kernel image compression using a VM, and one of the things I tested (although I can't for the life of me remember how exactly, except that it involved using QEMU hooked up to GDB) was run-time decompressor footprint. LZO really should have a smaller memory footprint too, it's just that lzop needs to handle almost a dozen different LZO compression formats.
Re: compression disk space saving - what are your results?
>> On a side note, I really wish BTRFS would just add LZ4 support. It's a lot more deterministic WRT decompression time than LZO, gets a similar compression ratio, and runs faster on most processors for both compression and decompression.

Relative ratios according to http://catchchallenger.first-world.info//wiki/Quick_Benchmark:_Gzip_vs_Bzip2_vs_LZMA_vs_XZ_vs_LZ4_vs_LZO

Compressed size:      gzip (1) - lzo (1.4) - lz4 (1.4)
Compression time:     gzip (5) - lzo (1) - lz4 (0.8)
Decompression time:   gzip (9) - lzo (4) - lz4 (1)
Compression memory:   gzip (1) - lzo (2) - lz4 (20)
Decompression memory: gzip (1) - lzo (2) - lz4 (130). Yes, 130! Not a typo.

But there is a note: "Note: lz4 it's the program using this size, the code for internal lz4 use very less memory."

However, I could not find any better apples to apples comparison. If lz4's real memory consumption is in the same order as lzo's, then it looks good.
Re: compression disk space saving - what are your results?
Austin S Hemmelgarn posted on Wed, 02 Dec 2015 09:39:08 -0500 as excerpted:

> On 2015-12-02 09:03, Imran Geriskovan wrote:
>>>> What are your disk space savings when using btrfs with compression?
>>
>>> [Some] posters have reported that for mostly text, compress didn't give them expected compression results and they needed to use compress-force.
>>
>> "compress-force" option compresses regardless of the "compressibility" of the file.
>>
>> "compress" option makes some inference about the "compressibility" and decides to compress or not.
>>
>> I wonder how that inference is done? Can anyone provide some pseudo code for it?
>
> I'm not certain how BTRFS does it, but my guess would be trying to compress the block, then storing the uncompressed version if the compressed one is bigger.

No pseudocode as I'm not a dev and wouldn't want to give the wrong impression, but as I believe I replied recently in another thread, based on comments the devs have made...

With compress, btrfs does a(n intended to be fast) trial compression of the first 128 KiB block or two and uses the result of that to decide whether to compress the entire file. Compress-force simply bypasses that first decision point, processing the file as if the test always succeeded and compression was chosen.

If the decision to compress is made, the file is (evidently, again, not a dev, but filefrag results support) compressed a 128 KiB block at a time with the resulting size compared against the uncompressed version, with the smaller version stored.

(Filefrag doesn't understand btrfs compression and reports individual extents for each 128 KiB compression block, if compressed. However, for many files processed with compress-force, filefrag doesn't report the expected size/128-KiB extents, but rather something lower. If filefrag -v is used, details of each "extent" are listed, and some show up as multiples of 128 KiB, indicating runs of uncompressable blocks that, unlike actually compressed blocks, filefrag can and does report correctly as single extents. The conclusion is thus as above, that btrfs is testing the compression result of each block, and not compressing if the "compression" ends up being negative, that is, if the "compressed" size is larger than the uncompressed size.)

> On a side note, I really wish BTRFS would just add LZ4 support. It's a lot more deterministic WRT decompression time than LZO, gets a similar compression ratio, and runs faster on most processors for both compression and decompression.

There were patches (at least RFC level, IIRC) floating around years ago to add lz4... I wonder what happened to them? My impression was that a large deployment somewhere may actually be running them as well, making them well tested (and obviously well beyond preliminary RFC level) by now, altho that impression could well be wrong.

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
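One can eyeball the behaviour Duncan describes with filefrag; the file path below is just an example, not one from the thread:

# Per-extent detail on a file written with compress-force=zlib: compressed
# data tends to show up as one "extent" per 128 KiB compression block, while
# longer runs in multiples of 128 KiB are blocks btrfs left uncompressed
# because they didn't shrink.
filefrag -v /var/log/remote/some.log

# Quick extent count, without the per-extent detail:
filefrag /var/log/remote/some.log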
Re: compression disk space saving - what are your results?
On 2015-12-02 08:53, Tomasz Chmielewski wrote:

On 2015-12-02 22:03, Austin S Hemmelgarn wrote:

From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled. Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.

This is actually an excellent question. A couple of things to note before I share what I've seen:
1. Text compresses better with any compression algorithm. It is by nature highly patterned and moderately redundant data, which is what benefits the most from compression.

It looks like compress=zlib does not compress very well. Following Duncan's suggestion, I've changed it to compress-force=zlib, and re-copied the data to make sure the files are compressed.

For future reference, if you run 'btrfs filesystem defrag -r -czlib' on the top level directory, you can achieve the same effect without having to deal with the copy overhead. This has a side effect of breaking reflinks, but copying the files off and back onto the filesystem does so also, and even then, I doubt that you're using reflinks. There probably wouldn't be much difference in the time it takes, but at least you wouldn't be hitting another disk in the process.

Compression ratio is much much better now (on a slightly changed data set):

# df -h
/dev/xvdb       200G   24G  176G  12% /var/log/remote

# du -sh /var/log/remote/
138G    /var/log/remote/

So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same effect as compress-force=zlib, for 100% text files/logs.

That's better than 80% space savings (it works out to about 83.6%), so I doubt that you'd manage to get anything better than that even with only plain text files. It's interesting that there's such a big discrepancy though, that indicates that BTRFS really needs some work WRT deciding what to compress.
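A minimal sketch of that in-place recompression, using the thread's mount point purely as an example path (and remembering Austin's caveat that this breaks reflinks/snapshot sharing):

# Usage before, recompress everything with zlib, usage after.
df -h /var/log/remote
btrfs filesystem defrag -r -czlib /var/log/remote
df -h /var/log/remote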
Re: compression disk space saving - what are your results?
On 2015-12-02 09:03, Imran Geriskovan wrote:

What are your disk space savings when using btrfs with compression?

* There's the compress vs. compress-force option and discussion. A number of posters have reported that for mostly text, compress didn't give them expected compression results and they needed to use compress-force.

"compress-force" option compresses regardless of the "compressibility" of the file. "compress" option makes some inference about the "compressibility" and decides to compress or not. I wonder how that inference is done? Can anyone provide some pseudo code for it?

I'm not certain how BTRFS does it, but my guess would be trying to compress the block, then storing the uncompressed version if the compressed one is bigger. The program lrzip has an option to do per-block compression checks kind of like this, but its method is to try LZO compression on the block (which is fast), and only use the selected compression method (bzip2 by default I think, but it can also do zlib and xz) if the LZO compression ratio is good enough. If we went with a similar method, I'd say we should integrate LZ4 support first, and use that for the test. I think NTFS compression on Windows might do something similar, but they use an old LZ77 derivative for their compression (I think it's referred to as LZNT1, and it's designed for speed, and usually doesn't get much better than a 30% compression ratio).

On a side note, I really wish BTRFS would just add LZ4 support. It's a lot more deterministic WRT decompression time than LZO, gets a similar compression ratio, and runs faster on most processors for both compression and decompression.
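As a userspace analogue of that kind of check (purely an illustration, not the kernel code; the file name is a placeholder), one can trial-compress the first 128 KiB of a file and see whether it actually shrinks:

# Trial-compress the first 128 KiB and compare sizes.
head -c 131072 somefile > /tmp/sample
orig=$(wc -c < /tmp/sample)
comp=$(gzip -c /tmp/sample | wc -c)
if [ "$comp" -lt "$orig" ]; then
    echo "compressible ($comp < $orig bytes): worth compressing further blocks"
else
    echo "poor ratio: store uncompressed / skip compression for this file"
fi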
Re: compression disk space saving - what are your results?
On Wed, Dec 2, 2015 at 9:53 PM, Tomasz Chmielewski wrote:
> On 2015-12-02 22:03, Austin S Hemmelgarn wrote:
>
>>> From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled. Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.
>>
>> This is actually an excellent question. A couple of things to note before I share what I've seen:
>> 1. Text compresses better with any compression algorithm. It is by nature highly patterned and moderately redundant data, which is what benefits the most from compression.
>
> It looks like compress=zlib does not compress very well. Following Duncan's suggestion, I've changed it to compress-force=zlib, and re-copied the data to make sure the files are compressed.
>
> Compression ratio is much much better now (on a slightly changed data set):
>
> # df -h
> /dev/xvdb       200G   24G  176G  12% /var/log/remote
>
> # du -sh /var/log/remote/
> 138G    /var/log/remote/
>
> So, 138 GB files use just 24 GB on disk - nice!
>
> However, I would still expect that compress=zlib has almost the same effect as compress-force=zlib, for 100% text files/logs.

btw, what is your kernel version? There was a bug that detected the inode compression ratio wrong:

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78

Regards,
Shilong
Re: compression disk space saving - what are your results?
On 2015-12-02 23:03, Wang Shilong wrote:

Compression ratio is much much better now (on a slightly changed data set):

# df -h
/dev/xvdb       200G   24G  176G  12% /var/log/remote

# du -sh /var/log/remote/
138G    /var/log/remote/

So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same effect as compress-force=zlib, for 100% text files/logs.

btw, what is your kernel version? There was a bug that detected the inode compression ratio wrong:

http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=68bb462d42a963169bf7acbe106aae08c17129a5
http://git.kernel.org/cgit/linux/kernel/git/mason/linux-btrfs.git/commit/?id=4bcbb33255131adbe481c0467df26d654ce3bc78

Linux 4.3.0.

Tomasz Chmielewski
http://wpkg.org/
Re: compression disk space saving - what are your results?
>> What are your disk space savings when using btrfs with compression?

> * There's the compress vs. compress-force option and discussion. A number of posters have reported that for mostly text, compress didn't give them expected compression results and they needed to use compress-force.

"compress-force" option compresses regardless of the "compressibility" of the file.

"compress" option makes some inference about the "compressibility" and decides to compress or not.

I wonder how that inference is done? Can anyone provide some pseudo code for it?

Regards,
Imran
Re: compression disk space saving - what are your results?
On 2015-12-02 22:03, Austin S Hemmelgarn wrote:

From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled. Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.

This is actually an excellent question. A couple of things to note before I share what I've seen:
1. Text compresses better with any compression algorithm. It is by nature highly patterned and moderately redundant data, which is what benefits the most from compression.

It looks like compress=zlib does not compress very well. Following Duncan's suggestion, I've changed it to compress-force=zlib, and re-copied the data to make sure the files are compressed.

Compression ratio is much much better now (on a slightly changed data set):

# df -h
/dev/xvdb       200G   24G  176G  12% /var/log/remote

# du -sh /var/log/remote/
138G    /var/log/remote/

So, 138 GB files use just 24 GB on disk - nice!

However, I would still expect that compress=zlib has almost the same effect as compress-force=zlib, for 100% text files/logs.

Tomasz Chmielewski
http://wpkg.org
Re: compression disk space saving - what are your results?
On 2015-12-02 04:46, Tomasz Chmielewski wrote:

What are your disk space savings when using btrfs with compression?

I have a 200 GB btrfs filesystem which uses compress=zlib, only stores text files (logs), mostly multi-gigabyte files.

It's a "single" filesystem, so "df" output matches "btrfs fi df":

# df -h
Filesystem      Size  Used Avail Use% Mounted on
(...)
/dev/xvdb       200G  124G   76G  62% /var/log/remote

# du -sh /var/log/remote/
153G    /var/log/remote/

From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled. Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.

This is actually an excellent question. A couple of things to note before I share what I've seen:

1. Text compresses better with any compression algorithm. It is by nature highly patterned and moderately redundant data, which is what benefits the most from compression.
2. When BTRFS does in-line compression, it uses 128k blocks. Because of this, there are diminishing returns for smaller files when using compression.
3. The best compression ratio I've ever seen from zlib on real data is about 65-70%, and that was using SquashFS, which is designed to take up as little room as possible.
4. LZO gets a worse compression ratio than zlib (around 40-50% if you're lucky), but is a _lot_ faster.
5. By playing around with the -c option for defrag, you can compress or uncompress different parts of the filesystem, and get a rough idea of what compresses best.

Now, to my results. These are all from my desktop system, with no deduplication, and the data for zlib is somewhat outdated (I've not used it since LZO support stabilized).

For the filesystems I have on traditional hard disks:

1. For /home (mostly text files, some SQLite databases, and a couple of git repositories), I get about 15-20% space savings with zlib, and about a 2-4% performance hit. I get about 5-10% space savings with lzo, but performance is about 5-8% better than uncompressed.
2. For /usr/src (50/50 mix of text and executable code), I get about 25% space savings with zlib with a 5-7% hit to performance, and about 10% with lzo with a 7% boost in performance relative to uncompressed.
3. For /usr/portage and /var/lib/layman (lots of small text files, a number of VCS repos, and about 2000 compressed source archives), I get about 25% space savings with zlib, with a 15% performance hit (yes, seriously, 15%), and with lzo I get about 25% space savings with no measurable performance difference relative to uncompressed.

For the filesystems I have on SSD's:

1. For /var/tmp (huge assortment of different things, but usually similar to /usr/src because this is where packages get built), I get almost no space savings with either type of compression, and see a performance reduction of about 5% for both.
2. For /var/log (lots of text files; notably, I don't compress rotated logs, and I don't have systemd's insane binary log files), I get about 30% space savings with zlib, but it makes the _whole_ system run about 5% slower, and I get about 20% space savings with lzo, with no measurable performance difference relative to uncompressed.
3. For /var/spool (lots of really short text files, mostly stuff from postfix and CUPS), I actually see higher disk usage with both types of compression, but almost zero performance impact from either of them.
4. For /boot (a couple of big binary files that already have built-in compression), I see no net space savings, and don't have any numbers regarding performance impact.
5. For / (everything that isn't on one of the other filesystems I listed above), I see about 10-20% space savings from zlib, with a roughly 5% performance hit, and about 5-15% space savings with lzo, with no measurable performance difference.
Re: compression disk space saving - what are your results?
Tomasz Chmielewski posted on Wed, 02 Dec 2015 18:46:30 +0900 as excerpted:

> What are your disk space savings when using btrfs with compression?
>
> I have a 200 GB btrfs filesystem which uses compress=zlib, only stores text files (logs), mostly multi-gigabyte files.
>
> It's a "single" filesystem, so "df" output matches "btrfs fi df":
>
> # df -h
> Filesystem      Size  Used Avail Use% Mounted on
> (...)
> /dev/xvdb       200G  124G   76G  62% /var/log/remote
>
> # du -sh /var/log/remote/
> 153G    /var/log/remote/
>
> From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled.
> Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.

Here, just using compress=lzo, no compress-force and lzo not zlib, I'm mostly just happy to see lower usage than I was getting on reiserfs. Between that and no longer needing to worry whether copying a sparse file is going to end up sparse or not, because even if not the compression should effectively collapse the sparse areas, I've been happy /enough/ with it.

There's at least three additional factors to consider, for your case.

* There is of course metadata to consider as well as data, and on single-device btrfs, metadata normally defaults to dup, 2X the space. You did say single, but didn't specify whether that was for metadata also (and for that matter, didn't specify whether it was a single-device filesystem or not, tho I assume it is). And of course btrfs does checksumming that other filesystems don't do, and even puts small files in metadata too, all of which will be dup by default, taking even more space. A btrfs fi df will of course give you separate data/metadata/system values, and you can take the data used value and compare that against the du -sh value to get a more accurate read on how well your compression really is working. (Tho as noted, small files, a few KiB max, are often stored in the metadata, so if you have lots of those, you'd probably need to adjust for that, but you mentioned mostly GiB-scale files, so...)

* There's the compress vs. compress-force option and discussion. A number of posters have reported that for mostly text, compress didn't give them expected compression results and they needed to use compress-force. Of course, changing the option now won't change how existing files are stored. You'd have to either rewrite them, or wait for log rotation to rotate out the old files, to see the full effect. Also see the btrfs fi defrag -c option.

* Talking about defrag, it's not snapshot aware, which brings up the question of whether you're using btrfs snapshots on this filesystem and the effect that would have if you do. I'll presume not, as that would seem to be important enough to mention in a discussion of this sort, if you were, and also because that allows me to simply handwave further discussion of this point away. =:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master -- and if you use the program, he is your master." Richard Stallman
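A minimal sketch of the data-only comparison Duncan suggests in his first point; the mount point is the one from the thread, used purely as an example:

# Data/Metadata/System are reported separately, so the "Data ... used=" figure
# can be compared against du's total, keeping dup'd metadata and checksums out
# of the math.
btrfs filesystem df /var/log/remote
du -sh /var/log/remote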
compression disk space saving - what are your results?
What are your disk space savings when using btrfs with compression?

I have a 200 GB btrfs filesystem which uses compress=zlib, only stores text files (logs), mostly multi-gigabyte files.

It's a "single" filesystem, so "df" output matches "btrfs fi df":

# df -h
Filesystem      Size  Used Avail Use% Mounted on
(...)
/dev/xvdb       200G  124G   76G  62% /var/log/remote

# du -sh /var/log/remote/
153G    /var/log/remote/

From these numbers (124 GB used where data size is 153 GB), it appears that we save around 20% with zlib compression enabled. Is 20% reasonable saving for zlib? Typically text compresses much better with that algorithm, although I understand that we have several limitations when applying that on a filesystem level.

Tomasz Chmielewski
http://wpkg.org