Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
https://fedoraproject.org/wiki/Changes/BtrfsTransparentCompression

== Summary ==

On variants using btrfs as the default filesystem, enable transparent compression using zstd. Compression saves space and can significantly increase the lifespan of flash-based media by reducing write amplification. It can also increase read and write performance.

== Owners ==

* Name: [[User:salimma|Michel Salim]], [[User:dcavalca|Davide Cavalca]], [[User:josef|Josef Bacik]]
* Email: mic...@michel-slm.name, dcava...@fb.com, jo...@toxicpanda.com

== Detailed description ==

Transparent compression is a btrfs feature that allows a btrfs filesystem to apply compression on a per-file basis. Of the three supported algorithms, zstd is the one with the best compression speed and ratio. Enabling compression saves space, but it also reduces write amplification, which is important for SSDs. Depending on the workload and the hardware, compression can also result in an increase in read and write performance.

See https://pagure.io/fedora-btrfs/project/issue/5 for details. This was originally scoped as an optimization for https://fedoraproject.org/wiki/Changes/BtrfsByDefault during Fedora 33.

== Benefit to Fedora ==

Better disk space usage and reduced write amplification, which in turn help increase lifespan and performance on SSDs and other flash-based media. It can also increase read and write performance.
== Scope ==

* Proposal owners:
** Update anaconda to perform the installation using mount -o compress=zstd:1
** Set the proper option in fstab (alternatively: set the XATTR)
** Update disk image build tools to enable compression:
*** lorax
*** appliance-tools
*** osbuild
*** imagefactory
** [optional] Add support for [https://github.com/kdave/btrfs-progs/issues/328 setting compression level when defragmenting]
** [optional] Add support for [https://github.com/kdave/btrfs-progs/issues/329 setting compression level using `btrfs property`]
* Other developers:
** anaconda: review PRs as needed
* Release engineering: https://pagure.io/releng/issue/9920
* Policies and guidelines: N/A
* Trademark approval: N/A

== Upgrade/compatibility impact ==

This Change only applies to newly installed systems. Existing systems will be unaffected on upgrade, but can be converted manually by running btrfs filesystem defrag -czstd -r, updating `/etc/fstab` and remounting.

== How to test ==

Existing systems can be converted to use compression manually by running btrfs filesystem defrag -czstd -r, updating `/etc/fstab` and remounting.

== User experience ==

Compression will result in file sizes (e.g. as reported by du) not matching the actual space occupied on disk. The [https://src.fedoraproject.org/rpms/compsize compsize] utility can be used to examine the compression type, effective compression ratio and actual size.

== Dependencies ==

Anaconda will need to be updated to perform the installation using mount -o compress=zstd:1

== Contingency plan ==

* Contingency mechanism: if the PR patches are not merged upstream, they will not be included and the feature will not be enabled
* Contingency deadline: Final freeze
* Blocks release? No
* Blocks product? No

== Documentation ==

https://btrfs.wiki.kernel.org/index.php/Compression

== Release Notes ==

Transparent compression of the filesystem using zstd is now enabled by default. Use the compsize utility to find out the actual size on disk of a given file.
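For an existing system, the fstab update mentioned above amounts to appending compress=zstd:1 to the mount options of each btrfs entry. A minimal sketch against a sample fstab (the UUIDs and subvolume names below are illustrative placeholders, not taken from the proposal):

```shell
# Illustrative fstab sample; UUIDs and subvolume names are placeholders.
cat > /tmp/fstab.sample <<'EOF'
UUID=1234-abcd /     btrfs subvol=root 0 0
UUID=1234-abcd /home btrfs subvol=home 0 0
EOF

# Append compress=zstd:1 to every btrfs entry's mount options.
sed -i '/btrfs/s/subvol=\([^ ,]*\)/subvol=\1,compress=zstd:1/' /tmp/fstab.sample

cat /tmp/fstab.sample
```

On a real system the edit would target /etc/fstab, followed by a remount and, optionally, the btrfs filesystem defrag -czstd -r command above to recompress existing data.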
-- Ben Cotton He / Him / His Senior Program Manager, Fedora & CentOS Stream Red Hat TZ=America/Indiana/Indianapolis ___ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Wed, 30 Dec 2020 at 14:53, Ben Cotton wrote:
>
> https://fedoraproject.org/wiki/Changes/BtrfsTransparentCompression
>
> == How to test ==
>
> Existing systems can be converted to use compression manually with
> btrfs filesystem defrag -czstd -r, updating `/etc/fstab`
> and remounting.

Update `/etc/fstab` how? Please be more explicit.

--
Elliott
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Wed, 2020-12-30 at 16:28 -0500, Elliott Sales de Andrade wrote:
> On Wed, 30 Dec 2020 at 14:53, Ben Cotton wrote:
> >
> > https://fedoraproject.org/wiki/Changes/BtrfsTransparentCompression
> >
> > == How to test ==
> >
> > Existing systems can be converted to use compression manually with
> > btrfs filesystem defrag -czstd -r, updating `/etc/fstab`
>
> Update `/etc/fstab` how? Please be more explicit.

Good point, thanks. Adding it now.

--
Michel Alexandre Salim
profile: https://keyoxide.org/mic...@michel-slm.name
chat via email: https://delta.chat/
GPG key: 5DCE 2E7E 9C3B 1CFF D335 C1D7 8B22 9D2F 7CCC 04F2
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On 2020-12-30 1:48 p.m., Michel Alexandre Salim wrote:
> On Wed, 2020-12-30 at 16:28 -0500, Elliott Sales de Andrade wrote:
> > Update `/etc/fstab` how? Please be more explicit.
>
> Good point, thanks. Adding it now.

Additionally, make sure to run "systemctl daemon-reload" after editing /etc/fstab, otherwise some services will fail on boot on an existing installed system.

--
Luya Tshimbalanga
Fedora Design Team
Fedora Design Suite maintainer
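Putting the conversion steps discussed in this thread in order, a sketch might look like the following. This is not part of the proposal; it assumes / is btrfs, must run as root, and is guarded so it does nothing unless APPLY=yes is set. The sed expression assumes a `defaults` options field and is purely illustrative:

```shell
# Guarded sketch: a no-op unless APPLY=yes. Assumes / is btrfs, run as root.
if [ "${APPLY:-no}" = yes ]; then
  # 1. Add compress=zstd:1 to the root entry's options in /etc/fstab
  #    (assumes the options field is literally "defaults"; adjust to taste)
  sed -i '/btrfs/s/defaults/defaults,compress=zstd:1/' /etc/fstab
  # 2. Reload systemd's generated mount units (per the note above)
  systemctl daemon-reload
  # 3. Remount so new writes are compressed from now on
  mount -o remount,compress=zstd:1 /
  # 4. Optionally recompress existing files
  btrfs filesystem defrag -czstd -r /
fi
```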
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Wed, Dec 30, 2020 at 12:53 PM Ben Cotton wrote:
>
> ** Update anaconda to perform the installation using mount -o
> compress=zstd:1
> ** Set the proper option in fstab (alternatively: set the XATTR)

I think the most discoverable is using 'compress=zstd:1' as the mount option, and anyone who wants to opt out would remove this. Upon removal, the system will become uncompressed basically by attrition, as files are replaced.

The mount option method is per file system. Since we have 'subvol' mount options to mount '/' and '/home', it seems plausible that compression is a per-subvolume option, but it's not (see below). It's file-system wide.

The per-subvolume, per-directory, per-file method of compression has some pretty esoteric nuances:

- the chattr +c method uses the default compression, currently zlib
- the btrfs property method can't be unset https://github.com/kdave/btrfs-progs/issues/308
- btrfs property compression 'none' is not the same as unsetting it, and it inherits just like any other xattr; 'none' means the mount option "compress" does not apply; the mount option "compress-force" will compress files set with compression 'none'
- the btrfs property method isn't recursive https://github.com/kdave/btrfs-progs/issues/278
- both methods stop at subvolume boundaries; i.e. if you set compression on a subvolume or directory, it inherits as you add new directories or files, but stops at a subvolume
- compression flags survive through btrfs send/receive; this is particularly confusing because it can make it a bit difficult to have a copy without compression, and it's not immediately obvious that it's continuing to tag along

This might best be turned into a flowchart :P

> This Change only applies to newly installed systems. Existing systems
> on upgrade will be unaffected, but can be converted manually with
> btrfs filesystem defrag -czstd -r, updating `/etc/fstab`
> and remounting.

Note that defragmenting to compress is optional. You can just add the mount option to fstab and remount; only new files will be compressed, but again by attrition, eventually most of the file system will end up compressed.

--
Chris Murphy
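The two per-directory mechanisms described above can be sketched as below. DIR is a placeholder for a directory on a btrfs mount; the block is guarded so it is a no-op unless DIR is set, since both commands require a real btrfs filesystem:

```shell
# Guarded sketch: set DIR to a directory on a btrfs filesystem to try it.
if [ -n "${DIR:-}" ]; then
  # chattr +c marks the directory for compression with the *default*
  # algorithm (currently zlib), inherited by new files and subdirectories.
  chattr +c "$DIR"

  # btrfs property sets an explicit algorithm; note the caveats above:
  # it is not recursive, cannot be truly unset, and 'none' is inherited.
  btrfs property set "$DIR" compression zstd
  btrfs property get "$DIR" compression
fi
```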
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Wed, 30 Dec 2020 at 19:53, Ben Cotton wrote:
> ** Update anaconda to perform the installation using mount -o
> compress=zstd:1

Any reason behind compression level of 1 rather than the default of 3?
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
It's faster. Here are some benchmarks with different zstd compression levels: https://lkml.org/lkml/2019/1/28/1930. They could be a little outdated, though.

For HDDs it probably makes sense to increase it, and IIRC Chris wrote about such plans.
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
A few more things:

* btrfs-progs tools don't yet have a way to report compression information. While 'df' continues to report correctly about actual blocks used and free, both regular 'du' (coreutils) and 'btrfs filesystem du' will report uncompressed values.

* 'compsize' will report compression information and has been in the Fedora repos for a while. But it requires privilege.

* 'filefrag' misreports fragmentation; it always over-reports fragments.

--
Chris Murphy
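The reporting differences above are easy to see side by side: du is compression-blind, df tracks real block usage, and compsize bridges the gap (but needs root and a btrfs path). A quick sketch, with the compsize call left as a comment since it only works on btrfs:

```shell
# du reports logical (uncompressed) sizes; any directory works for the demo.
du -sh /tmp 2>/dev/null | tail -1

# df reports actual blocks used and free, which compression does reduce.
df -h / | tail -1

# compsize (requires root and a btrfs path) reports per-algorithm stats, e.g.:
#   compsize /usr
# Sample output shape: Type / Perc / Disk Usage / Uncompressed / Referenced
```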
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Fri, Jan 1, 2021 at 11:31 AM Artem Tim wrote:
>
> It's faster. Here is some benchmark with different zstd compression ratios
> https://lkml.org/lkml/2019/1/28/1930. Could be outdated a little bit though.
>
> But for HDD it makes sense to increase it probably. And IIRC Chris wrote
> about such plans.

There are ideas but it's difficult because the kernel doesn't expose the information we really need to make an automatic determination. sysfs commonly misreports rotational devices as being non-rotational and vice versa. Since this is based on the device self-reporting, it's not great.

I use zstd:1 for SSD/NVMe, and zstd:3 (which is the same as not specifying a level) for HDD/USB sticks/eMMC/SD Card. For the more archival style of backup, I use zstd:7. But these can all be mixed and matched; Btrfs doesn't care. You can even mix and match algorithms.

Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm not even sure someone with a very fast NVMe drive will notice a slowdown because the compression/decompression is threaded.

I expect if we get the "fast" levels (the negative-value levels) new to zstd in the kernel, Btrfs will likely remap its level 1 to one of the negative levels, and keep level 3 set to zstd 3 (the default). So we might actually see it get even faster at the cost of some compression ratio. Given this possibility, I think level 1 is the best choice as a default for Fedora.

--
Chris Murphy
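Mixing levels and algorithms per filesystem, as described above, is just a matter of different mount options per fstab entry. A hypothetical illustration (devices, UUIDs and mount points are invented, not from the thread):

```shell
# Hypothetical fstab entries; UUIDs and mount points are invented.
# UUID=aaaaaaaa  /            btrfs  subvol=root,compress=zstd:1  0 0  # NVMe/SSD
# UUID=bbbbbbbb  /mnt/backup  btrfs  compress=zstd:7              0 0  # archival HDD
#
# The level (or algorithm) can also be changed on a live system, no reboot:
#   mount -o remount,compress=zstd:3 /
# Existing extents keep their old compression; only new writes use the new one.
```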
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Fri, Jan 1, 2021 at 7:59 PM Chris Murphy wrote:
> Given this possibility, I think level 1 is the best
> choice as a default for Fedora.

^ for the fstab mount option way of doing this for the entire file system.

If one day there's 'btrfs property' support for levels, it's easy to imagine doing something like zstd:5 for /usr and /var/lib/flatpak, because the limiting factor is not write performance but download bandwidth. Since there's effectively a wait for the download (slow no matter what from the CPU's perspective), why not compress more?

--
Chris Murphy
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
Hi,

On 1/1/21 8:59 PM, Chris Murphy wrote:
> Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm
> not even sure someone with a very fast NVMe drive will notice a slow
> down because the compression/decompression is threaded.

I disagree that everyone benefits. Any read-latency-sensitive workload will be slower due to the application latency being both the drive latency plus the decompression latency. And as the kernel benchmarks indicate, very few systems are going to get anywhere near the performance of even baseline NVMe drives when it comes to throughput. With PCIe Gen4 controllers the burst speeds are even higher (>3GB/sec read & write). Worse, if the workload is very parallel and at max CPU already, the compression overhead will only make that situation worse as well. (I suspect you could test this just by building some packages that have good parallelism during the build.)

So you're penalizing a large majority of machines built in the past couple of years. Plus, the write amplification comment isn't even universal, as there continue to be controllers where the flash translation layer is compressing the data.

OTOH, it makes a lot more sense on a lot of these arm/sbc boards utilizing MMC because the disks are so slow. Maybe if something like this were made the default, the machine should run a quick CPU compress/decompress vs IO speed test and only enable compression if the compress/decompress speed is at least the IO rate.

> I expect if we get the "fast" levels (the negative value levels) new
> to zstd in the kernel, that Btrfs will likely remap its level 1 to one
> of the negative levels, and keep level 3 set to zstd 3 (the default).
> So we might actually see it get even faster at the cost of some
> compression ratio. Given this possibility, I think level 1 is the best
> choice as a default for Fedora.
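The quick CPU-compress-vs-IO-speed test suggested above could be prototyped roughly as below: time zstd -1 over a scratch buffer and compare the result against a measured disk write rate. The zstd CLI only approximates the in-kernel implementation, and the file paths are placeholders, so this is a rough sketch; it skips itself entirely if the zstd binary is not installed.

```shell
# Rough sketch of the suggested enable-only-if-CPU-keeps-up heuristic.
# zstd CLI throughput only approximates the in-kernel btrfs implementation.
if command -v zstd >/dev/null 2>&1; then
  # 32 MiB of incompressible scratch data
  dd if=/dev/urandom of=/tmp/sample.bin bs=1M count=32 2>/dev/null
  start=$(date +%s%N)
  zstd -1 -f -q -o /tmp/sample.zst /tmp/sample.bin
  end=$(date +%s%N)
  # nanoseconds -> ms; +1 avoids division by zero on very fast machines
  mbps=$(( 32 * 1000 / ( (end - start) / 1000000 + 1 ) ))
  echo "zstd -1 throughput: ~${mbps} MB/s"
  # Policy sketch: enable compression only if mbps >= the disk's MB/s,
  # measured separately (e.g. with a direct-IO dd to the target device).
fi
```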
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Thu, Feb 11, 2021 at 9:58 AM Jeremy Linton wrote:
>
> Hi,
>
> On 1/1/21 8:59 PM, Chris Murphy wrote:
> > Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm
> > not even sure someone with a very fast NVMe drive will notice a slow
> > down because the compression/decompression is threaded.
>
> I disagree that everyone benefits. Any read latency sensitive workload
> will be slower due to the application latency being both the drive
> latency plus the decompression latency. And as the kernel benchmarks
> indicate very few systems are going to get anywhere near the performance
> of even baseline NVMe drives when its comes to throughput.

It's possible some workloads on NVMe might have faster reads or writes without compression. https://github.com/facebook/zstd

btrfs compress=zstd:1 translates into zstd -1 right now; there are some ideas to remap btrfs zstd:1 to one of the newer zstd --fast options, but it's just an idea. And in any case the default for btrfs and zstd will remain as 3 and -3 respectively, which is what 'compress=zstd' maps to, making it identical to 'compress=zstd:3'.

I have a laptop with NVMe and haven't come across such a workload so far, but this is obviously not a scientific sample. I think you'd need a process that's producing read/write rates that the storage can meet, but that the compression algorithm limits. Btrfs is threaded, as is the compression. What's typical is no change in performance and sometimes a small increase in performance. It essentially trades some CPU cycles in exchange for less IO. That includes less time reading and writing, but also less latency, meaning the gain on rotational media is greater.

> Worse, if the workload is very parallel, and at max CPU already
> the compression overhead will only make that situation worse as well. (I
> suspect you could test this just by building some packages that have
> good parallelism during the build).

This is compiling the kernel on a 4/8-core CPU (circa 2011) using make -j8; the kernel running is 5.11-rc7.

no compression
real    55m32.769s
user    369m32.823s
sys     35m59.948s
--
compress=zstd:1
real    53m44.543s
user    368m17.614s
sys     36m2.505s

That's a one-time test, and it's a ~3% improvement. *shrug* We don't really care too much these days about 1-3% differences when doing encryption, so I think this is probably in that ballpark, even if it turns out another compile is 3% slower. This is not a significantly read- or write-centric workload; it's mostly CPU. So this 3% difference may not even be related to the compression.

> Plus, the write amplification comment isn't even universal as there
> continue to be controllers where the flash translation layer is
> compressing the data.

At least consumer SSDs tend to just do concurrent write dedup. File system compression isn't limited to Btrfs; there's also F2FS, contributed by Samsung, which implements compression these days as well, although it commits to it at mkfs time, whereas on Btrfs it's a mount option. Mixing and matching compressed extents is routine on Btrfs anyway, so there's no concern with users mixing things up. They can change the compression level and even the algorithm with impunity, just tacking it onto a remount command. It's not even necessary to reboot.

> OTOH, it makes a lot more sense on a lot of these arm/sbc boards
> utilizing MMC because the disks are so slow. Maybe if something like
> this were made the default the machine should run a quick CPU
> compress/decompress vs IO speed test and only enable compression if the
> compress/decompress speed is at least the IO rate.

It's not that simple, because neither the user-space writers nor kworkers are single-threaded. You'd need a particularly fast NVMe matched with a not-so-fast CPU with a workload that somehow dumps a lot of data in a way that the compression acts as a bottleneck. It could exist. But it's not a per se problem that I've seen.
But if you propose a test, I can do A/B testing.

--
Chris Murphy
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
> A few more things:
>
> * btrfs-progs tools don't yet have a way to report compression
> information. While 'df' continues to report correctly about actual
> blocks used and free, both regular 'du' (coreutils) and 'btrfs
> filesystem du' will report uncompressed values.

Are there plans for upstream to address this pretty major shortcoming in the next release of btrfs-progs? From what I can see on the btrfs wiki, the user-space support for compression is very rudimentary, with no real indication that it is being worked on or seen as a priority.
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
Hi,

On 2/11/21 11:05 PM, Chris Murphy wrote:
> This is compiling the kernel on a 4/8-core CPU (circa 2011) using make
> -j8; the kernel running is 5.11-rc7.
> [...]
> That's a one-time test, and it's a ~3% improvement. *shrug*

Did you drop caches/etc between runs? Because I git cloned mainline, copied the fedora kernel config from /boot, and on a fairly recent laptop (12 threads) with a software-encrypted NVMe, dropped caches and did a time make against a compressed directory and an uncompressed one, with both a semi-constrained (4G) setup and a 32G RAM setup (compressed swapping disabled, because the machine has an encrypted swap for hibernation and crashdumps).

compressed:
real    22m40.129s
user    221m9.816s
sys     23m37.038s

uncompressed:
real    21m53.366s
user    221m56.714s
sys     23m39.988s

uncompressed 4G ram:
real    28m48.964s
user    288m47.569s
sys     30m43.957s

compressed 4G:
real    29m54.061s
user    281m7.120s
sys     29m50.613s

And that is not an IO-constrained workload; it's generally CPU-constrained, and since the caches are warm due to the software encryption, the decompress times should be much faster than on machines that aren't cache stashing.

The machine above can actually peg all 6 cores until it hits thermal limits simply doing cp's with btrfs/zstd compression, all the while losing about 800MB/sec of IO bandwidth over the raw disk. Turning an IO-bound problem into a CPU-bound one isn't ideal. Compressed disks only work in the situation where the CPUs can compress/decompress faster than the disk, or the workload is managing to significantly reduce IO because the working set is streaming rather than random.

Any workload which has a random read component to it and is tending closer to page-sized read/writes is going to get hurt, and god help you if it's a RMW cycle. Similarly for parallelized compression, which is only scalable if the IO sizes are large enough that it's worth the IPI overhead of bringing additional cores online and the resulting chunks are still large enough to be dealt with individually.

> Plus, the write amplification comment isn't even universal as there
> continue to be controllers where the flash translation layer is
> compressing the data.

> At least consumer SSDs tend to just do concurrent write dedup. File
> system compression isn't limited to Btrfs, there's also F2FS
> contributed by Samsung which implements compression these days as
> well, although they commit to it at mkfs time, where as on Btrfs it's
> a mount option. Mix and match compressed extents is routine on Btrfs
> anyway, so there's no
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Thu, Feb 11, 2021 at 11:48 PM Tom Seewald wrote:
> > A few more things:
> >
> > * btrfs-progs tools don't yet have a way to report compression information. While 'df' continues to report correctly about actual blocks used and free, both regular 'du' (coreutils) and 'btrfs filesystem du' will report uncompressed values.
>
> Are there plans for upstream to address this pretty major shortcoming in the next release of btrfs-progs? From what I can see on the btrfs wiki, the user-space support for compression is very rudimentary, with no real indication that it is being worked on or seen as a priority.

I know there is an intent to incorporate it; I don't know the time frame. It probably belongs in 'btrfs filesystem du' but I'm not certain. Since F2FS is also doing compression, it might make sense to enhance df or du (both are coreutils). There is a tool called compsize, which will be included in default installations, that provides statistical information.

Speaking for myself, it's been something of a short-term novelty usage that tapers off over time. The statistics satisfy curiosity, but I've found the curiosity wanes because it doesn't affect any decision making, contrary to df, du, and ls. The behavior of du and ls is unchanged, whereas the behavior of df is that the rate of free space consumption is always the same or less compared to uncompressed. There isn't a mechanism for a program to consider 100G of free space as 200G accounting for compression. It's still just 100G, and compression maybe gets you a 50-70G file for things like binaries. And for text files it can be quite a lot. And for multimedia files you're not going to see any compression.
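Since du and ls don't reflect the on-disk savings, compsize is the tool that does. A minimal guarded sketch (/usr is just an example target; compsize needs root and a btrfs path, so the call falls back to a note elsewhere):

```shell
#!/bin/bash
# compsize reports, per compression type, the on-disk (compressed)
# size versus the uncompressed size for everything under a path.
# Guarded so it degrades gracefully without compsize, root, or btrfs.
if command -v compsize >/dev/null 2>&1 && [ "$(id -u)" = 0 ]; then
    compsize /usr 2>/dev/null || echo "/usr is not on a btrfs filesystem"
else
    echo "compsize unavailable here; on Fedora: dnf install compsize"
fi
```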
-- Chris Murphy ___ devel mailing list -- devel@lists.fedoraproject.org To unsubscribe send an email to devel-le...@lists.fedoraproject.org Fedora Code of Conduct: https://docs.fedoraproject.org/en-US/project/code-of-conduct/ List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines List Archives: https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org Do not reply to spam on the list, report it: https://pagure.io/fedora-infrastructure
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Sat, Feb 13, 2021 at 9:45 PM Jeremy Linton wrote:
> Hi,
>
> On 2/11/21 11:05 PM, Chris Murphy wrote:
>> On Thu, Feb 11, 2021 at 9:58 AM Jeremy Linton wrote:
>>> Hi,
>>>
>>> On 1/1/21 8:59 PM, Chris Murphy wrote:
>>>> Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm not even sure someone with a very fast NVMe drive will notice a slowdown because the compression/decompression is threaded.
>>>
>>> I disagree that everyone benefits. Any read-latency-sensitive workload will be slower due to the application latency being both the drive latency plus the decompression latency. And as the kernel benchmarks indicate, very few systems are going to get anywhere near the performance of even baseline NVMe drives when it comes to throughput.
>>
>> It's possible some workloads on NVMe might have faster reads or writes without compression.
>>
>> https://github.com/facebook/zstd
>>
>> btrfs compress=zstd:1 translates into zstd -1 right now; there are some ideas to remap btrfs zstd:1 to one of the newer zstd --fast options, but it's just an idea. And in any case the defaults for btrfs and zstd will remain 3 and -3 respectively, which is what 'compress=zstd' maps to, making it identical to 'compress=zstd:3'.
>>
>> I have a laptop with NVMe and haven't come across such a workload so far, but this is obviously not a scientific sample. I think you'd need a process that's producing read/write rates that the storage can meet, but that the compression algorithm limits. Btrfs is threaded, as is the compression.
>>
>> What's typical is no change in performance, and sometimes a small increase in performance. It essentially trades some CPU cycles in exchange for less IO. That includes less time reading and writing, but also less latency, meaning the gain on rotational media is greater.
>>> Worse, if the workload is very parallel and at max CPU already, the compression overhead will only make that situation worse as well. (I suspect you could test this just by building some packages that have good parallelism during the build.)
>>
>> This is compiling the kernel on a 4/8-core CPU (circa 2011) using make -j8; the kernel running is 5.11-rc7.
>>
>> no compression
>> real    55m32.769s
>> user    369m32.823s
>> sys     35m59.948s
>>
>> compress=zstd:1
>> real    53m44.543s
>> user    368m17.614s
>> sys     36m2.505s
>>
>> That's a one-time test, and it's a ~3% improvement. *shrug* We don't really care too much these days about 1-3% differences when doing encryption, so I think this is probably in that ballpark, even if it turns out another compile is 3% slower. This is not a significantly read- or write-centric workload; it's mostly CPU. So this 3% difference may not even be related to the compression.

> Did you drop caches/etc between runs?

Yes. And I also did the test with uncompressed source files when compiling without the compress mount option, and compressed source files when compiling with the compress mount option. While it's possible to mix those around (there are four combinations), I kept them the same since those are the most common.

> Because I git cloned mainline, copied the fedora kernel config from /boot, and on a fairly recent laptop (12 threads) with a software-encrypted NVMe, dropped caches and did a time make against a compressed directory and an uncompressed one, with both a semi-constrained (4G) setup and a 32G RAM setup (compressed swapping disabled, because the machine has an encrypted swap for hibernation and crashdumps).
> compressed:
> real    22m40.129s
> user    221m9.816s
> sys     23m37.038s
>
> uncompressed:
> real    21m53.366s
> user    221m56.714s
> sys     23m39.988s
>
> uncompressed, 4G RAM:
> real    28m48.964s
> user    288m47.569s
> sys     30m43.957s
>
> compressed, 4G RAM:
> real    29m54.061s
> user    281m7.120s
> sys     29m50.613s

While the feature page doesn't claim it always increases performance, it also doesn't say it can reduce performance. In CPU-intensive workloads, it stands to reason there's going to be some competition. The above results strongly suggest that's what's going on, even if I couldn't reproduce it. But performance gain/loss isn't the only factor for consideration.

> and that is not an IO constrained workload, it's generally CPU constrained, and since the caches are warm due to the software encryption, the decompress times should be much faster than machines that aren't cache stashing.

I don't know, so I can't confirm or deny any of that.

> The machine above can actually peg all 6 cores until it hits thermal limits simply doing cp's with btrfs/zstd compression, all the while losing about 800MB/sec of IO bandwidth over the raw disk. Turning an IO-bound problem into a CPU-bound one isn't ideal.

It's a set of tradeoffs. And there isn't a governor that can assess when an IO-bound bottleneck becomes a CPU-bound one.
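The A/B methodology both posters describe — the same tree built on a compressed and an uncompressed location, with caches dropped between runs so reads are cold — can be sketched as a small wrapper. The kernel-tree paths in the commented usage are hypothetical, not from the thread:

```shell
#!/bin/bash
# Sketch of the drop-caches A/B timing method discussed above.
bench() {
    label=$1; shift
    sync
    if [ "$(id -u)" = 0 ]; then
        # needs root; without it the run is warm-cache
        echo 3 > /proc/sys/vm/drop_caches 2>/dev/null || true
    fi
    echo "== $label =="
    time "$@"
}
# Hypothetical usage, one subvolume mounted with -o compress=zstd:1:
#   bench "compress=zstd:1" make -C /mnt/compressed/linux -j"$(nproc)"
#   bench "no compression"  make -C /mnt/plain/linux -j"$(nproc)"
bench "demo" sleep 0   # trivial stand-in so the sketch runs anywhere
```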
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
Hi,

On 2/14/21 2:20 PM, Chris Murphy wrote:
> On Sat, Feb 13, 2021 at 9:45 PM Jeremy Linton wrote:
>> Hi,
>>
>> On 2/11/21 11:05 PM, Chris Murphy wrote:
>>> On Thu, Feb 11, 2021 at 9:58 AM Jeremy Linton wrote:
>>>> Hi,
>>>>
>>>> On 1/1/21 8:59 PM, Chris Murphy wrote:
>>>>> Anyway, compress=zstd:1 is a good default. Everyone benefits, and I'm not even sure someone with a very fast NVMe drive will notice a slowdown because the compression/decompression is threaded.
>>>>
>>>> I disagree that everyone benefits. Any read-latency-sensitive workload will be slower due to the application latency being both the drive latency plus the decompression latency. And as the kernel benchmarks indicate, very few systems are going to get anywhere near the performance of even baseline NVMe drives when it comes to throughput.
>>>
>>> It's possible some workloads on NVMe might have faster reads or writes without compression.
>>>
>>> https://github.com/facebook/zstd
>>>
>>> btrfs compress=zstd:1 translates into zstd -1 right now; there are some ideas to remap btrfs zstd:1 to one of the newer zstd --fast options, but it's just an idea. And in any case the defaults for btrfs and zstd will remain 3 and -3 respectively, which is what 'compress=zstd' maps to, making it identical to 'compress=zstd:3'.
>>>
>>> I have a laptop with NVMe and haven't come across such a workload so far, but this is obviously not a scientific sample. I think you'd need a process that's producing read/write rates that the storage can meet, but that the compression algorithm limits. Btrfs is threaded, as is the compression.
>>>
>>> What's typical is no change in performance, and sometimes a small increase in performance. It essentially trades some CPU cycles in exchange for less IO. That includes less time reading and writing, but also less latency, meaning the gain on rotational media is greater.
>>>
>>>> Worse, if the workload is very parallel and at max CPU already, the compression overhead will only make that situation worse as well. (I suspect you could test this just by building some packages that have good parallelism during the build.)
>>>
>>> This is compiling the kernel on a 4/8-core CPU (circa 2011) using make -j8; the kernel running is 5.11-rc7.
>>>
>>> no compression
>>> real    55m32.769s
>>> user    369m32.823s
>>> sys     35m59.948s
>>>
>>> compress=zstd:1
>>> real    53m44.543s
>>> user    368m17.614s
>>> sys     36m2.505s
>>>
>>> That's a one-time test, and it's a ~3% improvement. *shrug* We don't really care too much these days about 1-3% differences when doing encryption, so I think this is probably in that ballpark, even if it turns out another compile is 3% slower. This is not a significantly read- or write-centric workload; it's mostly CPU. So this 3% difference may not even be related to the compression.
>>
>> Did you drop caches/etc between runs?
>
> Yes. And I also did the test with uncompressed source files when compiling without the compress mount option, and compressed source files when compiling with the compress mount option. While it's possible to mix those around (there are four combinations), I kept them the same since those are the most common.
>
>> Because I git cloned mainline, copied the fedora kernel config from /boot, and on a fairly recent laptop (12 threads) with a software-encrypted NVMe, dropped caches and did a time make against a compressed directory and an uncompressed one, with both a semi-constrained (4G) setup and a 32G RAM setup (compressed swapping disabled, because the machine has an encrypted swap for hibernation and crashdumps).
>>
>> compressed:
>> real    22m40.129s
>> user    221m9.816s
>> sys     23m37.038s
>>
>> uncompressed:
>> real    21m53.366s
>> user    221m56.714s
>> sys     23m39.988s
>>
>> uncompressed, 4G RAM:
>> real    28m48.964s
>> user    288m47.569s
>> sys     30m43.957s
>>
>> compressed, 4G RAM:
>> real    29m54.061s
>> user    281m7.120s
>> sys     29m50.613s
>
> While the feature page doesn't claim it always increases performance, it also doesn't say it can reduce performance. In CPU-intensive workloads, it stands to reason there's going to be some competition. The above results strongly suggest that's what's going on, even if I couldn't reproduce it. But performance gain/loss isn't the only factor for consideration.
>
>> and that is not an IO constrained workload, it's generally CPU constrained, and since the caches are warm due to the software encryption, the decompress times should be much faster than machines that aren't cache stashing.
>
> I don't know, so I can't confirm or deny any of that.
>
>> The machine above can actually peg all 6 cores until it hits thermal limits simply doing cp's with btrfs/zstd compression, all the while losing about 800MB/sec of IO bandwidth over the raw disk. Turning an IO-bound problem into a CPU-bound one isn't ideal.
>
> It's a set of tradeoffs. And there isn't a governor that can assess when an IO-bound bottleneck becomes a CPU-bound one.
>
>> Compressed disks only work in the situation where the CPUs can compress/decompress faster than the disk, or the workload is managing to significantly reduce IO because the working set is streaming rather than random.
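The recurring claim in this exchange — compression only pays off when the CPUs can compress/decompress faster than the disk — can be sanity-checked with the zstd CLI's benchmark mode, which prints per-level compression and decompression throughput to compare against raw disk bandwidth. A sketch (the generated sample data is a stand-in, not a real workload):

```shell
#!/bin/bash
# Benchmark zstd levels 1-3 on ~10MB of moderately compressible data.
# base64 of /dev/urandom compresses roughly 25% (6 bits of entropy per byte).
sample=$(mktemp)
base64 /dev/urandom | head -c 10000000 > "$sample"
if command -v zstd >/dev/null 2>&1; then
    zstd -b1 -e3 "$sample"   # -b: start level, -e: end level
else
    echo "zstd CLI not installed; on Fedora: dnf install zstd"
fi
rm -f "$sample"
```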
Re: Fedora 34 Change: Enable btrfs transparent zstd compression by default (System-Wide Change proposal)
On Tue, Feb 16, 2021 at 4:10 PM Jeremy Linton wrote:
> On 2/14/21 2:20 PM, Chris Murphy wrote:
> > This isn't sufficiently qualified. It does work to reduce space consumption and write amplification. It's just that there's a tradeoff that you dislike, which is IO reduction. And it's completely reasonable to have a subjective position on this tradeoff. But no matter what, there is a consequence to the choice.
>
> IO reduction in some cases (see below), for additional read latency and an increase in CPU utilization.
>
> For a desktop workload the former is likely a larger problem. But as we all know, sluggishness is a hard thing to measure on a desktop. QD1 pointer chasing on disk, though, is a good approximation; sometimes boot times are too.

What is your counter proposal?

> > A larger file might have a mix of compressed and non-compressed extents, based on this "is it worth it" estimate. This is the difference between the compress and compress-force options, where force drops this estimator and depends on the compression algorithm to do that work. I sometimes call that estimator the "early bailout" check.
>
> Compression estimation is its own ugly ball of wax. But ignoring that for the moment, consider what happens if you have a bunch of 2G database files with a reasonable compression ratio. Let's assume for a moment the database attempts to update records in the middle of the files. What happens when the compression ratio gets slightly worse? (It's likely you already have nodatacow.)

What percentage of Fedora desktop users do you estimate have a bunch of 2G database files?

I don't assume datacow or nodatacow for databases, because some databases and their workloads do OK on COW filesystems and others don't. Also, nodatacow disables compression; i.e. files having file attribute 'C' (nodatacow) with mount option compress(-force) remain uncompressed.
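That nodatacow interaction is easy to try with chattr/lsattr. A guarded sketch using a throwaway temp file (an actual database file on btrfs would be marked the same way; note the attribute only takes effect if set while the file is still empty):

```shell
#!/bin/bash
# The 'C' (No_COW) file attribute disables both COW and compression
# for that file. On filesystems without No_COW support, chattr fails
# and we just report that instead.
d=$(mktemp -d)
touch "$d/data.db"    # create empty first, then mark it
if command -v chattr >/dev/null 2>&1 && chattr +C "$d/data.db" 2>/dev/null; then
    lsattr "$d/data.db"    # 'C' appears among the attribute flags
else
    echo "No_COW attribute not supported on this filesystem"
fi
rm -rf "$d"
```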
> Although this becomes a case of seeing if the "compression estimation" logic is smart enough to detect that it's causing poor IO patterns (while still having a reasonably good compression ratio).

The "early bail" heuristic just tries to estimate whether the effort of compression is worth it. If it is, the data extent is submitted for compression, and if it's not worth it, it isn't. The max extent size for this is 128KiB. There's no IO pattern detection. Once the compression has happened, the write allocator works the same as without compression.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/compression.c?h=v5.11#n1314
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/btrfs/compression.c?h=v5.11#n1609

> In a past life, I spent a not inconsequential part of a decade engineering compressed ram+storage systems (similar to what has been getting merged to mainline over the past few years). It's really hard to make one that is performant across a wide range of workloads. What you get are areas where it can help, but if you average those cases with the ones where it hurts, the overwhelming analysis is you shouldn't be compressing unless you want the capacity. The worst part is that most synthetic file IO benchmarks tend to be on the "it helps" side of the equation and the real applications on the other.

This is why I tend to pooh-pooh benchmarks. They're useful for the narrow purpose they're intended to measure. Synthetic benchmarks are good at exposing problems, but won't tell you their significance, so what they expose is the need for better testing. A database benchmark will do a good job showing performance issues with workloads that act like the database the benchmark is mimicking. Not all databases have the same behavior.

> IMHO if fedora wanted to take a hit on the IO perf side, a much better place to focus would be flipping encryption on.
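The "early bailout" idea described above — compress up to a 128KiB extent and keep the result only if it actually shrank — can be illustrated in a few lines of shell. This is not btrfs's kernel code (that lives in compression.c); gzip stands in for the kernel's zstd:

```shell
#!/bin/bash
# Illustration of an "is it worth it" check: compress a 128KiB sample
# and bail out if the output isn't smaller than the input.
worth_compressing() {
    insize=$(head -c 131072 "$1" | wc -c)
    outsize=$(head -c 131072 "$1" | gzip -1 | wc -c)
    [ "$outsize" -lt "$insize" ]   # true only if compression saves space
}
text=$(mktemp); yes "compressible line" | head -c 131072 > "$text"
rand=$(mktemp); head -c 131072 /dev/urandom > "$rand"
worth_compressing "$text" && echo "text: compress"   || echo "text: store as-is"
worth_compressing "$rand" && echo "random: compress" || echo "random: store as-is"
rm -f "$text" "$rand"
```

Repetitive text passes the check; random data makes gzip output slightly larger than its input, so the heuristic stores it uncompressed, which is the same outcome the in-kernel estimator is after.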
> The perf profile is flatter (aes-ni & the arm crypto extensions are common) with fewer evil edge cases. Or a more controlled method might be to pick a couple of fairly atomic directories and enable compression there (say /usr).

Workstation WG has been tracking these:

https://pagure.io/fedora-workstation/issue/136
https://pagure.io/fedora-workstation/issue/82

A significant impediment to ticking the "Encrypt my data" checkbox by default in automatic partitioning is the UI/UX. The current evaluation centers on using systemd-homed to encrypt user data by default, and optionally enabling system encryption with the key sealed in the TPM or protected on something like a YubiKey. There's still some work to do to get this integrated.

-- 
Chris Murphy