Re: FYIO: A rant about btrfs
On 18 September 2015 at 14:10, Austin S Hemmelgarn wrote:
> On 2015-09-17 10:52, Aneurin Price wrote:
>> [...]
> I should qualify this further: in particular I meant using ZFS on Linux
> (not *BSD; they did an amazing job WRT stability), and actually taking
> advantage of the volume management (i.e. not just storing files on it,
> but also using zvols). A better way to put it is that I've never seen a
> truly stable system using zfsonlinux with less than 16G of RAM which is
> using it for volume management as well as file storage.

That's interesting, and I think it supports the 'large L2ARC on a pool
with many/large zvols' theory.

Apologies that this is now so off-topic, but I really wish this were more
widely known: I've seen major problems using an L2ARC with zvols, and from
reading various reports I get the impression that a lot of other people
have too, though they may not always have diagnosed it. It's easy to get
the general impression that an L2ARC is a great thing to have, something
that can improve your read throughput with minimal downside; in fact the
downside can be pretty large, and it is much larger by default with
volumes than with filesystems.

Whenever a chunk of data is stored in the L2ARC, a header for that data is
stored in the ARC, where it must remain in memory (I'm not sure whether
it's pinned to stop it being swapped out; I think it is, but if you're at
the point where that makes a difference you're probably already having a
bad day). For a filesystem, the header corresponds to a record, which is
by default a variable size up to 128k; for a volume, the header
corresponds to a block, which is by default 8k. Assuming that your records
are mostly close to their maximum size, this means that the RAM overhead
of having a given amount of data in L2ARC is *16x* higher for a volume
than for a filesystem.
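To put rough numbers on that, here is a back-of-envelope sketch. The ~180-byte per-buffer header size is an assumption (the real figure varies across ZFS versions), but the 16x ratio follows purely from the default record and block sizes:

```shell
# Back-of-envelope ARC header overhead for a full L2ARC device.
# The 180-byte header size is an assumption; defaults of 128k records
# (filesystems) and 8k blocks (zvols) are the ZFS defaults of the time.
l2arc=$((200 * 1024 * 1024 * 1024))  # 200 GiB cache device, filled
hdr=180                              # assumed bytes of ARC header per buffer

fs_bufs=$((l2arc / (128 * 1024)))    # filesystem: one header per 128k record
zvol_bufs=$((l2arc / (8 * 1024)))    # zvol: one header per 8k block

echo "filesystem headers: $((fs_bufs * hdr / 1048576)) MiB of RAM"
echo "zvol headers:       $((zvol_bufs * hdr / 1048576)) MiB of RAM"
echo "overhead ratio:     $((zvol_bufs / fs_bufs))x"
```

With these assumed numbers, the same 200 GiB of cached data costs about 281 MiB of ARC headers for a filesystem but about 4500 MiB for a zvol, a 16x difference.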
Unfortunately, if you're using volumes it's probably because you're
running VMs or a dedicated storage server, and in those circumstances we
naturally think "yes, some caching sounds like a great idea", and then
proceed to wildly underestimate the amount of memory required.
Consequently, as the L2ARC fills, ZFS fills its memory allowance with data
that is basically pure overhead, often blowing past its configured limit
without getting much mileage out of it, and becoming barely responsive.
The effect is that the system seems fine when it first starts, then
becomes slower and slower over time until it's unusable. Practically
speaking, there's not much difference between a storage server that
spends three minutes thrashing in response to each read and one that's
locked up hard.

Basically, having an L2ARC of non-trivial size without a lot of RAM to
back it up is a bad idea, so I think there's a good chance that the
problems you've observed/heard about may well be due to inappropriate
L2ARC usage by people who didn't fully appreciate the memory requirements.

Nye
Re: FYIO: A rant about btrfs
On 09/16/2015 10:43 AM, M G Berberich wrote:
> Hello, just for information. I stumbled across a rant about btrfs
> performance:

Found this through reddit. I'm reproducing some of his issues
artificially; he's definitely run into some real bugs that aren't related
to "databases suck on btrfs."

Thanks,

Josef
Re: FYIO: A rant about btrfs
On 2015-09-17 20:34, Duncan wrote:
> Zygo Blaxell posted on Wed, 16 Sep 2015 18:08:56 -0400 as excerpted:
>> [...]
>> In other words, nobarrier is for a little better performance when you
>> already want to _intentionally_ destroy your filesystem on power
>> failure.
>
> Very good explanation of why it's useful to have such an otherwise
> destructive mount option even available in the first place. Thanks! =:^)

The other reason, as has been pointed out in a different sub-thread, is
that if you have a guaranteed-good hardware RAID controller, which has a
known-good built-in non-volatile write cache, and you turn off
write reordering, and you turn off the write caches on all the connected
hard drives, then it is relatively safe. Of course, the chance of most
people actually meeting all those conditions is pretty slim.
Re: FYIO: A rant about btrfs
On 2015-09-17 10:52, Aneurin Price wrote:
> On 16 September 2015 at 20:21, Austin S Hemmelgarn wrote:
>> [...]
> [...]
> Bear in mind that these are Linux machines, and zfsonlinux's memory
> management is known to be inferior to ZFS on Solaris and FreeBSD
> (because it does not integrate with the page cache and instead grabs a
> [configurable] chunk of memory, and doesn't always do a great job of
> dropping it in response to memory pressure).

I should qualify this further: in particular I meant using ZFS on Linux
(not *BSD; they did an amazing job WRT stability), and actually taking
advantage of the volume management (i.e. not just storing files on it,
but also using zvols).
A better way to put it is that I've never seen a truly stable system
using zfsonlinux with less than 16G of RAM which is using it for volume
management as well as file storage.
Re: FYIO: A rant about btrfs
On 2015-09-17 11:57, Martin Steigerwald wrote:
> On Wednesday, 16 September 2015, 23:29:30 CEST, Hugo Mills wrote:
>> [...]
> The official recommendation for XFS differs from that:
>
> Q. Should barriers be enabled with storage which has a persistent write
> cache?
>
> Many hardware RAIDs have a persistent write cache which preserves data
> across power failure, interface resets, system crashes, etc.
> Using write barriers in this instance is not recommended and will in
> fact lower performance. Therefore, it is recommended to turn off the
> barrier support and mount the filesystem with "nobarrier", assuming
> your RAID controller is infallible and not resetting randomly like some
> common ones do. But take care about the hard disk write cache, which
> should be off.
>
> http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

There's a difference there still: XFS isn't quite as dependent on
ordering as BTRFS is, and they're also giving some other strict
requirements, namely the hard-disk write cache being off (which I see a
rather large number of people ignore) and a high-end RAID controller.
Re: FYIO: A rant about btrfs
Zygo Blaxell posted on Wed, 16 Sep 2015 18:08:56 -0400 as excerpted:
> On Wed, Sep 16, 2015 at 03:04:38PM -0400, Vincent Olivier wrote:
>> OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an
>> absolute no go. Case closed.
>
> [...]
>
> In other words, nobarrier is for a little better performance when you
> already want to _intentionally_ destroy your filesystem on power
> failure.

Very good explanation of why it's useful to have such an otherwise
destructive mount option even available in the first place. Thanks! =:^)

-- 
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
Re: FYIO: A rant about btrfs
On Wednesday, 16 September 2015, 23:29:30 CEST, Hugo Mills wrote:
> [...]
> Any complex data structure, like a filesystem, is going to rely on some
> kind of ordering guarantees, somewhere in its structure. (The ordering
> might be strict, with a global clock, or barrier-based, or lattice-like,
> as for example a vector clock, but there's going to be _some_ concept of
> order.) nobarrier allows the FS to ignore those guarantees, and even
> without knowing anything about the FS at all, doing so is a big red
> DANGER flag.

The official recommendation for XFS differs from that:

  Q. Should barriers be enabled with storage which has a persistent write
  cache?

  Many hardware RAIDs have a persistent write cache which preserves data
  across power failure, interface resets, system crashes, etc.
  Using write barriers in this instance is not recommended and will in
  fact lower performance. Therefore, it is recommended to turn off the
  barrier support and mount the filesystem with "nobarrier", assuming
  your RAID controller is infallible and not resetting randomly like some
  common ones do. But take care about the hard disk write cache, which
  should be off.

http://xfs.org/index.php/XFS_FAQ#Q._Should_barriers_be_enabled_with_storage_which_has_a_persistent_write_cache.3F

Thanks,
-- 
Martin
Re: FYIO: A rant about btrfs
On 16 September 2015 at 20:21, Austin S Hemmelgarn wrote:
> ZFS has been around for much longer; it's been mature and feature
> complete for more than a decade, and has had a long time to improve
> performance-wise. It is important to note, though, that on low-end
> hardware BTRFS can (and often does, in my experience) perform better
> than ZFS, because ZFS is a serious resource hog (I have yet to see a
> stable ZFS deployment with less than 16G of RAM, even with all the
> fancy features turned off).

If you have a real example of ZFS becoming unstable with, say, 4 or 8GB
of memory, that doesn't involve attempting deduplication (which I guess
is what you mean by 'all the fancy features') on a many-TB pool, I'd be
interested to hear about it. (Psychic debugger says 'possibly somebody
trying to use a large L2ARC on a pool with many/large zvols'.)

My home fileserver has been running ZFS for about 5 years, on a system
maxed out at 4GB RAM, currently up to ~9TB of data. The only stability
problems I ever had were towards the beginning, when I was using zfs-fuse
because zfsonlinux wasn't ready then *and* I was trying out
deduplication.

I have a couple of work machines with 2GB RAM and pools currently around
2.5TB full; no problems with these either in the couple of years they've
been in use, though granted these are lightly loaded machines, since
what they mostly do is receive backup streams.

Bear in mind that these are Linux machines, and zfsonlinux's memory
management is known to be inferior to ZFS on Solaris and FreeBSD
(because it does not integrate with the page cache and instead grabs a
[configurable] chunk of memory, and doesn't always do a great job of
dropping it in response to memory pressure).
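For reference, the "[configurable] chunk of memory" is the ARC size cap, which zfsonlinux exposes as the `zfs_arc_max` module parameter. A sketch of capping it (the 1 GiB value is just an example, not a recommendation):

```shell
# Cap the ZFS-on-Linux ARC at 1 GiB (value in bytes; example value only).
# Persistent: applied the next time the zfs module is loaded.
echo "options zfs zfs_arc_max=1073741824" >> /etc/modprobe.d/zfs.conf

# Or at runtime, if the module is already loaded:
echo 1073741824 > /sys/module/zfs/parameters/zfs_arc_max
```

Note that, as discussed above, zfsonlinux has not always respected this limit tightly under pressure, so it is a cap on intent more than a hard guarantee.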
Re: FYIO: A rant about btrfs
On 2015-09-16 19:31, Hugo Mills wrote:
> On Wed, Sep 16, 2015 at 03:21:26PM -0400, Austin S Hemmelgarn wrote:
>> On 2015-09-16 12:45, Martin Tippmann wrote:
>>> [...]
>> Performance is on the roadmap, but the roadmap is notoriously
>> short-sighted when it comes to the time-frame for completion of
>> anything. You have to understand also that the focus in BTRFS has been
>> more on data safety than performance, because that's the intended
>> niche, and the area most people look to ZFS for.
>
> Wait... there's a roadmap? ;)

Yeah, maybe it's better to say that there's a directed graph of feature
interdependence. I was just basing my statement on the presence of a list
of project ideas on the wiki. :)
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 12:45 PM, Martin Tippmann wrote:
> From reading the list I understand that btrfs is still very much a work
> in progress and performance is not a top priority at this stage, but I
> don't see why it shouldn't perform at least as well as ZFS/F2FS on the
> same workloads. Is looking at performance problems on the development
> roadmap?

My sense is that the sufferings in comparison to ZFS just represent a
lack of maturity; there just hasn't been as much focus on performance.
I'm not aware of any fundamental design issues which are likely to make
btrfs perform worse than ZFS in the long term.

F2FS is a fundamentally different beast. It is a log-based filesystem as
far as I'm aware, and on flash that gives it some substantial advantages,
but it doesn't support snapshotting etc. as far as I'm aware. I'm sure
that in the long term some operations are just going to be faster on F2FS
no matter what, due to its design, and other operations will always be
slower on F2FS.

To draw an analogy, imagine you have a 1TB ext4 filesystem and a 1TB
btrfs filesystem. On each you create a 900GB file, and then proceed to
make millions of internal writes all over it. The ext4 filesystem is just
going to completely outperform btrfs at this job, and I suspect it would
outperform ZFS as well. For such a use case you don't really even need a
filesystem; you might as well just be reading/writing random blocks right
off the disk, and ext4 is pretty close to that in behavior when it comes
to internal file modifications. The COW filesystems are going to be
fragmenting the living daylights out of the file and its metadata.

Of course, if you pulled the plug in the middle of one of those
operations, the COW filesystems are more likely to end up in a sane state
if you care about the order of file modifications, and if you're doing
this on RAID, both ZFS and btrfs will be immune to any write-hole issues.
Also, if you make reflink copies of large files on a btrfs filesystem, it
will perform MUCH better than doing the equivalent on ext4 (which
requires copying all the data, at a cost of both time and space).

In the end you have to look at your application, and not just performance
stats. There are tradeoffs. Personally, I've had enough hard drive
failures that btrfs is worth it to me just for the assurance that when
something goes wrong, the filesystem knows what is good and what isn't.
As drives get bigger this becomes more and more important.

-- 
Rich
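The reflink copy mentioned above is just cp(1) with a flag; with `--reflink=auto` it falls back to a normal copy on filesystems (like ext4 at the time) that can't share extents:

```shell
# Reflink (COW) copy: on btrfs this shares extents with the source and
# completes almost instantly regardless of file size. --reflink=auto
# falls back to a plain data copy where reflinks aren't supported.
workdir=$(mktemp -d) && cd "$workdir"
dd if=/dev/urandom of=image.raw bs=1M count=8 2>/dev/null
cp --reflink=auto image.raw clone.raw
cmp image.raw clone.raw && echo "contents identical"
```

On btrfs, the clone initially consumes no extra data space; blocks are only duplicated as one copy or the other is modified.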
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 03:21:26PM -0400, Austin S Hemmelgarn wrote:
> On 2015-09-16 12:45, Martin Tippmann wrote:
>> [...]
>> From reading the list I understand that btrfs is still very much a work
>> in progress and performance is not a top priority at this stage, but I
>> don't see why it shouldn't perform at least as well as ZFS/F2FS on the
>> same workloads. Is looking at performance problems on the development
>> roadmap?
> Performance is on the roadmap, but the roadmap is notoriously
> short-sighted when it comes to the time-frame for completion of
> anything. You have to understand also that the focus in BTRFS has been
> more on data safety than performance, because that's the intended
> niche, and the area most people look to ZFS for.

Wait... there's a roadmap? ;)

Hugo.

-- 
Hugo Mills             | Our so-called leaders speak
hugo@... carfax.org.uk | with words they try to jail ya
http://carfax.org.uk/  | They subjugate the meek
PGP: E2AB1DE4          | but it's the rhetoric of failure.
                       |     The Police
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 03:08:43PM -0400, Austin S Hemmelgarn wrote:
> On 2015-09-16 12:25, Zia Nayamuth wrote:
>> Some response to your criticism:
>>
>> 1. How would that hole fare with a fully battery-backed/flash-backed
>> path (a battery-backed or flash-backed HBA with disks with full
>> power-loss protection, like the Intel S3500)? In such a situation
>> (quite commonplace in server-land), power loss should not cause any
>> data loss, since all data in the cache is guaranteed to be committed
>> to non-volatile memory at some point (whether such assurances may be
>> trusted is another matter entirely, though, and well outside the scope
>> of this discussion).
> It's not as much of an issue if you have full power loss protection
> (assuming of course it works),

"Power loss" is shorthand for a number of issues, only one of which is
related to electrons ceasing to vibrate through the wires. You can have a
power-loss-like event when your kernel crashes hard. No UPS will help you
there.

> but even then, having write barriers turned off is still not as safe as
> having them turned on. Most of the time when I've tried testing with
> 'nobarrier' (not just on BTRFS but on ext* as well), I had just as many
> issues with data loss when the system crashed as when it lost power
> (simulated via killing the virtual machine). Both journaling and COW
> filesystems need to ensure ordering of certain write operations to be
> able to maintain consistency. For example, the new/updated data blocks
> need to be on disk before the metadata is updated to point to them,
> otherwise your database can end up corrupted.

Indeed. The barriers are an ordering condition. The FS relies on (i.e.
*requires*) that ordering condition in order to be truly consistent.
Running with "nobarrier" is a very strong signal that you really don't
care about the data on the FS.
This is not a case of me simply believing that, because I've been using
btrfs for so long, I've got used to the peculiarities. The first time I
heard about the nobarrier option, something like 6 years ago when I was
first using btrfs, I thought "that's got to be a really silly idea". Any
complex data structure, like a filesystem, is going to rely on some kind
of ordering guarantees somewhere in its structure. (The ordering might be
strict, with a global clock, or barrier-based, or lattice-like, as for
example a vector clock, but there's going to be _some_ concept of order.)
nobarrier allows the FS to ignore those guarantees, and even without
knowing anything about the FS at all, doing so is a big red DANGER flag.

Hugo.

>> 2. Fair point. I'd like to know his hardware, given how strongly
>> hardware can influence things.
>>
>> 3. It's pretty obvious that the author of that blog is specifically
>> targeting OLTP performance (explicit statement in the intro, choice of
>> benchmark, name and focus of blog), not the common case, and he even
>> states that in the first two paragraphs of his conclusion. The focus
>> is somewhat less clear in said conclusion, namely: is he truly talking
>> about general-purpose use, or is he talking about general-purpose OLTP
>> use?
> My takeaway was that he intended 'general purpose use' to mean generic
> everyday usage across a wide variety of systems; he was not
> particularly specific about it, however.
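The ordering condition described in the exchange above (data durable on disk before the metadata that points at it) has a familiar userspace analogue: the write-then-rename pattern. A sketch, assuming GNU coreutils sync(1), which accepts file operands since version 8.24:

```shell
# Replace a file's contents so a crash leaves either the old or the new
# version, never a torn mix: write the new data, force it to disk, then
# atomically repoint the name (the "metadata") at it.
cd "$(mktemp -d)"
printf 'new contents\n' > state.tmp
sync state.tmp      # barrier: the data blocks must be durable first
mv state.tmp state  # rename() is atomic within one filesystem
sync .              # then persist the directory entry itself
cat state
```

Swap the two sync steps away (which is roughly what nobarrier licenses the filesystem to do internally, via drive-level reordering) and a badly timed crash can leave the name pointing at data that never reached the disk.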
Re: FYIO: A rant about btrfs
Vincent Olivier posted on Wed, 16 Sep 2015 15:04:38 -0400 as excerpted:

>>>> 3. He's testing it for a workload that is a known and documented
>>>> problem for BTRFS, and claiming that that means it isn't worth
>>>> considering as a general-usage filesystem. Most people don't run
>>>> RDBMS servers on their systems, and as such, such a workload is not
>>>> worth considering for most people.
>>> Apparently RDBMS being a problem on Btrfs is neither known nor
>>> documented enough (he's right about the contrast with claiming
>>> publicly that Btrfs is indeed production ready).
>> OK, maybe not documented, but RDBMS falls under 'Large files with
>> highly random access patterns and heavy RMW usage', which is a known
>> issue for BTRFS, and also applies to VM images.
>
> This guy is no idiot. If it wasn't clear enough for him, it's not clear
> enough, period.

I'd argue that while he's "clearly no idiot", he's equally clearly an
"idiot savant" (relatively speaking). He has an extremely high level of
knowledge in a very few specific areas (RDBMS being the primary one in
discussion here), and much higher than average in a number of others, but
unfortunately tends to lack what many would call "common sense" in some
or many others, to the point that he does what people in those areas
would consider "idiotic things", because he simply doesn't have a
reasonable level of experience with them. What's more, he is demonstrably
uninterested in getting that level of "functional experience", as he has
higher priorities for where he's going to be spending his time:
basically, staying at the top of his field in the areas where he's
already extremely highly knowledgeable.

Note that I'm not mocking him. I'm much the same way, as are many
geeks/nerds, thus the term "nerdview".

(Personal example: back in college I once _stapled_ a note to a football.
It simply never occurred to me that a football is inflated, and that
stapling something to it was a _very_ bad idea (!!), because I simply
didn't have even a minimal level of functional experience in that area,
such that I lacked "common sense" in regard to it; further, I wasn't
then, and am not now, particularly interested in spending my time getting
that minimal functional experience, as my priorities simply lie
elsewhere.)

See also the relatively high rate of Asperger syndrome among the geek
crowd.

As for his conclusions, I find myself "in violent agreement", as I've
seen it said, with most of them, the exception being that I don't agree
that btrfs isn't appropriate as a general-purpose filesystem. I'd say a
more accurate statement is that, taking into account btrfs' maturity
level, btrfs is acceptable as a general-purpose filesystem, with the
caveat that as a COW-based filesystem where optimization has not yet been
a priority (and may well not be for a few more years), btrfs is
definitely COUNTER-indicated for GiB+-sized VM-image and database use.

However, accepting that RDBMS is in fact one of his primary focus areas,
I can see how his definition of "general purpose" would tend to include
that, where a more "general purpose" definition of "general purpose
filesystem" (ha, recursive definitions!) has room for that particular
case being an exception.

But I _definitely_ agree with him that btrfs is unfortunately being
billed as "mature" and "production ready", where, for the general use
case, I'd... let's just say I'd not choose that characterization,
preferring instead to characterize btrfs as "not yet entirely stable and
mature", and definitely not yet optimized.
It's certainly ready for "cautious" use after doing a bit of research on
your use case, and with backups at the ready if you care about the data.
But as any good sysadmin will tell you, by definition, if you care about
the data, you have it backed up, and if you don't have it backed up, by
equal definition, you do _NOT_ care about losing the data, so that's a
_given_.

But "cautious use after researching your use-case" isn't what he's
interested in doing, or rather... he's not particularly interested in
doing that level of pre-deployment research, and to his credit, for a
mature filesystem, he really shouldn't have to... though people who care
enough about their use case, as he really should for RDBMS given that
it's a major focus point for him, will be doing that research anyway,
_because_ they care.

Which, to his credit, he's doing, in a way. It's not the way _I_ would go
about it, certainly, but by running those benchmarks, etc., and comparing
against other filesystems, he's doing research in his own way, and under
the parameters he uses I certainly don't disagree with his conclusions:
that btrfs is simply a rather poor choice for his use-case of interest,
particularly given the level of additional research and tuning he's
demonstrably willing to put into btrfs specifically, that being (very
close to) zero.

As for the btrfs s
Re: FYIO: A rant about btrfs
On Wed, Sep 16, 2015 at 03:04:38PM -0400, Vincent Olivier wrote:
> > On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn wrote:
> > On 2015-09-16 12:51, Vincent Olivier wrote:
> >>> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
> >>> On 2015-09-16 10:43, M G Berberich wrote:
> >>> It is worth noting a few things that were done incorrectly in this testing:
> >>> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).
> >> But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?
> > Using it improves performance for every filesystem on Linux that supports it. This does not mean that it is _EVER_ a good idea to do so. This mount option is one of the few things on my list of things that I will _NEVER_ personally provide support to people for, because it almost guarantees that you will lose data if the system dies unexpectedly (even if it's for a reason other than power loss).
> OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute no go. Case closed.

Sometimes it is useful to make an ephemeral filesystem, i.e. a btrfs on a dm-crypt device with a random key that is not stored. This configuration intentionally and completely destroys the entire filesystem, and all data on it, in the event of a power failure.
It's useful for things like temporary table storage, where ramfs is too small, swap-backed tmpfs is too slow, and/or there is a requirement that the data not be persisted across reboots. In other words, nobarrier is for a little better performance when you already want to _intentionally_ destroy your filesystem on power failure.
Re: FYIO: A rant about btrfs
On 2015-09-16 15:04, Vincent Olivier wrote:
On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn wrote:
On 2015-09-16 12:51, Vincent Olivier wrote:
Hi,
On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

I read it too.

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).

But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?

Using it improves performance for every filesystem on Linux that supports it. This does not mean that it is _EVER_ a good idea to do so. This mount option is one of the few things on my list of things that I will _NEVER_ personally provide support to people for, because it almost guarantees that you will lose data if the system dies unexpectedly (even if it's for a reason other than power loss).

OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute no go. Case closed.

From the https://btrfs.wiki.kernel.org/index.php/Mount_options
NOTE: Using this option greatly increases the chances of you experiencing data corruption during a power failure situation. This means full file-system corruption, and not just losing or corrupting data that was being written during a power cut or kernel panic.

It could be a bit clearer, but it's pretty well spelled out.
2. 
He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

I think he means, by the text surrounding the only graph that mentions TRIM, that this exact same test on the other filesystems he benchmarked yields much better results.

Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0. And his claim is still baseless unless he actually provides a reference for it.

Same as above: TRIM/DISCARD officially not recommended in production until further notice?

TRIM/DISCARD do work, it's just that they don't work to the degree they are expected to; there are some cases where BTRFS doesn't issue a discard when it should, and fstrim doesn't properly trim everything.

3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).

OK, maybe not documented, but RDBMS falls under 'Large files with highly random access patterns and heavy RMW usage', which is a known issue for BTRFS, and also applies to VM images.

This guy is no idiot. If it wasn’t clear enough for him, it’s not clear enough, period.

From https://btrfs.wiki.kernel.org/index.php/Gotchas:
Fragmentation: Files with a lot of random writes can become heavily fragmented (10000+ extents), causing thrashing on HDDs and excessive multi-second spikes of CPU load on systems with an SSD or a large amount of RAM. On servers and workstations this affects databases and virtual machine images.
The nodatacow mount option may be of use here, with associated gotchas. On desktops this primarily affects application databases (including Firefox and Chromium profiles, GNOME Zeitgeist, Ubuntu Desktop Couch, Banshee, and Evolution's datastore). Workarounds include manually defragmenting your home directory using btrfs fi defragment. Auto-defragment (mount option autodefrag) should solve this problem in 3.0.

Symptoms include btrfs-transacti and btrfs-endio-wri taking up a lot of CPU time (in spikes, possibly triggered by syncs). You can use filefrag to locate heavily fragmented files (may not work correctly with compression).

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.
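The mechanism behind that fragmentation gotcha is easy to see in a toy model. This is purely illustrative (it is not how the real btrfs allocator works, and the function name is invented for this sketch): under COW, each random overwrite of a block inside a large file relocates that block to a fresh extent instead of rewriting it in place, so the extent count grows with the number of rewrites, whereas an overwrite-in-place filesystem keeps the file in one extent.

```python
import random

def cow_extents_after_rewrites(file_blocks: int, rewrites: int, seed: int = 0) -> int:
    """Toy COW model: a file starts as one contiguous extent.  Each
    rewrite of a random block relocates it into its own new extent,
    splitting whatever contiguous run it used to belong to.  Returns
    the resulting extent count (relocated blocks + surviving runs)."""
    rng = random.Random(seed)
    rewritten = set()
    for _ in range(rewrites):
        rewritten.add(rng.randrange(file_blocks))
    blocks = sorted(rewritten)
    extents = len(blocks)          # one relocated extent per COW-rewritten block
    runs = 0                       # contiguous runs of untouched blocks
    prev = -1
    for b in blocks:
        if b - prev > 1:           # a gap of untouched blocks -> one surviving run
            runs += 1
        prev = b
    if not blocks:
        runs = 1                   # nothing rewritten: still one extent
    elif blocks[-1] < file_blocks - 1:
        runs += 1                  # untouched tail of the file
    return extents + runs
```

Scattering a few thousand random rewrites across a large file in this model already yields extent counts on the order of twice the rewrite count, which is the kind of growth the wiki's fragmentation warning (and autodefrag/nodatacow as countermeasures) is about.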
Re: FYIO: A rant about btrfs
On 2015-09-16 12:45, Martin Tippmann wrote:
Hi, 2015-09-16 17:20 GMT+02:00 Austin S Hemmelgarn: [...]
3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people. His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Are there any conceptual or design properties that make btrfs perform worse in this workload compared to ZFS or F2FS, which both do CoW and share some similarity?

ZFS has been around for much longer, it's been mature and feature complete for more than a decade, and has had a long time to improve performance-wise. It is important to note though, that on low-end hardware, BTRFS can (and often does in my experience) perform better than ZFS, because ZFS is a serious resource hog (I have yet to see a stable ZFS deployment with less than 16G of RAM, even with all the fancy features turned off). As for F2FS, it was designed from the ground up for high performance and efficiency. BTRFS started (from what I understand) as more of an experiment, and as such the original code was not particularly optimized.

If not, then I still think (from a user perspective) it is still interesting to look at where the difference is coming from. There is also a quite old talk with some unfortunate numbers for btrfs from the XFS developer Dave Chinner: https://www.youtube.com/watch?v=FegjLbCnoBw - is this already resolved? 2012 is 3 years ago.

That depends on what you mean by resolved.
In every test I've done that wasn't a specialized benchmark designed to simulate a particular workload, I've gotten roughly equal performance between XFS and BTRFS (YMMV of course, especially because my use case is not very typical of most people (primarily building software and running BOINC projects)). I use BTRFS mostly because it makes online reprovisioning much easier (you can't migrate an XFS filesystem to a new block device online, and you also can't shrink one). It's also worth noting that Dave Chinner usually comes forth with numbers extolling the value of XFS when a new filesystem gets proposed. In my experience, for small scale usage, ext* gets better performance than almost anything else, including XFS.

From reading the list I understand that btrfs is still very much a work in progress and performance is not a top priority at this stage, but I don't see why it shouldn't perform at least equally well as ZFS/F2FS on the same workloads. Is looking at performance problems on the development roadmap?

Performance is on the roadmap, but the roadmap is notoriously short-sighted when it comes to time-frames for completion of something. You have to understand also that the focus in BTRFS has been more on data safety than performance, because that's the intended niche, and the area most people look to ZFS for.
Re: FYIO: A rant about btrfs
On 2015-09-16 12:25, Zia Nayamuth wrote:
Some responses to your criticism:
1. How would that hole fare with a fully battery-backed/flash-backed path (battery-backed or flash-backed HBA with disks with full power-loss protection, like the Intel S3500)? In such a situation (quite commonplace in server-land), power loss should not cause any data loss, since all data in the cache is guaranteed to be committed to non-volatile memory at some point (whether such assurances may be trusted is another matter entirely though, and well outside the scope of this discussion).

It's not as much of an issue if you have full power-loss protection (assuming of course it works), but even then, having write barriers turned off is still not as safe as having them turned on. Most of the time when I've tried testing with 'nobarrier' (not just on BTRFS but on ext* as well), I had just as many issues with data loss when the system crashed as when it lost power (simulated via killing the virtual machine). Both journaling and COW filesystems need to ensure ordering of certain write operations to be able to maintain consistency. For example, the new/updated data blocks need to be on disk before the metadata is updated to point to them, otherwise your database can end up corrupted.

2. Fair point. I'd like to know his hardware, given how strongly hardware can influence things.

3. It's pretty obvious that the author of that blog is specifically targeting OLTP performance (explicit statement in intro, choice of benchmark, name and focus of blog), not the common case, and even states that in the first two paragraphs of his conclusion.

The focus is somewhat less clear in said conclusion, namely, is he truly talking about general purpose use or is he talking about general purpose OLTP use?

My takeaway was that he intended 'general purpose use' to mean generic everyday usage across a wide variety of systems; he was not particularly specific about it, however.
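The ordering requirement described above (data blocks durable on disk *before* the metadata that points at them) is the same discipline applications follow from userspace with fsync(). A minimal sketch of that pattern, with illustrative names and nothing btrfs-specific about it; barriers are simply how the filesystem gets the equivalent ordering guarantee out of a write-caching, write-reordering drive:

```python
import os

def atomic_replace(path: str, data: bytes) -> None:
    """Crash-safe file update: make the new data durable first, then
    atomically switch the metadata (the directory entry) to point at it.
    If power dies at any point, the file is either the old version or
    the new one, never a torn mix."""
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())          # step 1: new data blocks on stable storage
    os.replace(tmp, path)             # step 2: atomically repoint the name
    dirfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dirfd)               # step 3: persist the rename itself
    finally:
        os.close(dirfd)

# With nobarrier (or a lying write cache), the device may internally
# reorder steps 1 and 2, so after a power cut the name can point at
# data that never reached the media -- exactly the corruption above.
```

Journaling and COW filesystems do the analogous dance internally for every transaction commit, which is why disabling barriers breaks them both.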
Re: FYIO: A rant about btrfs
> On Sep 16, 2015, at 2:22 PM, Austin S Hemmelgarn wrote: > > On 2015-09-16 12:51, Vincent Olivier wrote: >> Hi, >> >> >>> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn >>> wrote: >>> >>> On 2015-09-16 10:43, M G Berberich wrote: Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp >> I read it too. >>> It is worth noting a few things that were done incorrectly in this testing: >>> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so >>> subtly breaks the data integrity guarantees of _ALL_ filesystems, but >>> especially so on COW filesystems like BTRFS. With this off, you will have >>> a much higher chance that a power loss will cause data loss. It shouldn't >>> be turned off unless you are also turning off write-caching in the hardware >>> or know for certain that no write-reordering is done by the hardware (and >>> almost all modern hardware does write-reordering for performance reasons). >> But can the “nobarrier” mount option affect performances negatively for >> Btrfs (and not only data integrity)? > Using it improves performance for every filesystem on Linux that supports it. > This does not mean that it is _EVER_ a good idea to do so. This mount > option is one of the few things on my list of things that I will _NEVER_ > personally provide support to people for, because it almost guarantees that > you will lose data if the system dies unexpectedly (even if it's for a reason > other than power loss). OK fine. Let it be clearer then (on the Btrfs wiki): nobarrier is an absolute no go. Case closed. >>> 2. He provides no comparison of any other filesystem with TRIM support >>> turned on (it is very likely that all filesystems will demonstrate such >>> performance drops. Based on that graph, it looks like the device doesn't >>> support asynchronous trim commands). 
>> I think he means by the text surrounding the only graph that mentions TRIM that this exact same test on the other filesystems he benchmarked yields much better results.
> Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0. And his claim is still baseless unless he actually provides a reference for it.

Same as above: TRIM/DISCARD officially not recommended in production until further notice?

>>> 3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.
>> Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).
> OK, maybe not documented, but RDBMS falls under 'Large files with highly random access patterns and heavy RMW usage', which is a known issue for BTRFS, and also applies to VM images.

This guy is no idiot. If it wasn’t clear enough for him, it’s not clear enough, period.

>>> His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.
>> Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.
> It depends, BTRFS is still not feature complete with the overall intent when it was started (raid56 and qgroups being the two big issues at the moment), and attempting to optimize things tends to introduce bugs, which we have quite enough of already without people adding more (and they still seem to be breeding like rabbits).
I would just like a clear statement from a dev-lead saying : until we are feature-complete (with a finite list of features to complete) the focus will be on feature-completion and not optimizing already-implemented features. Ideally with an ETA on when optimization will be more of a priority than it is today. > That said, my systems (which are usually doing mostly CPU or memory bound > tasks, and not I/O bound like the aforementioned benchmarks were testing) run > no slower than they did with ext4 as the main filesystem, and in some cases > work much faster (even after averaging out the jitter in performance). Based > on this, I wouldn't advocate it for most server usage (except possibly as the > root filesystem), but it does work very well for most desktop usage patterns > and a number of HPC usage patterns as well. See, this is interesting: I’d rather have a super fast and discardable SSD F2FS/ext4 root with a large Btrfs RAID for (NAS) server usage. Does your non-advocacy
Re: FYIO: A rant about btrfs
On 2015-09-16 12:51, Vincent Olivier wrote:
Hi,
On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

I read it too.

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).

But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?

Using it improves performance for every filesystem on Linux that supports it. This does not mean that it is _EVER_ a good idea to do so. This mount option is one of the few things on my list of things that I will _NEVER_ personally provide support to people for, because it almost guarantees that you will lose data if the system dies unexpectedly (even if it's for a reason other than power loss).

2. He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

I think he means by the text surrounding the only graph that mentions TRIM that this exact same test on the other filesystems he benchmarked yields much better results.

Possibly, but there are also known issues with TRIM/DISCARD on BTRFS in 4.0. And his claim is still baseless unless he actually provides a reference for it.

3. 
He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).

OK, maybe not documented, but RDBMS falls under 'Large files with highly random access patterns and heavy RMW usage', which is a known issue for BTRFS, and also applies to VM images.

My view on this is that having one filesystem to rule them all (all storage technologies, all use cases) is unrealistic. Also, the time when you could put your NAS on an old i386 with 3MB of RAM is over. Compression, checksumming, COW, snapshotting, quotas, etc. are all computationally intensive features. In 2015, block storage has become computationally intensive. How about saying non-root Btrfs RAID10 is the best choice for a Samba NAS on rotational HDDs with no SMR (my use case)? For root and RDBMS, I use ext4 on an M.2 SSD with a sane initramfs and the most recent stable kernel. I am happy with the performance and delighted with the features Btrfs provides. I think it is much more productive to document and compare the most successful Btrfs deployments rather than trying to find bugs and bottlenecks for use cases that are the development focus of other filesystems. I don’t know, I might not make a lot of sense here, but on top of refactoring the Gotchas, I would be happy to start a successful-deployment story section on the wiki and maybe improve my usage of Btrfs along the way (who else here is using Btrfs in a similar fashion?).

Agreed, there's a reason that XFS was never the default in most Linux distributions, and similarly why there are so many filesystem drivers available.
Any given filesystem can have a number of arguments made against it, for example:
* ZFS: Ridiculously resource hungry, and doesn't use the normal page-cache.
* XFS: filesystems can't be shrunk, and it tends to perform slowly under light load compared to most other filesystems.
* NTFS: Poor file layout for many use cases, and clusters all the metadata together in one place.
* ext*: Lacks some useful functionality (reflinks for example), and the file layout and aggressive journaling are usually bad for flash.
* reiserfs: numerous gotchas in usage, and fsck loses its mind when dealing with filesystems that have reiserfs images stored in them as regular files.

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.
Re: FYIO: A rant about btrfs
Hi,
> On Sep 16, 2015, at 11:20 AM, Austin S Hemmelgarn wrote:
> On 2015-09-16 10:43, M G Berberich wrote:
>> Hello,
>> just for information. I stumbled about a rant about btrfs-performance:
>> http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp

I read it too.

> It is worth noting a few things that were done incorrectly in this testing:
> 1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).

But can the “nobarrier” mount option negatively affect performance for Btrfs (and not only data integrity)?

> 2. He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

I think he means, by the text surrounding the only graph that mentions TRIM, that this exact same test on the other filesystems he benchmarked yields much better results.

> 3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

Apparently RDBMS being a problem on Btrfs is neither known nor documented enough (he’s right about the contrast with claiming publicly that Btrfs is indeed production ready).
My view on this is that having one filesystem to rule them all (all storage technologies, all use cases) is unrealistic. Also, the time when you could put your NAS on an old i386 with 3MB of RAM is over. Compression, checksumming, COW, snapshotting, quotas, etc. are all computationally intensive features. In 2015, block storage has become computationally intensive.

How about saying non-root Btrfs RAID10 is the best choice for a Samba NAS on rotational HDDs with no SMR (my use case)? For root and RDBMS, I use ext4 on an M.2 SSD with a sane initramfs and the most recent stable kernel. I am happy with the performance and delighted with the features Btrfs provides.

I think it is much more productive to document and compare the most successful Btrfs deployments rather than trying to find bugs and bottlenecks for use cases that are the development focus of other filesystems. I don’t know, I might not make a lot of sense here, but on top of refactoring the Gotchas, I would be happy to start a successful-deployment story section on the wiki and maybe improve my usage of Btrfs along the way (who else here is using Btrfs in a similar fashion?).

> His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Me too. My two cents is that focusing on improving performance for Btrfs-optimal use cases is much more interesting than bringing new features like automatically turning COW off for RDBMS usage or debugging TRIM support.

Vincent
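The point about modern block storage being computationally intensive is easy to put a rough number on for one of those features. The sketch below measures per-block checksumming throughput; it is illustrative only (the kernel's btrfs uses crc32c with hardware acceleration, so zlib's crc32 in pure Python is just a pessimistic stand-in, and the function name is invented here):

```python
import time
import zlib

def checksum_throughput(total_mib: int = 64, block_size: int = 4096) -> float:
    """Checksum `total_mib` MiB of data in `block_size` chunks, the way a
    checksumming filesystem touches every block it writes, and return the
    achieved rate in MiB/s."""
    block = b"\x00" * block_size
    blocks = (total_mib * 1024 * 1024) // block_size
    start = time.perf_counter()
    crc = 0
    for _ in range(blocks):
        # Feed the previous CRC back in, as a running checksum would.
        crc = zlib.crc32(block, crc)
    elapsed = time.perf_counter() - start
    return total_mib / elapsed
```

Whatever rate this prints on a given machine, every written block pays some such per-block cost, and compression, COW bookkeeping, and quota accounting stack on top of it; that is the sense in which features trade CPU for integrity and flexibility.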
Re: FYIO: A rant about btrfs
Hi,
2015-09-16 17:20 GMT+02:00 Austin S Hemmelgarn: [...]
> 3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.
>
> His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.

Are there any conceptual or design properties that make btrfs perform worse in this workload compared to ZFS or F2FS, which both do CoW and share some similarity? If not, then I still think (from a user perspective) it is still interesting to look at where the difference is coming from.

There is also a quite old talk with some unfortunate numbers for btrfs from the XFS developer Dave Chinner: https://www.youtube.com/watch?v=FegjLbCnoBw - is this already resolved? 2012 is 3 years ago.

From reading the list I understand that btrfs is still very much a work in progress and performance is not a top priority at this stage, but I don't see why it shouldn't perform at least equally well as ZFS/F2FS on the same workloads. Is looking at performance problems on the development roadmap?

regards Martin
-- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: FYIO: A rant about btrfs
Some responses to your criticism:

1. How would that hole fare with a fully battery-backed/flash-backed path (battery-backed or flash-backed HBA with disks with full power-loss protection, like the Intel S3500)? In such a situation (quite commonplace in server-land), power loss should not cause any data loss, since all data in the cache is guaranteed to be committed to non-volatile memory at some point (whether such assurances may be trusted is another matter entirely though, and well outside the scope of this discussion).

2. Fair point. I'd like to know his hardware, given how strongly hardware can influence things.

3. It's pretty obvious that the author of that blog is specifically targeting OLTP performance (explicit statement in intro, choice of benchmark, name and focus of blog), not the common case, and even states that in the first two paragraphs of his conclusion. The focus is somewhat less clear in said conclusion, namely, is he truly talking about general purpose use or is he talking about general purpose OLTP use?

-- Zia Nayamuth

On 17/09/2015 01:20, Austin S Hemmelgarn wrote:
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
MfG bmg

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).
2. 
He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).

3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.
Re: FYIO: A rant about btrfs
On 2015-09-16 10:43, M G Berberich wrote:
Hello, just for information. I stumbled about a rant about btrfs-performance: http://blog.pgaddict.com/posts/friends-dont-let-friends-use-btrfs-for-oltp
MfG bmg

It is worth noting a few things that were done incorrectly in this testing:
1. _NEVER_ turn off write barriers (nobarrier mount option), doing so subtly breaks the data integrity guarantees of _ALL_ filesystems, but especially so on COW filesystems like BTRFS. With this off, you will have a much higher chance that a power loss will cause data loss. It shouldn't be turned off unless you are also turning off write-caching in the hardware or know for certain that no write-reordering is done by the hardware (and almost all modern hardware does write-reordering for performance reasons).
2. He provides no comparison of any other filesystem with TRIM support turned on (it is very likely that all filesystems will demonstrate such performance drops. Based on that graph, it looks like the device doesn't support asynchronous trim commands).
3. He's testing it for a workload that is a known and documented problem for BTRFS, and claiming that that means that it isn't worth considering as a general usage filesystem. Most people don't run RDBMS servers on their systems, and as such, such a workload is not worth considering for most people.

His points about the degree of performance jitter are valid however, as are the complaints of apparent CPU intensive stalls in the BTRFS code, and I occasionally see both on my own systems.