Qu Wenruo posted on Mon, 14 Dec 2015 10:08:16 +0800 as excerpted:

> Martin Steigerwald wrote on 2015/12/13 23:35 +0100:
>> Hi!
>>
>> For me it is still not production ready.
>
> Yes, this is the *FACT* and not everyone has a good reason to deny it.
In the above sentence, I /think/ you (Qu) agree with Martin (and me) that
btrfs shouldn't be considered production ready... yet. The first part of
the sentence makes it very clear that you feel strongly about the *FACT*,
but the second half of the sentence (after *FACT*) doesn't parse well in
English, leaving the entire sentence open to interpretation, tho it's
obvious either way that you feel strongly about it. =:^\

At the risk of getting it completely wrong, what I /think/ you meant to
say is (as expanded in typically Duncan fashion =:^)...

Yes, this is the *FACT*, though some people have reasons to deny it.

Presumably, said reasons would include the fact that various distros are
trying to sell enterprise support contracts to customers very eager to
have the features that btrfs provides, and said customers are willing to
pay for assurances that the solutions they're buying are "production
ready", whether that's actually the case or not, presumably because said
payment is (in practice) simply ensuring there's someone else to pin the
blame on if things go bad.

And the demonstration of that would be the continued fact that people
otherwise unnecessarily continue to pay rather large sums of money for
that very assurance, when in practice they'd get equal or better support
by not worrying about that payment and instead actually making use of
free-of-cost resources such as this list.

[Linguistic analysis: see frequent discussion of this topic at Language
Log, which I happen to subscribe to as I find this sort of thing
interesting, for more commentary and examples of the same general issue:
http://languagelog.net ]

The problem with the sentence as originally written is that English
doesn't deal well with multi-negation, sometimes treating each negation
as an inversion of the previous one (as do most programming languages and
thus programmers), while at other times, or as read/heard/interpreted by
others, repeated negation may be taken as a strengthening of the original
negation. Regardless, mis-negation due to speaker/writer confusion is
quite common even among native English speakers/writers.

The negating words in question here are "not" and "deny". If you will
note, my rewrite kept "deny" but rewrote the "not" out of the sentence,
so there's only one negative to worry about. That makes the meaning much
clearer, as the reader's mind isn't left trying to figure out what the
speaker meant by the double negative (mistake? deliberate canceling out
of the first negative with the second? deliberate intensifier?) and thus
unable to be sure one way or the other what was meant.

And just in case there would have been doubt, the explanation then makes
doubly obvious what I think your intent was, by expanding on it. Of
course that's easy to do, as I entirely agree.

OTOH, if I'm mistaken as to your intent and you meant it the other way...
well, then you'll need to do the explaining, as then the implication is
that some people have good reasons to deny it and you agree with them,
but without further expansion I wouldn't know where you're trying to go
with that claim.

Just in case there's any doubt left about my own opinion on the original
"not production ready" claim in the above discussion, let me be explicit:
I (too) agree with Martin (and, I think, with Qu) that btrfs isn't yet
production ready. But I don't believe you'll find many on the list taking
issue with that, as I think everybody on-list agrees: btrfs /isn't/
production ready.
Certainly pretty much just that has been repeatedly stated, in
individualized style, by many posters including myself, and I've yet to
see anyone take serious issue with it.

>> No matter whether SLES 12 uses it as default for root, no matter
>> whether Fujitsu and Facebook use it: I will not let this onto any
>> customer machine without lots and lots of underprovisioning and
>> rigorous free space monitoring.
>> Actually I will renew my recommendations in my trainings to be careful
>> with BTRFS.

... And were I to put money on it, my money would be on every regular
on-list poster 100% agreeing with that. =:^)

>> From my experience the monitoring would check for:
>>
>> merkaba:~> btrfs fi show /home
>> Label: 'home'  uuid: […]
>>         Total devices 2 FS bytes used 156.31GiB
>>         devid    1 size 170.00GiB used 164.13GiB path /dev/[path1]
>>         devid    2 size 170.00GiB used 164.13GiB path /dev/[path2]
>>
>> If "used" is same as "size" then make big fat alarm. It is not
>> sufficient for it to happen. It can run for quite some time just fine
>> without any issues, but I never have seen a kworker thread using 100%
>> of one core for extended period of time blocking everything else on the
>> fs without this condition being met.

Astutely observed. =:^) (A rough sketch of automating exactly that check
follows a bit further down.)

> And specially advice on the device size from myself:
> Don't use devices over 100G but less than 500G.
> Over 100G will leads btrfs to use big chunks, where data chunks can be
> at most 10G and metadata to be 1G.

Thanks, Qu. This is the first time I've seen such specifics, both in
terms of the big-chunks trigger (minimum 100 GiB effective usable
filesystem size) and in terms of how big those big chunks are (10 GiB
data, 1 GiB metadata). Filed away for future reference. =:^)

> I have seen a lot of users with about 100~200G device, and hit
> unbalanced chunk allocation (10G data chunk easily takes the last
> available space and makes later metadata no where to store)

That does indeed seem to be a recurring theme. Now I know why, and where
the big-chunks trigger is. =:^)

And to add: while the kernel now does empty-chunk reaping, returning
empty chunks to the unallocated pool, the chances of a 10 GiB chunk being
mostly empty but still having at least one small extent locking it in
place as not entirely empty, and thus not reapable, are obviously going
to be at least an order of magnitude higher than the chances at the 1 GiB
chunk size (and in practice likely more, due to a nonlinearly greater
share of files being under 10 GiB in size than under 1 GiB in size).

> And unfortunately, your fs is already in the dangerous zone.
> (And you are using RAID1, which means it's the same as one 170G btrfs
> with SINGLE data/meta)

That raid1 parenthetical is why I chose the "effective usable filesystem
size" wording above, to try to word it broadly enough to include all the
different replication/parity variants.

>> Reported in another thread here that got completely ignored
>> so far. I think I could go back to 4.2 kernel to make this work.

> Unfortunately, this happens a lot of times, even you posted it to mail
> list.
> Devs here are always busy locating bugs or adding new features or
> enhancing current behavior.
>
> So *PLEASE* be patient about such slow response.

Yes indeed. Generally speaking, one post/thread alone isn't likely to get
the eye of a dev unless they happen to be between bug-hunting projects at
that moment. But several posts/threads, particularly over a couple kernel
cycles or from multiple posters, a trend makes, and then it's much more
likely to catch attention.
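Tangentially, Martin's "used == size, make big fat alarm" rule above is
exactly the sort of thing that's trivial to script. Here's a minimal
sketch of what such a check might look like, suitable for a cron job or
the like. The /home default, the 90% early-warning threshold, and the
assumption that each devid line reports size and used in the same unit
(as in the output quoted above) are all mine, so adjust to taste:

#!/bin/bash
# Rough sketch of the "used == size -> big fat alarm" check described
# above.  Assumptions (mine): the mountpoint defaults to /home, 90% is
# an arbitrary early-warning threshold, and each devid line reports
# size and used in the same unit, as in the quoted output.

MNT="${1:-/home}"
WARN_PCT=90

btrfs filesystem show "$MNT" | awk -v warn="$WARN_PCT" -v mnt="$MNT" '
    $1 == "devid" {
        # Line format: devid N size 170.00GiB used 164.13GiB path /dev/...
        dev = $NF; size = $4; used = $6
        # Strip the unit suffixes so we can compare the numbers.
        gsub(/[^0-9.]/, "", size); gsub(/[^0-9.]/, "", used)
        if (used == size)
            printf "ALARM: %s on %s is fully chunk-allocated (%s of %s)\n",
                   dev, mnt, $6, $4
        else if (100 * used / size >= warn)
            printf "WARN: %s on %s is %.0f%% chunk-allocated\n",
                   dev, mnt, 100 * used / size
    }
'

When the warning trips well before the hard alarm does, one common
on-list remedy is a filtered balance, something like
btrfs balance start -dusage=10 /home, to compact mostly-empty data chunks
and return them to the unallocated pool before the last few gigabytes of
unallocated space disappear; naturally, check the exact invocation
against your btrfs-progs version first.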
> BTW, you may not want to revert to 4.2 until some bug fix is backported
> to 4.2.
> As qgroup rework in 4.2 has broken delayed ref and caused some scrub
> bugs. (My fault)

Good point. (Tho I never happened to trigger those scrub bugs here, but I
strongly suspect that's because I both use quite small filesystems, well
under that 100 GiB effective-size barrier mentioned above, and relatively
fast ssds, so my scrubs are done in under a minute and don't tend to be
subject to the same sort of IO bottlenecking and races that scrubs on
spinning rust at 100-GiB-plus filesystem sizes are.)

>> I think it got somewhat better. It took much longer to come into that
>> state again than last time, but still, blocking like this is *no*
>> option for a *production ready* filesystem.

Agreed on both counts. The problem should be markedly better since the
empty-chunk reaping went into (IIRC) 3.17, to the point that we're only
now beginning to see reports of it being triggered again, while
previously people were seeing it repeatedly, often monthly or more
frequently.

But it's still not hitting the expectations for a production-ready
filesystem. Then again, I've yet to see a list regular actually make
anything like a claim that btrfs is in fact production ready; rather the
opposite, in fact, and repeatedly. What distros might be claiming is
another matter, but arguably, people relying on their claims should be
following up by demanding support from the distros making them, based on
those claims. Meanwhile, on this list we're /not/ making those claims and
thus cannot reasonably be held to them as if we were.

>> I am seriously consider to switch to XFS for my production laptop
>> again. Cause I never saw any of these free space issues with any of the
>> XFS or Ext4 filesystems I used in the last 10 years.

> Yes, xfs and ext4 is very stable for normal use case.
>
> But at least, I won't recommend xfs yet, and considering the nature or
> journal based fs, I'll recommend backup power supply in crash recovery
> for both of them.
>
> Xfs already messed up several test environment of mine, and an
> unfortunate double power loss has destroyed my whole /home ext4
> partition years ago.
>
> [xfs story]
> After several crash, xfs makes several corrupted file just to 0 size.
> Including my kernel .git directory. Then I won't trust it any longer.
> No to mention that grub2 support for xfs v5 is not here yet.
>
> [ext4 story]
> For ext4, when recovering my /home partition after a power loss, a new
> power loss happened, and my home partition is doomed.
> Only several non-sense files are savaged.

As they say, YMMV. But FWIW, despite the stories from the
pre-data=ordered-by-default era, and with the acknowledgment that a
single anecdote, or even a small but unrandomized sampling of anecdotes,
doesn't a scientific study make, I've actually had surprisingly good luck
with reiserfs here, even on hardware that I had little reason to expect a
filesystem to actually work reliably on (bad-memory incidents, an
overheated and head-crashed drive incident where after cooldown I took
the mounted-at-the-time partitions out of use and successfully and
reliably continued to use other partitions on the drive, an old
burst-capacitor and thus power-unstable mobo incident... etc, tho not all
at once, fortunately!).
ATM I use btrfs on my SSDs but continue to use reiserfs on my spinning
rust, and FWIW, reiserfs has continued to be as reliable as I'd expect a
deeply mature and stable filesystem to be, while btrfs... has been as
occasionally but arguably dependably buggy as I'd expect a
still-under-heavy-development, tho past "experimental", still-stabilizing
and not yet mature filesystem to be.

Tho in the pre-ordered-by-default era I remember a few of those
0-size-truncated files on reiserfs, too. But the ordered-by-default
introduction was long in the past even when the 3.0 kernel was new, so
it's pretty well pre-history by now (which I guess qualifies me as a
Linux old fogey, even if I didn't really get into it to speak of until
the turn of the century or so, after MS gave me the push by very
specifically and deliberately shipping malware in eXPrivacy, thus
crossing a line I was never to cross with them).

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman