Thanks, Duncan. Your points are very reasonable and I will be doing a bisect, as you suggest (though I don't know that the problem is fixed in later releases, just that we couldn't reproduce it readily). Just to be clear, I am not looking for support. I just hoped that perhaps this had been a conspicuous enough problem that someone would be able to give me a little history.
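For reference, the manual bisect over the release series described in point 7 of the quoted message could be sketched roughly as below. This is only an illustration, not anyone's actual tooling. `first_fixed` and `is_fixed` are hypothetical names, and `is_fixed` stands in for "build and boot that kernel, then run the reproducer". The sketch assumes the oldest release in the list is known bad, the newest is known good, and that the fix stays in place once it has landed.

```python
from typing import Callable, Sequence

# Mainline release series from 3.10 through 4.8, oldest first
# (3.x ended at 3.19 and was followed by 4.0).
RELEASES = [f"3.{n}" for n in range(10, 20)] + [f"4.{n}" for n in range(0, 9)]


def first_fixed(versions: Sequence[str], is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest version for which is_fixed() reports True.

    Assumptions (not verified here): versions[0] is bad, versions[-1] is
    good, and every version after the first good one is also good.
    """
    if len(versions) < 2:
        raise ValueError("need at least one known-bad and one known-good version")
    lo, hi = 0, len(versions) - 1  # lo: known bad, hi: known good
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(versions[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is after mid
    return versions[hi]
```

With the 19 releases listed, this needs at most five test kernels before the range is down to one pair of adjacent releases, which is then the range to hand to git bisect.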
On Fri, Dec 2, 2016 at 9:58 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Blake Lewis posted on Fri, 02 Dec 2016 12:36:29 -0800 as excerpted:
>
>> Well, 3.10 is what you get with the RHEL7.x distributions, so that's why
>> people are running it. Apparently, it is "good enough" for many purposes.
>>
>> My real goal here is to understand the scope of the bug and whether any
>> mitigation is possible. Of course, I don't expect anyone else to make a
>> patch for me (or even to accept my patch), but if I knew what the bug is
>> and what was done to fix it, I'd be in a much better position to decide
>> what to do. If anyone can shed any light on this, I'd be very grateful.
>
> I'm going to try to make several points with this post, including a
> (perhaps too simplistic, but it works for me and apparently enough others
> to have a dedicated tool) suggestion for finding the problem, along with
> others basically agreeing with LB's reply but with a bit more
> background. YMMV but it's posted in the hope that it is of help.
>
> 1) I am a list regular and btrfs-using admin myself, but not a dev, so
> don't look here for the code-level stuff.
>
> 2) This is the btrfs development list, kernel level, not a distro list,
> and the viewpoints generally held here may or may not correspond to that
> of various distros and their support teams.
>
> 3) In particular, while btrfs has been officially out of experimental for
> some time now, on this list btrfs is held (by both btrfs devs and
> list-regular users) to be still stabilizing, not fully stable and mature,
> as I normally describe it. Both on this list and on the btrfs wiki (at
> https://btrfs.wiki.kernel.org ), the strong recommendation is to keep
> current on the kernel in particular, because bugs are still being found
> and fixed, and naturally, this list being development-focused, the view
> tends to be rather more forward leaning than particularly
> "enterprise-stale^H^Hble" distros.
>
> Also strongly recommended is keeping backups, tested and ready-to-use,
> because there /are/ still serious bugs being fixed, and sometimes the
> problems they trigger are, for non-devs at least, simply easiest to fix
> by blowing away the existing filesystem and starting over with a clean
> one and backups to recover from.
>
> 4) OTOH, some distros and other product vendors (including your company)
> obviously consider btrfs if not yet /entirely/ stable, stable /enough/ to
> build and ship product on. While this is accepted as a questionable but
> already-in-the-wild-so-get-used-to-it position on the list, for kernels
> outside our normal support range (next point), the standard answer to
> users asking here for support is to say that while we recognize some
> distros support it on older kernels and with older btrfs userspace, we on
> this list tend to be forward looking and don't track what patches distros
> and vendors may have backported and which ones they haven't. Thus,
> there's a 3-way choice they need to make, either (a) upgrade to something
> within our recommended support range so we can best help, or (b) take
> the distro/vendor up on the support they offer and that the user may in
> fact be paying for, since they're best positioned to provide that support
> for older kernels they've done their own patching to, or (c) stay where
> they are and muddle thru the best they can with the limited support we
> /can/ offer -- we'll still do what we can, but honestly, the "impedance
> mismatch" with code that old is going to make it difficult to apply what
> limited support we can offer.
>
> 5) The versions we support best on the list, keeping in mind the above
> "current kernel" recommendation, are the two latest kernel series in
> either the current or LTS series. On the current side, kernel 4.9 is
> very close to out now, so we like to see 4.8, tho 4.7 is still current
> enough that people on-list are likely to be able to reply sanely in terms
> of whether we recognize a bug and whether it's still current or has
> already been fixed.
>
> On the LTS side, contradicting LB slightly, btrfs does try to backport
> bugfixes when we know they're needed in LTS series -- no effort is made
> to backport to non-LTS beyond the relatively short mainline non-LTS
> current kernel series support period, and as I said, they basically go
> out of support two kernels back.
>
> The two most recent LTS kernel series are 4.4 and 4.1, with 3.18 before
> that. Based on the two-most-recent policy, 4.4 is definitely still in
> focus, and we do still try for 4.1 as well, tho it's getting long enough
> in the tooth now that if the bug isn't recognized, an early
> question/recommendation is going to be to try with a newer kernel.
>
> The 3.18 LTS series actually ended up reasonably stable for btrfs, while
> 4.4 may have taken a bit longer than normal to mature (in non-btrfs areas
> as well), so while 3.18 is back too far for much active support, it was
> working and is likely still working quite well for many.
>
> 3.16 was the LTS series before that, but that was a rough period for
> btrfs, and honestly, btrfs LTS support was new enough back then that we
> were only really trying to support the latest one, so 3.18 is really the
> practical horizon in terms of list support. Before that there were some
> pretty bad bugs that nobody wants to even think about any more.
>
> 6) Meanwhile, what a lot of people don't realize is that until 3.12 (IIRC)
> stripped off the experimental label, btrfs remained officially
> experimental, with a pretty strong warning on both the kernel option and
> mkfs.btrfs on the user side. In list support terms that's seriously
> ancient history now (think electricity in the 18th century maturity, it
> was a toy people could see was going to do great things one day and
> people were doing stuff with it, but it really /was/ experimental in
> anything even close to modern-day terms) and you're pretty much on your
> own.
>
> Given that you're asking about 3.10, well... like I said, that's still
> btrfs experimental era, and I don't think I'm alone when I seriously
> question the sanity of anyone still attempting to support a product
> running btrfs on /that/! If you're doing it, and people are willing to
> pay for it, well, I can't argue with that, but in all honesty, it's your
> customers' data you're putting at risk, and were I to see a product still
> running btrfs on a 3.10 kernel (or really, earlier than 3.18, for the
> reasons explained above, but 3.10 really /was/ still labeled
> experimental!), I'd immediately mark everything that company sold as
> highly experimental and thus questionable, as well. <shrug> Just being
> honest.
>
>
> So you can see why you're getting told to upgrade to /something/
> semi-recent, 3.18 LTS, at absolute minimum. Some of those bugs in the
> early 3.15-3.16 era tie my stomach in knots thinking about people still
> running those versions, they were that bad, even if in theory those bugs
> are long patched, by now. Like I said, we were glad to see 3.18 LTS and it
> really /did/ end up surprisingly stable, up thru early 4.4 at least, when
> we stopped tracking it.
>
> And FWIW, I /could/ be wrong here, but I /believe/ I saw that even Red
> Hat now actively recommends that people move off of btrfs on 3.10 era
> RHEL-7.x. (At least there was a post on this list that I interpreted to
> that effect, tho if it actually affected me I'd be double-checking.) I
> don't know what their actual support status is for those that don't. You
> might want to look that up, because if it's true, it /would/ give your
> company a bit of an out in terms of supporting then-experimental btrfs,
> as well.
>
>
> All that said, if you insist, again keeping in mind that I'm not a coder,
> so it's unsurprising that I don't have much in the way of specific
> commits to point you at, but at least here's a way to help you find them,
> yourself. =:^)
>
> 7) Incremental problem/fix bisect. Recursively break the problem space
> roughly in half and test to see which half the problem/fix is in, then
> recurse by breaking that half in half and testing again. You can use
> git bisect, which has been popularized as a way for even non-coders
> like me to nail down bugs (or in your case fixes) to specific
> commits, or perhaps first, to narrow down the range you need to git
> bisect, simply do a manual bisect of the release series between
> 3.10 and 4.8, to find where the problem goes away, and then look at
> that commit or the commits around that area to see what might have
> changed that broke, or in your case, fixed, the problem.
>
> Bisect may be dumb and brute force, but it works surprisingly well,
> especially since git bisect automates most of the process, and as I said,
> it has allowed many non-coders to help in tracing and ultimately fixing
> their bugs. I know it has worked that way for me. And a modern git
> bisect better stays with release and then rc tags, then big merge points,
> as long as possible before diving down into individual commits, as well,
> making the incremental-bisect process a bit nicer and less "black-box"
> than it used to be, too.
>
> Given what I said about 3.18 being a really good LTS in btrfs terms, I
> might suggest you start by testing it. If it fixes the problem for you,
> then you can decide whether to try to push it as an upgrade, or try to
> bisect the problem further in order to properly backport the fix. If
> it doesn't, of course the 4.1 and 4.4 LTS kernels are other major test
> points you can try.
>
>
> So indeed, some of that was rehash, but with hopefully helpful additional
> detail now, and the bisect suggestion may be too simplistic. But hope
> /some/ of it was helpful, anyway. =:^)
>
> --
> Duncan - List replies preferred. No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master." Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html