Thanks, Duncan.  Your points are very reasonable and I will be doing
a bisect, as you suggest (though I don't know that the problem is fixed
in later releases, just that we couldn't reproduce it readily).  Just to be
clear, I am not looking for support.  I just hoped that perhaps this had
been a conspicuous enough problem that someone would be able to
give me a little history.


On Fri, Dec 2, 2016 at 9:58 PM, Duncan <1i5t5.dun...@cox.net> wrote:
> Blake Lewis posted on Fri, 02 Dec 2016 12:36:29 -0800 as excerpted:
>
>> Well, 3.10 is what you get with the RHEL7.x distributions, so that's why
>> people are running it.
>> Apparently, it is "good enough" for many purposes.
>>
>> My real goal here is to understand the scope of the bug and whether any
>> mitigation is possible.  Of course, I don't expect anyone else to make a
>> patch for me (or even to accept my patch), but if I knew what the bug is
>> and what was done to fix it,
>> I'd be in a much better position to decide what to do.  If anyone can
>> shed any light on this,
>> I'd be very grateful.
>
> I'm going to try to make several points with this post, including a
> (perhaps too simplistic, but it works for me and apparently enough others
> to have a dedicated tool) suggestion for finding the problem, along with
> others basically agreeing with LB's reply but with a bit more
> background.  YMMV but it's posted in the hope that it is of help.
>
> 1) I am a list regular and btrfs using admin myself, but not a dev, so
> don't look here for the code-level stuff.
>
> 2) This is the btrfs development list, kernel level, not a distro list,
> and the viewpoints generally held here may or may not correspond to that
> of various distros and their support teams.
>
> 3) In particular, while btrfs has been officially out of experimental for
> some time now, on this list btrfs is held (by both btrfs devs and list-
> regular users) to be still stabilizing, not fully stable and mature, as I
> normally describe it.  Both on this list and on the btrfs wiki (at
> https://btrfs.wiki.kernel.org ), the strong recommendation is to keep
> current on the kernel in particular, because bugs are still being found
> and fixed, and naturally, this list being development-focused, the view
> tends to be rather more forward leaning than particularly "enterprise-
> stale^H^Hble" distros.
>
> Also strongly recommended is keeping backups, tested and ready-to-use,
> because there /are/ still serious bugs being fixed, and sometimes the
> problems they trigger are, for non-devs at least, simply easiest to fix
> by blowing away the existing filesystem and starting over with a clean
> one and backups to recover from.
>
> 4) OTOH, some distros and other product vendors (including your company)
> obviously consider btrfs if not yet /entirely/ stable, stable /enough/ to
> build and ship product on.  While this is accepted as a questionable but
> already-in-the-wild-so-get-used-to-it position on the list, for kernels
> outside our normal support range (next point), the standard answer to
> users asking here for support is to say that while we recognize some
> distros support it on older kernels and with older btrfs userspace, we on
> this list tend to be forward looking and don't track what patches distros
> and vendors may have backported and which ones they haven't.  Thus,
> there's a 3-way choice they need to make, either (a) upgrade to something
> within our recommended support range so we can best help, or (b) take
> the distro/vendor up on the support they offer and that the user may in
> fact be paying for, since they're best positioned to provide that support
> for older kernels they've done their own patching to, or (c) stay where
> they are and muddle thru the best they can with the limited support we
> /can/ offer -- we'll still do what we can, but honestly, the "impedance
> mismatch" with code that old is going to make it difficult to apply what
> limited support we can offer.
>
> 5) The versions we support best on the list, keeping in mind the above
> "current kernel" recommendation, are the two latest kernel series in
> either the current or LTS series.  On the current side, kernel 4.9 is
> very close to out now, so we like to see 4.8, tho 4.7 is still current
> enough that people on-list are likely to be able to reply sanely in terms
> of whether we recognize a bug and whether it's still current or has
> already been fixed.
>
> On the LTS side, contradicting LB slightly, btrfs does try to backport
> bugfixes when we know they're needed in LTS series -- no effort is made
> to backport to non-LTS beyond the relatively short mainline non-LTS
> current kernel series support period, and as I said, they basically go
> out of support two kernels back.
>
> The two most recent LTS kernel series are 4.4 and 4.1, with 3.18 before
> that.  Based on the two-most-recent policy, 4.4 is definitely still in
> focus, and we do still try for 4.1 as well, tho it's getting long enough
> in the tooth now that if the bug isn't recognized, an early question/
> recommendation is going to be to try with a newer kernel.
>
> The 3.18 LTS series actually ended up reasonably stable for btrfs, while
> 4.4 may have taken a bit longer than normal to mature (in non-btrfs areas
> as well), so while it's back too far for much active support, it was
> working and is likely still working quite well for many.
>
> 3.16 was the LTS series before that, but that was a rough period for
> btrfs, and honestly, btrfs LTS support was new enough back then that we
> were only really trying to support the latest one, so 3.18 is really the
> practical horizon in terms of list support.  Before that there were some
> pretty bad bugs that nobody wants to even think about any more.
>
> 6) Meanwhile, what a lot of people don't realize is that until 3.12 (IIRC)
> stripped off the experimental label, btrfs remained officially
> experimental, with a pretty strong warning on both the kernel option and
> mkfs.btrfs on the user side.  In list support terms that's seriously
> ancient history now (think electricity in the 18th century maturity, it
> was a toy people could see was going to do great things one day and
> people were doing stuff with it, but it really /was/ experimental in
> anything even close to modern-day terms) and you're pretty much on your
> own.
>
> Given that you're asking about 3.10, well... like I said, that's still
> btrfs experimental era, and I don't think I'm alone when I seriously
> question the sanity of anyone still attempting to support a product
> running btrfs on /that/!  If you're doing it, and people are willing to
> pay for it, well, I can't argue with that, but in all honesty, it's your
> customer's data you're putting at risk, and were I to see a product still
> running btrfs on a 3.10 kernel (or really, earlier than 3.18, for the
> reasons explained above, but 3.10 really /was/ still labeled
> experimental!), I'd immediately mark everything that company sold as
> highly experimental and thus questionable, as well.  <shrug>  Just being
> honest.
>
>
> So you can see why you're getting told to upgrade to /something/ semi-
> recent, 3.18 LTS, at absolute minimum.  Some of those bugs in the early
> 3.15-3.16 era tie my stomach in knots thinking about people still running
> those versions, they were that bad, even if in theory those bugs are long
> patched, by now.  Like I said, we were glad to see 3.18 LTS and it
> really /did/ end up surprisingly stable, up thru early 4.4 at least, when
> we stopped tracking it.
>
> And FWIW, I /could/ be wrong here, but I /believe/ I saw that even Red
> Hat now actively recommends that people move off of btrfs on 3.10 era
> RHEL-7.x.  (At least there was a post on this list that I interpreted to
> that effect, tho if it actually affected me I'd be double-checking.)  I
> don't know what their actual support status is for those that don't.  You
> might want to look that up, because if it's true, it /would/ give your
> company a bit of an out in terms of supporting then-experimental btrfs,
> as well.
>
>
> All that said, if you insist, again keeping in mind that I'm not a coder,
> so it's unsurprising that I don't have much in the way of specific
> commits to point you at, but at least here's a way to help you find them,
> yourself. =:^)
>
> 7) Incremental problem/fix bisect.  Recursively break the problem space
> roughly in half and test to see which half the problem/fix is in, then
> recurse by breaking that half in half and testing again.  You can use
> either git bisect, which has been popularized as a way for even non-
> coders like me to nail down bugs (or in your case fixes) to specific
> commits, or perhaps first, to narrow down the range you need to git
> bisect, simply by doing a manual bisect of the release series between
> 3.10 and 4.8, to find where the problem goes away, and then looking at
> that commit or the commits around that area to see what might have
> changed that broke, or in your case, fixed, the problem.
>
> Bisect may be dumb and brute force, but it works surprisingly well,
> especially since git bisect automated most of the process, and as I said,
> it has allowed many non-coders to help in tracing and ultimately fixing
> their bugs.  I know it has worked that way for me.  A modern git bisect
> also does a better job of staying with release tags, then rc tags, then
> big merge points, for as long as possible before diving down into
> individual commits, making the incremental-bisect process a bit nicer
> and less "black-box" than it used to be.
>
> Given what I said about 3.18 being a really good LTS in btrfs terms, I
> might suggest you start by testing it.  If it fixes the problem for you,
> then you can decide whether to try to push it as an upgrade, or try to
> bisect the problem further in order to properly backport the fix.  If
> it doesn't, of course the 4.1 and 4.4 LTS kernels are other major test
> points you can try.
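
The manual release-series bisect described above can be sketched roughly as
follows.  This is only an illustration, not anything from the thread: the
function name first_fixed_release, the RELEASES list, and the
pretend_reproduces stub are all hypothetical, and the sketch assumes the
problem reproduces on every release before the fix and on none after it.
In practice the "reproduces" step means booting that kernel and running
whatever test triggers the bug.

```python
from typing import Callable, Optional, Sequence


def first_fixed_release(
    releases: Sequence[str],
    reproduces: Callable[[str], bool],
) -> Optional[str]:
    """Return the earliest release on which the problem no longer reproduces.

    Assumes `releases` is ordered oldest to newest, and that the problem
    shows up on every release before the fix and on none from the fix on.
    Returns None if the list is empty or the newest release is still broken.
    """
    if not releases or reproduces(releases[-1]):
        return None
    lo, hi = 0, len(releases) - 1  # invariant: releases[hi] is known fixed
    while lo < hi:
        mid = (lo + hi) // 2
        if reproduces(releases[mid]):
            lo = mid + 1  # still broken here, so the fix landed later
        else:
            hi = mid  # already fixed here, so look at earlier releases
    return releases[lo]


# Mainline release series from 3.10 through 4.8 (3.19 was followed by 4.0).
RELEASES = [f"3.{n}" for n in range(10, 20)] + [f"4.{n}" for n in range(0, 9)]


def pretend_reproduces(version: str) -> bool:
    """Hypothetical stand-in for a real test run on a booted kernel.

    The cutoff is made up purely so the sketch can run; it says nothing
    about where any real btrfs fix landed.
    """
    made_up_cutoff = "3.16"
    return RELEASES.index(version) < RELEASES.index(made_up_cutoff)


if __name__ == "__main__":
    print(first_fixed_release(RELEASES, pretend_reproduces))
```

With 19 releases in the range, this needs at most six test runs rather than
nineteen.  Two caveats: a bug that does not reproduce readily can make a
broken release look fixed, so each "does not reproduce" answer is only as
good as the test behind it; and once the range is narrowed to one release,
git bisect can take over inside it.  Since the hunt is for a fix rather than
a regression, git's custom bisect terms (for example
"git bisect start --term-old broken --term-new fixed", available in
reasonably recent git) keep the labels from being backwards.
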
>
>
> So indeed, some of that was rehash, but with hopefully helpful additional
> detail now, and the bisect suggestion may be too simplistic.  But hope
> /some/ of it was helpful, anyway. =:^)
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html