On 11/13/18 4:03 PM, David Sterba wrote:
> On Thu, Oct 11, 2018 at 07:40:22PM +0000, Hans van Kranenburg wrote:
>> On 10/11/2018 05:13 PM, David Sterba wrote:
>>> On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
>>>> This patch set contains an additional fix for a newly exposed bug after
>>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>>
>>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c...@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>>>
>>>> The DUP fix is "fix more DUP stripe size handling". I did that one
>>>> before starting to change more things so it can be applied to earlier
>>>> LTS kernels.
>>>>
>>>> Besides that patch, which is fixing the bug in a way that is least
>>>> intrusive, I added a bunch of other patches to help getting the chunk
>>>> allocator code in a state that is a bit less error-prone and
>>>> bug-attracting.
>>>>
>>>> When running this and trying the reproduction scenario, I can now see
>>>> that the created DUP device extent is 827326464 bytes long, which is
>>>> good.
>>>>
>>>> I wrote and tested this on top of linus 4.19-rc5. I still need to create
>>>> a list of related use cases and test more things to at least walk
>>>> through a bunch of obvious use cases to see if there's nothing exploding
>>>> too quickly with these changes. However, I'm quite confident about it,
>>>> so I'm sharing all of it already.
>>>>
>>>> Any feedback and review is appreciated. Be gentle and keep in mind that
>>>> I'm still very much in a learning stage regarding kernel development.
>>>
>>> The patches look good, thanks. Problem is explained, preparatory work is
>>> separated from the fix itself.
>>
>> \o/
>>
>>>> The stable patches handling workflow is not 100% clear to me yet. I
>>>> guess I have to add a Fixes: in the DUP patch which points to the
>>>> previous commit 92e222df7b.
>>>
>>> Almost nobody does it right, no worries. If you can identify a single
>>> patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
>>> with version where it makes sense & applies is enough. I do that check
>>> myself regardless of what's in the patch.
>>
>> It's 92e222df7b and the thing I'm not sure about is if it also will
>> catch the same patch which was already backported to LTS kernels since
>> 92e222df7b also has Fixes in it... So by now the new bug is in 4.19,
>> 4.14, 4.9, 4.4, 3.16...
>>
>>> I ran the patches in a VM and hit a division-by-zero in test
>>> fstests/btrfs/011, stacktrace below. First guess is that it's caused by
>>> patch 3/6.
>>
>> Ah interesting, dev replace.
>>
>> I'll play around with replace and find out how to run the tests properly
>> and then reproduce this.
>>
>> The code introduced in patch 3 is removed again in patch 6, so I don't
>> suspect that one. :)
>>
>> But, I'll find out.
>>
>> Thanks for testing.
>
> I've merged patches 1-5 to misc-next as they seem to be safe and pass
> fstests, please let me know when you have updates to the last one.
> Thanks.

I'll definitely follow up on that. It has not happened yet because of,
well, something about todo lists and ordering work and not doing too
many things at the same time.

Thanks for already confirming it was not patch 3 but 6 that has the
problem, I already suspected that.

For patch 3, the actual fix, how do we get that on top of old stable
kernels where the previous fix (92e222df7b) is in? Because of the Fixes:
line, that one was picked again in 4.14, 4.9, 4.4 and even 3.16. How
does this work? Does it need all the other commit ids from those
branches in Fixes lines? Or is there magic somewhere that does this?

From your "development cycle open" mails, I understand that for testing
myself, I would test based on misc-next, for-next or for-x.y, in that
order, depending on where the first 5 are at that moment?

Hans