On 10/11/2018 09:40 PM, Hans van Kranenburg wrote:
> On 10/11/2018 05:13 PM, David Sterba wrote:
>> On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
>>> This patch set contains an additional fix for a newly exposed bug after
>>> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>>>
>>> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c...@mendix.com/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>>>
>>> The DUP fix is "fix more DUP stripe size handling". I did that one
>>> before starting to change more things so it can be applied to earlier
>>> LTS kernels.
>>>
>>> Besides that patch, which is fixing the bug in a way that is least
>>> intrusive, I added a bunch of other patches to help getting the chunk
>>> allocator code in a state that is a bit less error-prone and
>>> bug-attracting.
>>>
>>> When running this and trying the reproduction scenario, I can now see
>>> that the created DUP device extent is 827326464 bytes long, which is
>>> good.
>>>
>>> I wrote and tested this on top of linus 4.19-rc5. I still need to create
>>> a list of related use cases and test more things to at least walk
>>> through a bunch of obvious use cases to see if there's nothing exploding
>>> too quickly with these changes. However, I'm quite confident about it,
>>> so I'm sharing all of it already.
>>>
>>> Any feedback and review is appreciated. Be gentle and keep in mind that
>>> I'm still very much in a learning stage regarding kernel development.
>>
>> The patches look good, thanks. Problem is explained, preparatory work is
>> separated from the fix itself.
> 
> \o/
> 
>>> The stable patches handling workflow is not 100% clear to me yet. I
>>> guess I have to add a Fixes: in the DUP patch which points to the
>>> previous commit 92e222df7b.
>>
>> Almost nobody does it right, no worries. If you can identify a single
>> patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
>> with version where it makes sense & applies is enough. I do that check
>> myself regardless of what's in the patch.
> 
> It's 92e222df7b and the thing I'm not sure about is if it also will
> catch the same patch which was already backported to LTS kernels since
> 92e222df7b also has Fixes in it... So by now the new bug is in 4.19,
> 4.14, 4.9, 4.4, 3.16...
> 
>> I ran the patches in a VM and hit a division-by-zero in test
>> fstests/btrfs/011, stacktrace below. First guess is that it's caused by
>> patch 3/6.
> 
> Ah interesting, dev replace.
> 
> I'll play around with replace and find out how to run the tests properly
> and then reproduce this.
> 
> The code introduced in patch 3 is removed again in patch 6, so I don't
> suspect that one. :)

Actually, while writing this I realize that patch 3 should be tested
on its own (for example, on an older kernel with only patch 3 applied),
because in that case the code it introduces is not removed again by
patch 6.

> But, I'll find out.
> 
> Thanks for testing.
> 
> Hans
> 
>> [ 3116.065595] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 1 transid 5 /dev/vdb
>> [ 3116.071274] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 2 transid 5 /dev/vdc
>> [ 3116.087086] BTRFS info (device vdb): disk space caching is enabled
>> [ 3116.088644] BTRFS info (device vdb): has skinny extents
>> [ 3116.089796] BTRFS info (device vdb): flagging fs with big metadata feature
>> [ 3116.093971] BTRFS info (device vdb): checking UUID tree
>> [ 3125.853755] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdd started
>> [ 3125.860269] divide error: 0000 [#1] PREEMPT SMP
>> [ 3125.861264] CPU: 1 PID: 6477 Comm: btrfs Not tainted 4.19.0-rc7-default+ #288
>> [ 3125.862841] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
>> [ 3125.865385] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
>> [ 3125.870541] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
>> [ 3125.871862] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
>> [ 3125.873587] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
>> [ 3125.874677] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
>> [ 3125.875816] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
>> [ 3125.876742] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
>> [ 3125.877657] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
>> [ 3125.878862] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3125.880080] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
>> [ 3125.881485] Call Trace:
>> [ 3125.882105]  do_chunk_alloc+0x266/0x3e0 [btrfs]
>> [ 3125.882841]  btrfs_inc_block_group_ro+0x10e/0x160 [btrfs]
>> [ 3125.883875]  scrub_enumerate_chunks+0x18b/0x5d0 [btrfs]
>> [ 3125.884658]  ? is_module_address+0x11/0x30
>> [ 3125.885271]  ? wait_for_completion+0x160/0x190
>> [ 3125.885928]  btrfs_scrub_dev+0x1b8/0x5a0 [btrfs]
>> [ 3125.887767]  ? start_transaction+0xa1/0x470 [btrfs]
>> [ 3125.888648]  btrfs_dev_replace_start.cold.19+0x155/0x17e [btrfs]
>> [ 3125.889459]  btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs]
>> [ 3125.890251]  btrfs_ioctl+0x2a94/0x31d0 [btrfs]
>> [ 3125.890885]  ? do_sigaction+0x7c/0x210
>> [ 3125.891731]  ? do_vfs_ioctl+0xa2/0x6b0
>> [ 3125.892652]  do_vfs_ioctl+0xa2/0x6b0
>> [ 3125.893642]  ? do_sigaction+0x1a7/0x210
>> [ 3125.894665]  ksys_ioctl+0x3a/0x70
>> [ 3125.895523]  __x64_sys_ioctl+0x16/0x20
>> [ 3125.896339]  do_syscall_64+0x5a/0x1a0
>> [ 3125.896949]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
>> [ 3125.897638] RIP: 0033:0x7f6de28ecaa7
>> [ 3125.901313] RSP: 002b:00007ffe963da9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
>> [ 3125.902486] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6de28ecaa7
>> [ 3125.903538] RDX: 00007ffe963dae00 RSI: 00000000ca289435 RDI: 0000000000000003
>> [ 3125.904878] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
>> [ 3125.905788] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe963de26f
>> [ 3125.906700] R13: 0000000000000001 R14: 0000000000000004 R15: 000055fceeceb2a0
>> [ 3125.907954] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
>> [ 3125.909342] ---[ end trace 5492bb467d3be2da ]---
>> [ 3125.910031] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
>> [ 3125.913600] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
>> [ 3125.914595] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
>> [ 3125.916209] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
>> [ 3125.917701] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
>> [ 3125.919209] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
>> [ 3125.920782] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
>> [ 3125.922413] FS:  00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
>> [ 3125.924264] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [ 3125.925627] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
>>
> 
> 


-- 
Hans van Kranenburg
