extent-tree.c:1833 on rebalance

Stéphane Lesimple Wed, 16 Sep 2015 03:28:46 -0700

On 2015-09-16 07:02, Duncan wrote:

Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted:
On 2015-09-15 16:56, Josef Bacik wrote:
On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
I've been experiencing repetitive "kernel BUG" occurrences in the past
few days trying to balance a raid5 filesystem after adding a new
drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
I've run a scrub on this filesystem after the crash happened twice,
and it found no errors.
The BUG_ON() condition that my filesystem triggers is the following:
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.
Does btrfsck complain at all?
Just to elucidate a bit...
[...]
Which is where btrfs check comes in and why JB asked you to run it, since
unlike scrub, check is designed to catch filesystem logic errors.


Thanks for your clarification Duncan, that perfectly makes sense.

You're right, even though btrfs scrub didn't complain, btrfsck does:

checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
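
The ranges printed above are half-open byte intervals, so one quick way to see where each extent sits relative to stripe boundaries is to compute its offset within a stripe. The following is only a minimal sketch: it assumes a 64 KiB stripe length, the helper names are made up for illustration, and it is not necessarily the comparison btrfs check itself performs.

```python
# Assumption: 64 KiB stripe length; not taken from this thread.
STRIPE_LEN = 64 * 1024


def stripe_span(start, end, stripe_len=STRIPE_LEN):
    """Describe the half-open byte range [start, end).

    Returns (offset_in_stripe, first_stripe, last_stripe), where the
    stripe numbers are those of the first and the last byte of the range.
    """
    if end <= start:
        raise ValueError("empty or inverted range")
    first = start // stripe_len
    last = (end - 1) // stripe_len  # last byte actually covered
    return start % stripe_len, first, last


def crosses_stripe(start, end, stripe_len=STRIPE_LEN):
    """True if the first and last byte of [start, end) are in different stripes."""
    _, first, last = stripe_span(start, end, stripe_len)
    return first != last


if __name__ == "__main__":
    reported = [
        (4179166806016, 4179166822400),
        (4179166871552, 4179166887936),
        (4179166937088, 4179166953472),
    ]
    for start, end in reported:
        offset, first, last = stripe_span(start, end)
        print(start, end, "len", end - start, "offset", offset,
              "stripes", first, last)
```

Under that 64 KiB assumption, each of the three reported extents is 16384 bytes long and starts at offset 49152 within its stripe, so each one ends exactly on a stripe boundary. Whether that counts as "crossing" depends on whether a checker compares against the last byte or the exclusive end of the range.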


This is an actively in-focus bug ATM, and while I'm not a dev and can't
tell you for sure that it's behind the specific balance-related crash and
traces you posted (tho I believe it so), it certainly has the potential
to be that serious, yes.

The most common cause is a buggy btrfs-convert that was creating invalid
btrfs when converting from ext* at one point.  AFAIK they've hotfixed the
immediate convert issue, but are still actively working on a longer term
proper fix.  Meanwhile, while btrfs check does now detect the issue (and
even that is quite new code, added in 4.2 I believe), there's still no
real fix for what was after all a defective btrfs from the moment the
convert was done.
[...]
If, however, you created the filesystem using mkfs.btrfs, then the
problem must have occurred some other way.  Whether there's some other
cause beyond the known cause, a buggy btrfs-convert, has in fact been in
question, so in this case the devs are likely to be quite interested
indeed in your case and perhaps the filesystem history that brought you
to this point.  The ultimate fix is likely to be the same (unless the
devs have you test new fix code for btrfs check --repair), but I'd
strongly urge you to delay blowing away the filesystem, if possible,
until the devs have a chance to ask you to run other diagnostics and
perhaps even get a btrfs-image for them, since you may well have
accidentally found a corner-case they'll have trouble reproducing,
without your information.

Nice to know that this bug was already somewhat known, but I can confirm that it actually doesn't come from an ext4 conversion in my case.


Here is the filesystem history, which is actually quite short:

- FS created from scratch, no convert, on 2x4T devices using mkfs.btrfs with raid1 metadata, raid5 data. This was done with the 4.2 tools and kernel 3.19, so a couple of incompat features were turned on by default (such as skinny metadata).
- Approx. 4T worth of files copied to it, a bit less; I had around 30G free after the copy.
- Upgraded to kernel 4.2.0
- Added a third 4T device to the filesystem
- Ran a balance to get an even distribution of data/metadata among the 3 drives
- Kernel BUG after a couple of hours. The btrfs balance userspace tool segfaulted at the same time. Due to apport's default configuration (damn you, Ubuntu!), the core file was discarded, but I don't think the segfault is really interesting. The kernel trace is.


This was all done within ~1 week.

I've just created an image of the metadata, using btrfs-image -s. The image is 2.9G large, I can drop it somewhere in case a dev would like to have a look at it.

For what it's worth, I've been hitting another kernel BUG, almost certainly related, while trying to dev del the 3rd device, after 8 hours of work (kernel 4.1.7):


kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
in __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]

Trace:
[<ffffffff813d9a65>] ? __percpu_counter_add+0x55/0x70
[<ffffffffc02ea483>] btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
[<ffffffffc02ea697>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[<ffffffffc02fb169>] btrfs_should_end_transaction+0x49/0x60 [btrfs]
[<ffffffffc02e8aa2>] btrfs_drop_snapshot+0x472/0x880 [btrfs]
[<ffffffffc034ab00>] ? should_ignore_root.part.15+0x50/0x50 [btrfs]
[<ffffffffc034fd49>] merge_reloc_roots+0xd9/0x240 [btrfs]
[<ffffffffc0350119>] relocate_block_group+0x269/0x670 [btrfs]
[<ffffffffc03506f6>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[<ffffffffc0323cbe>] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
[<ffffffffc0324944>] btrfs_shrink_device+0x1d4/0x450 [btrfs]
[<ffffffffc0328d43>] btrfs_rm_device+0x323/0x810 [btrfs]
[<ffffffffc0334ee6>] btrfs_ioctl+0x1e86/0x2b30 [btrfs]
[<ffffffff81183544>] ? filemap_map_pages+0x1d4/0x230
[<ffffffff811b29f5>] ? handle_mm_fault+0xd95/0x17e0
[<ffffffff81115112>] ? from_kgid_munged+0x12/0x20
[<ffffffff811fe710>] ? cp_new_stat+0x140/0x160
[<ffffffff8120ce68>] do_vfs_ioctl+0x2f8/0x510
[<ffffffff81066f76>] ? __do_page_fault+0x1b6/0x450
[<ffffffff811fe75f>] ? SYSC_newstat+0x2f/0x40
[<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8ab2>] system_call_fastpath+0x16/0x75

If JB or any other btrfs dev wants me to try anything on this filesystem before I recreate it from scratch, such as a kernel patch or userland tool patch, or run a more verbose debug balance, I would be happy to do so. If this is the case, please tell me, so I can keep the filesystem as it is. On the other hand, if you're sure the btrfs-image is enough, please tell me too, so I can go forward and fix my system :)


Thanks,

--
Stéphane.

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
