On 2015-09-16 07:02, Duncan wrote:
Stéphane Lesimple posted on Tue, 15 Sep 2015 23:47:01 +0200 as excerpted:
On 2015-09-15 16:56, Josef Bacik wrote:
On 09/15/2015 10:47 AM, Stéphane Lesimple wrote:
I've been experiencing repeated "kernel BUG" occurrences in the past
few days trying to balance a raid5 filesystem after adding a new drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
I ran a scrub on this filesystem after the crash happened twice,
and it found no errors.
The BUG_ON() condition that my filesystem triggers is the following:
BUG_ON(owner < BTRFS_FIRST_FREE_OBJECTID);
// in insert_inline_extent_backref() of extent-tree.c.
Does btrfsck complain at all?
Just to elucidate a bit...
[...]
Which is where btrfs check comes in and why JB asked you to run it, since
unlike scrub, check is designed to catch filesystem logic errors.
Thanks for your clarification Duncan, that makes perfect sense.
You're right, even though btrfs scrub didn't complain, btrfsck does:
checking extents
bad metadata [4179166806016, 4179166822400) crossing stripe boundary
bad metadata [4179166871552, 4179166887936) crossing stripe boundary
bad metadata [4179166937088, 4179166953472) crossing stripe boundary
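As a quick sanity check on the three ranges above, here is a small arithmetic sketch. It assumes a 64 KiB stripe length and that the ranges are half-open as printed; STRIPE_LEN and the helper names are made up for illustration and are not taken from btrfs-progs.

```python
# Sketch only: STRIPE_LEN and both helper names are assumptions for
# illustration, not code taken from btrfs-progs.
STRIPE_LEN = 64 * 1024  # assumed 64 KiB stripe length

# The three ranges reported by btrfsck, as printed: [start, end)
REPORTED = [
    (4179166806016, 4179166822400),
    (4179166871552, 4179166887936),
    (4179166937088, 4179166953472),
]


def crosses_half_open(start, end, stripe_len=STRIPE_LEN):
    """The last byte used is end - 1, so compare its stripe index."""
    return start // stripe_len != (end - 1) // stripe_len


def crosses_end_inclusive(start, end, stripe_len=STRIPE_LEN):
    """Compare using end itself, which flags blocks ending on a boundary."""
    return start // stripe_len != end // stripe_len


def summarize(ranges=REPORTED):
    """Return (offset in stripe, length, half-open result, inclusive result)."""
    rows = []
    for start, end in ranges:
        rows.append((start % STRIPE_LEN, end - start,
                     crosses_half_open(start, end),
                     crosses_end_inclusive(start, end)))
    return rows


if __name__ == "__main__":
    for offset, length, half_open, inclusive in summarize():
        print(f"offset in stripe={offset} length={length} "
              f"half-open crossing={half_open} "
              f"end-inclusive crossing={inclusive}")
```

Under these assumptions, each block is 16 KiB long, starts 48 KiB into a 64 KiB stripe and ends exactly on the next boundary, so only the end-inclusive comparison counts it as crossing. The sketch does not establish which comparison btrfs check 4.2 actually uses.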
This is an actively in-focus bug ATM, and while I'm not a dev and can't
tell you for sure that it's behind the specific balance-related crash and
traces you posted (tho I believe so), it certainly has the potential
to be that serious, yes.
The most common cause is a buggy btrfs-convert that was creating invalid
btrfs when converting from ext* at one point. AFAIK they've hotfixed the
immediate convert issue, but are still actively working on a longer term
proper fix. Meanwhile, while btrfs check does now detect the issue (and
even that is quite new code, added in 4.2 I believe), there's still no
real fix for what was after all a defective btrfs from the moment the
convert was done.
[...]
If, however, you created the filesystem using mkfs.btrfs, then the
problem must have occurred some other way. Whether there's some other
cause beyond the known cause, a buggy btrfs-convert, has in fact been in
question, so in this case the devs are likely to be quite interested
indeed in your case and perhaps the filesystem history that brought you
to this point. The ultimate fix is likely to be the same (unless the
devs have you test new fix code for btrfs check --repair), but I'd
strongly urge you to delay blowing away the filesystem, if possible,
until the devs have a chance to ask you to run other diagnostics and
perhaps even get a btrfs-image for them, since you may well have
accidentally found a corner-case they'll have trouble reproducing,
without your information.
Nice to know that this bug was already somewhat known, but I can confirm
that it doesn't come from an ext4 conversion in my case.
Here is the filesystem history, which is actually quite short:
- FS created from scratch, no convert, on 2x4T devices using mkfs.btrfs
with raid1 metadata, raid5 data. This is using the 4.2 tools and kernel
3.19, so a couple incompat features were turned on by default (such as
skinny metadata).
- Slightly under 4T worth of files copied to it; I had around 30G
free after the copy.
- Upgraded to kernel 4.2.0
- Added a third 4T device to the filesystem
- Ran a balance to get an even distribution of data/metadata among the 3
drives
- Kernel BUG after a couple of hours. The btrfs balance userspace tool
segfaulted at the same time. Due to apport's default configuration (damn
you, Ubuntu!), the core file was discarded, but I don't think the segfault
is really interesting. The kernel trace is.
This was all done within ~1 week.
I've just created an image of the metadata, using btrfs-image -s. The
image is 2.9G; I can drop it somewhere in case a dev would like to
have a look at it.
For what it's worth, I've been hitting another kernel BUG, almost
certainly related, while trying to dev del the 3rd device, after 8 hours
of work (kernel 4.1.7):
kernel BUG at /home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
in __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
Trace:
[<ffffffff813d9a65>] ? __percpu_counter_add+0x55/0x70
[<ffffffffc02ea483>] btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
[<ffffffffc02ea697>] btrfs_run_delayed_refs+0x17/0x20 [btrfs]
[<ffffffffc02fb169>] btrfs_should_end_transaction+0x49/0x60 [btrfs]
[<ffffffffc02e8aa2>] btrfs_drop_snapshot+0x472/0x880 [btrfs]
[<ffffffffc034ab00>] ? should_ignore_root.part.15+0x50/0x50 [btrfs]
[<ffffffffc034fd49>] merge_reloc_roots+0xd9/0x240 [btrfs]
[<ffffffffc0350119>] relocate_block_group+0x269/0x670 [btrfs]
[<ffffffffc03506f6>] btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
[<ffffffffc0323cbe>] btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
[<ffffffffc0324944>] btrfs_shrink_device+0x1d4/0x450 [btrfs]
[<ffffffffc0328d43>] btrfs_rm_device+0x323/0x810 [btrfs]
[<ffffffffc0334ee6>] btrfs_ioctl+0x1e86/0x2b30 [btrfs]
[<ffffffff81183544>] ? filemap_map_pages+0x1d4/0x230
[<ffffffff811b29f5>] ? handle_mm_fault+0xd95/0x17e0
[<ffffffff81115112>] ? from_kgid_munged+0x12/0x20
[<ffffffff811fe710>] ? cp_new_stat+0x140/0x160
[<ffffffff8120ce68>] do_vfs_ioctl+0x2f8/0x510
[<ffffffff81066f76>] ? __do_page_fault+0x1b6/0x450
[<ffffffff811fe75f>] ? SYSC_newstat+0x2f/0x40
[<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
[<ffffffff81067240>] ? do_page_fault+0x30/0x80
[<ffffffff817d8ab2>] system_call_fastpath+0x16/0x75
If JB or any other btrfs dev wants me to try anything on this filesystem
before I recreate it from scratch, such as a kernel patch or userland
tool patch, or to run a more verbose debug balance, I would be happy to do
so.
If this is the case, please tell me, so I can keep the filesystem as it
is. On the other hand, if you're sure the btrfs-image is enough, please
tell me too, so I can go ahead and fix my system :)
Thanks,
--
Stéphane.