Re: btrfs kernel oops on mount

Duncan Fri, 09 Sep 2016 15:10:48 -0700

moparisthebest posted on Fri, 09 Sep 2016 15:23:13 -0400 as excerpted:

> On 09/09/2016 02:47 PM, Austin S. Hemmelgarn wrote:
>> On 2016-09-09 12:12, moparisthebest wrote:
>>> Hi,
>>>
>>> I'm hoping to get some help with mounting my btrfs array which quit
>>> working yesterday.  My array was in the middle of a balance, about 50%
>>> remaining, when it hit an error and remounted itself read-only [1].
>>> btrfs fi show output [2], btrfs df output [3].
>>>
>>> I unmounted the array, and when I tried to mount it again, it locked
>>> up the whole system so even alt+sysrq would not work.  I rebooted,
>>> tried to mount again, same lockup.  This was all kernel 4.5.7.
>>>
>>> I rebooted to kernel 4.4.0, tried to mount, crashed again, this time a
>>> message appeared on the screen and I took a picture [4].
>>>
>>> I rebooted into an arch live system with kernel 4.7.2, tried to mount
>>> again, got some dmesg output before it crashed [5] and took a picture
>>> when it crashed [6], says in part 'BUG: unable to handle kernel NULL
>>> pointer dereference at 00000000000001f0'.
>>>
>>> Is there anything I can do to get this in a working state again or
>>> perhaps even recover some data?
>>>
>>> Thanks much for any help
>>>
>>> [1]: https://www.moparisthebest.com/btrfs/initial_crash.txt
>>> [2]: https://www.moparisthebest.com/btrfs/btrfsfishow.txt
>>> [3]: https://www.moparisthebest.com/btrfs/btrfsdf.txt
>>> [4]: https://www.moparisthebest.com/btrfsoops.jpg
>>> [5]: https://www.moparisthebest.com/btrfs/dmsgprecrash.txt
>>> [6]: https://www.moparisthebest.com/btrfsnulldereference.jpg
>> 
>> The output from btrfs fi show and fi df both indicate that the
>> filesystem is essentially completely full.  You've gotten to the point
>> where you're using the global metadata reserve, and I think things are
>> getting stuck trying (and failing) to reclaim the space that's used
>> there.


>> Given that the FS is pretty much wedged, I think your best bet for
>> fixing this is probably going to be to use btrfs restore to get the
>> data onto a new (larger) set of disks.  If you do take this approach, a
>> metadata dump might be useful, if somebody could find enough room to
>> extract it.

> If I read btrfs fi show right, it's got minimum ~600gb free on each one
> of the 8 drives, shouldn't that be more than enough for most things?  (I
> guess unless I have single files over 600gb that need COW'd, I don't
> though)

Austin did pick up on something I (and apparently Chris) missed, the
non-zero used global reserve, but as best I can tell he's wrongly
attributing it to fully used devices, when as you (and Chris) point out,
that's not the case.

What he picked up on is this.  Under normal conditions, global reserve
"used" should always be zero, since, bugs aside, btrfs has to be pretty
desperately short of space before it'll start using the reserve.  Under
most conditions, btrfs will simply ENOSPC an operation before it touches
the reserve, so the fact that it's used indicates that btrfs *BELIEVES*
that it is in dire straits, space-wise, and has no place to go *but*
reserves.

But as you point out, all eight devices seem to have a half-TiB plus
available, unallocated and free to allocate as necessary.  Given that
btrfs raid1 only does pair-mirroring, and that chunks should be, at
absolute largest, 10 GiB, there's *plenty* of space to allocate as needed.
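
To put numbers on that, here's a minimal sketch of the arithmetic.  It
assumes (my simplification, not the real allocator logic) that a raid1
chunk can be allocated whenever at least two devices each have room for
one copy, and it uses the roughly 600 GiB per device figure from the
quoted message as an illustrative value.

```python
GIB = 1024**3


def raid1_chunk_fits(unallocated_per_device, chunk_bytes=10 * GIB):
    """True if at least two devices can each hold one copy of the chunk.

    Simplified model: pair-mirroring needs two different devices, each
    with at least chunk_bytes unallocated.
    """
    roomy = [u for u in unallocated_per_device if u >= chunk_bytes]
    return len(roomy) >= 2


# Illustrative figures only: eight devices, about 600 GiB unallocated each.
devices = [600 * GIB] * 8
print(raid1_chunk_fits(devices))
```

By that simple model there is room for dozens of worst-case chunks on
every device, which is why a genuine out-of-space condition doesn't fit
the numbers.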


Which can only mean that you've hit one of those elusive ENOSPC bugs 
where there's plenty of space left to allocate, but btrfs simply refuses 
to allocate it, instead triggering ENOSPC errors left and right, and of 
particular interest here, btrfs believes the ENOSPC problems to be severe 
enough that it has even run substantially into global reserves, *DESPITE* 
there *actually* being *plenty* of space!

Now I'm not a dev (just a btrfs user and list regular) and the traces,
etc, don't tend to add much usable information for me, so I can't judge
whether your particular case is affected by the following or not, but as
it so happens, there are active patches going into 4.8 dealing with some
of these previously unsolved "ENOSPC despite *plenty* of space" bugs.

So there's a fair chance the patches in either current 4.8-git or still 
in-process at this very moment will fix at least the evident false ENOSPC 
despite loads of space actually being available, which based on the fact 
that used reserve is /not/ zero was very likely the original trigger for 
the auto-remount-ro.  However, it's also possible that there are other 
issues now as well, that the current patches may /not/ fix, even if they 
fix all the ENOSPC issues, which itself I can't guarantee.  But it's 
worth a shot.

The other known problem with a known (mount-option) fix that you're
almost certainly running into ATM is the unfinished balance, since the
balance will try to resume once you mount the btrfs writable, and at
least without the ENOSPC patches mentioned above, that balance is
immediately running into the same ENOSPC problem that triggered the
remount-ro in the first place.

So try adding skip_balance to your mount options, and see if that lets 
you mount without the crash.  If it does, you can then manually run btrfs 
balance cancel to cancel the ongoing balance, allowing you to mount 
normally (without skip_balance) again.  However, you might want to try 
the ENOSPC patches first, before canceling the balance, since the cancel 
by definition will lose your place in the balance, and presumably you 
were doing a balance for some reason and would thus have to restart it.
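
As a sketch of that sequence, the snippet below only *builds and prints*
the two commands; it doesn't run anything.  The device and mount point
are hypothetical placeholders, so substitute your own, and treat this as
an illustration rather than a tested recovery procedure.

```python
import shlex


def skip_balance_commands(device, mountpoint):
    """Build (but do not run) the commands for the skip_balance approach."""
    return [
        # Mount without resuming the interrupted balance.
        ["mount", "-o", "skip_balance", device, mountpoint],
        # Only if you decide to give up your place in the balance.
        ["btrfs", "balance", "cancel", mountpoint],
    ]


# Hypothetical device and mount point, for illustration only.
for argv in skip_balance_commands("/dev/sdX", "/mnt/array"):
    print(shlex.join(argv))
```

The cancel is listed second deliberately: if the mount succeeds, you can
stop there and try the newer kernel before deciding whether to cancel.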


So what I'd try, in order:

0) Btrfs is still considered stabilizing, not yet fully stable and
mature, so the usual sysadmin's rule of backups applies even more
strongly than it does to a normally stable and mature filesystem.  That
rule: you either have backups, or by virtue of skipping them, you're
defining your data as of less value to you than the hassle and resources
a backup would otherwise require, regardless of any claims to the
contrary.

So if you don't have backups (or they're outdated) and you are now 
reconsidering your definition of that data as not worth the hassle of 
backups, your first priority is getting those backups, even before repair 
of the filesystem.

If that is your case, I'd try mounting read-only and taking the backup 
from there if you can, or using btrfs restore if you can't mount read-
only.

Then of course be aware of what a failure to have backups actually means 
in terms of how you are defining the value of your data (or the value of 
the delta between the current data and the data at the time of the last 
backup, if you have them but they aren't absolutely current), and act 
accordingly.  If that means btrfs is no longer an appropriate choice for 
you due to the stronger backups rule application, that's what it means.

1) A quick mount with skip_balance using your existing kernel, just to 
see if it lets you mount without an immediate crash.  If it does, we know 
it was the resuming balance that was the problem.  But don't cancel the 
balance just yet so you don't lose your place in it.

Of course if the option works, this is a nice place to take/update 
backups too. =:^)

2) A mount with the very latest 4.8-rc or git kernel, possibly with 
further enospc patches applied (I'm not sure if they've all reached 
mainline yet).  If you're really lucky, these enospc patches will let you 
continue the existing in-process balance from where you left off, thus 
avoiding the cancel.

If you're less lucky but still in good shape, they'll fix the root 
problem but the balance already got the btrfs so wedged that you'll still 
have to mount with skip_balance, then cancel the balance, losing your 
place, and then presumably restart a new one.

3) Given the currently active enospc work, find the threads discussing
those patches and either confirm that they fixed your enospc problem, or
catch up on the status of the current patches and what sorts of debugging
and testing the devs are having reporters do.  Then either confirm a
remaining issue on those threads, or prepare to file a new bug if the
issue appears to be yet another enospc bug that isn't addressed by those
patches.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman

