Patrik Lundquist posted on Fri, 25 Mar 2016 13:48:08 +0100 as excerpted:

> On 25 March 2016 at 12:49, Stephen Williams <steph...@veryfast.biz>
> wrote:
>>
>> So catch 22, you need all the drives otherwise it won't let you mount,
>> But what happens if a drive dies and the OS doesn't detect it? BTRFS
>> wont allow you to mount the raid volume to remove the bad disk!
> 
> Version of Linux and btrfs-progs?

Yes, please.  This can be critical information, as many bugs known to
exist in older versions are fixed in newer ones, and occasionally new
bugs are introduced as well, which older versions won't have.

> You can't have a raid10 with less than 4 devices so you need to add a
> new device before deleting the missing. That is of course still a
> problem with a read-only fs.
> 
> btrfs replace is also the recommended way to replace a failed device
> nowadays. The wiki is outdated.

In theory, when a missing device takes the filesystem below the minimum
device count for a given raid mode (four devices for a raid10), btrfs is
supposed to still allow a writable mount with the degraded option, unless
the number of missing devices is too high (more than one missing on
raid10) to allow functional degraded operation.

What it will often end up doing in that case, since it can't write the
full raid10, is this: once the current raid10 chunks fill up and it needs
to create more, it doesn't have enough devices to create them in raid10,
so it degrades to creating them in single mode.

The problem, however, is that on subsequent mounts, btrfs will see that 
single chunk in addition to the raid10 chunks, and will see the missing 
device, and knowing single mode is broken with /any/ missing devices, 
will at that point only mount read-only.

That's a currently known bug, which effectively means you may well get 
only one read-write mount to fix the problem, before btrfs will see that 
new single chunk created in the first degraded writable mount, and will 
refuse to mount writable again.
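
To make the one-shot behavior concrete, here's a rough Python sketch of
the per-filesystem check as I understand it.  It's a toy model, not kernel
code: the function name and the tolerance table are my own assumptions,
with raid10 treated as tolerating one missing device and single as
tolerating none.

```python
# Toy model of the per-filesystem degraded check described above.
# Not kernel code: the names and the tolerance table are assumptions.

# How many missing devices each profile is treated as tolerating.
TOLERATED_MISSING = {"single": 0, "raid10": 1}


def can_mount_writable(profiles_present, missing_devices):
    """Per-filesystem rule: the least tolerant profile present decides."""
    lowest = min(TOLERATED_MISSING[p] for p in profiles_present)
    return missing_devices <= lowest


# First degraded mount: only raid10 chunks exist, one device is missing.
profiles = {"raid10"}
first_mount = can_mount_writable(profiles, missing_devices=1)

# During that mount a new chunk is needed and gets allocated as single.
profiles.add("single")
second_mount = can_mount_writable(profiles, missing_devices=1)

print(first_mount, second_mount)  # prints: True False
```

The mere presence of a single chunk drags the whole-filesystem tolerance
to zero, even though that chunk sits entirely on devices that are still
there.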

There are patches available that will fix this known bug by changing this 
detection to per-chunk, instead of per-filesystem.  The degraded-writable 
mount will still degrade to writing single chunks, but btrfs will see 
that all single chunks are accounted for, and all raid10 chunks only have 
one device missing and thus can still be used, and the filesystem will 
thus continue to be write mountable, unless of course another device 
fails.
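
Again as a rough Python sketch only (a toy model under my own
assumptions, not the actual patch; the chunk layout, device names and
tolerance table are made up for illustration), per-chunk checking amounts
to something like this:

```python
# Toy model of per-chunk checking; not the actual patch.
# Profile tolerances and the chunk layout below are assumptions.
TOLERATED_MISSING = {"single": 0, "raid10": 1}


def chunk_usable(profile, devices, missing):
    """A chunk is usable if it lost no more devices than its profile tolerates."""
    lost = len(set(devices) & set(missing))
    return lost <= TOLERATED_MISSING[profile]


def can_mount_writable(chunks, missing):
    """Per-chunk rule: every chunk must be usable on its own."""
    return all(chunk_usable(profile, devs, missing) for profile, devs in chunks)


# Four-device raid10 where device "d" has gone missing.
chunks = [
    ("raid10", ["a", "b", "c", "d"]),  # written before the failure
    ("single", ["a"]),                 # written during the degraded mount
]

still_writable = can_mount_writable(chunks, missing={"d"})
after_second_failure = can_mount_writable(chunks, missing={"d", "a"})

print(still_writable, after_second_failure)  # prints: True False
```

The single chunk no longer counts against the filesystem because none of
its devices is missing, while a second failed device still blocks the
writable mount.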

But AFAIK, those patches were part of a patch set (the hot-spare patches) 
that as a whole wasn't picked for 4.5, tho by rights the per-chunk 
checking patches should have been cherry-picked as ready and fixing an 
existing bug, but weren't.  So as of 4.5, AFAIK, they still have to be 
applied separately before build.  Hopefully they'll be in 4.6.

However, without the per-chunk checking patch, the expected situation is
one degraded-writable mount before no more are allowed.  Unless you
already got it to work once and didn't mention it (and that btrfs fi
usage output was from before that writable mount, as it doesn't show the
single-mode chunk that would then prevent further writable mounts), it
looks like you have a possibly related, but definitely more severe bug:
you aren't even being allowed what would otherwise be expected to be that
one-shot degraded-writable mount.

And without that, as mentioned, you have a problem, since you have to 
have a writable mount to repair the filesystem, and it's not allowing you 
even that one-shot writable mount that should be possible even with that 
known bug.

Assuming you're using a current kernel and post that information, it's
quite likely the dev working on the other bug will be interested.  He'll
probably have you build a kernel with those patches to see if that alone
fixes it.  If it doesn't, he may have you try various debugging patches
to home in on the problem, so he can hopefully duplicate it himself and
ultimately come up with a fix.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
