On 2017-01-30 07:18, Austin S. Hemmelgarn wrote:
> On 2017-01-28 04:17, Andrei Borzenkov wrote:
>> 27.01.2017 23:03, Austin S. Hemmelgarn wrote:
>>> On 2017-01-27 11:47, Hans Deragon wrote:
>>>> On 2017-01-24 14:48, Adam Borowski wrote:
>>>>
>>>>> On Tue, Jan 24, 2017 at 01:57:24PM -0500, Hans Deragon wrote:
>>>>>
>>>>>> If I remove 'ro' from the options, I cannot get the filesystem mounted
>>>>>> because of the following error: BTRFS: missing devices(1) exceeds the
>>>>>> limit(0), writeable mount is not allowed.  So I am stuck.  I can only
>>>>>> mount the filesystem as read-only, which prevents me from adding a disk.
>>>>>
>>>>> A known problem: you get only one shot at fixing the filesystem, but
>>>>> that's not because of some damage, but because the check for whether
>>>>> the fs is in good enough shape to mount is oversimplistic.
>>>>>
>>>>> Here's a patch, if you apply it and recompile, you'll be able to mount
>>>>> degraded rw.
>>>>>
>>>>> Note that it removes a safety harness: here, the harness got tangled
>>>>> up and keeps you from recovering when it shouldn't, but it _has_ valid
>>>>> uses.
>>>>>
>>>>> Meow!
>>>>
>>>> Greetings,
>>>>
>>>> Ok, that solution will solve my problem in the short run, i.e. getting
>>>> my raid1 up again.
>>>>
>>>> However, as a user, I am seeking an easy, no-maintenance raid
>>>> solution.  I wish that if a drive fails, the btrfs filesystem still
>>>> mounts rw and leaves the OS running, but warns the user of the failing
>>>> disk and easily allows the addition of a new drive to reintroduce
>>>> redundancy.  Are there any plans within the btrfs community to
>>>> implement such a feature?  A year from now, when the other drive fails,
>>>> will I hit this problem again, i.e. my OS failing to start, booting
>>>> into a terminal, and being unable to reintroduce a new drive without
>>>> recompiling the kernel?
>>> Before I make any suggestions regarding this, I should point out that
>>> mounting read-write when a device is missing is what caused this issue
>>> in the first place.
>>
>>
>> How do you replace a device when the filesystem is mounted read-only?
>>
> I'm saying that the use case you're asking to have supported is the
> reason stuff like this happens.  If you're mounting read-write degraded
> and fixing the filesystem _immediately_, then it's not an issue; that's
> exactly what read-write degraded mounts are for.  If you're mounting
> read-write degraded and then having the system run as if nothing was
> wrong, then I have zero sympathy, because that's _dangerous_, even with
> LVM, MD-RAID, or hardware RAID (actually, especially with hardware RAID:
> LVM and MD are smart enough to automatically re-sync, most hardware RAID
> controllers aren't).
> 
> That said, as I mentioned further down in my initial reply, you
> absolutely should be monitoring the filesystem and not letting things
> get this bad if at all possible.  It's actually very rare that a storage
> device fails catastrophically with no warning (at least, on the scale
> that most end users are operating).  At a minimum, even if you're using
> ext4 on top of LVM, you should be monitoring SMART attributes on the
> storage devices (or whatever the SCSI equivalent is if you use
> SCSI/SAS/FC devices).  While not 100% reliable (they are getting better
> though), they're generally a pretty good way to tell if a disk is likely
> to fail in the near future.

Greetings,

I totally understand your concerns.  However, anybody using raid is a
grown-up, and tough for them if they do not understand this.  But the
current scenario makes it difficult for me to put redundancy back into
service!  How much time did I wait until I found the mailing list,
subscribed to it, posted my email and got an answer?  Wouldn't it be
better if the user could actually add the disk at any time, ideally as
soon as possible?
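
For reference, the recovery path I was hoping for would look roughly like
this (only a sketch; /dev/sdb stands for the surviving disk, /dev/sdc for
the replacement and /mnt for the mount point, all names made up for the
example):

  mount -o degraded /dev/sdb /mnt        # mount the surviving member rw
  btrfs replace start 1 /dev/sdc /mnt    # 1 = devid of the missing disk,
                                         # as shown by 'btrfs filesystem show'
  # or, alternatively:
  #   btrfs device add /dev/sdc /mnt
  #   btrfs device delete missing /mnt
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt
                                         # make sure everything is raid1 again

Today the very first step is refused with the "missing devices(1) exceeds
the limit(0)" error, so none of the rest can happen.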

And to fix this, I have to learn how to patch and compile the kernel.  I
have not done this since the beginning of the century.  More delays, more
risk added to the system (what if I compile the kernel with the wrong
parameters?).  Fortunately, my raid1 array is only for my home system and
I do not need that data available right now.  The data is safe, but I
have no time to fiddle with this issue and put the raid1 back in service
by compiling a new kernel.  I do have the extra drive sitting on my desk,
useless for the moment.
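
If I understand correctly, the procedure would be something along these
lines (a rough, unverified sketch; 'fix-degraded.patch' is just a
placeholder name for Adam's patch, and the packaging steps differ per
distribution):

  # inside a kernel source tree matching the running kernel
  patch -p1 < /path/to/fix-degraded.patch
  cp /boot/config-$(uname -r) .config    # start from the current config
  make olddefconfig                      # take defaults for any new options
  make -j$(nproc)
  make modules_install install           # or build a distro package instead
  reboot

All of that, plus the build time, just to be allowed to plug a disk back in.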

Which market is btrfs raid targeted at?  In the enterprise world, all the
proprietary raid solutions I know of alert admins when a disk is a
problem, but allow continuous, uninterrupted read-write service.  If you
have hundreds of employees depending upon a NAS, you do not want them to
twiddle their thumbs until the new drive is put in place.  In a SOHO
setting, the admin is often outsourced, maybe attending to someone else's
problem when the drive failure occurs.  What should the business do then?
Tell everybody to go home?

Is this the same problem with raid6?  If one drive dies, does the system
go down even though redundancy still remains?

Best regards,
Hans Deragon
