On 9 September 2015 at 03:35, Anand Jain <anand.j...@oracle.com> wrote:
> On 09/09/2015 03:34 AM, Hugo Mills wrote:
>>
>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>>>
>>> Hi,
>>>
>>> Currently I have a RAID1 configuration on two disks, and one of them
>>> is failing.
>>>
>>> But since:
>>> btrfs fi df /mnt/disk/
>>> Data, RAID1: total=858.00GiB, used=638.16GiB
>>> Data, single: total=1.00GiB, used=256.00KiB
>>> System, RAID1: total=32.00MiB, used=132.00KiB
>>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>>
>>> There should be no problem failing one disk... Or so I thought!
>>>
>>> btrfs dev delete /dev/sdb2 /mnt/disk/
>>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>>> devices on raid1
>>
>>
>>     dev delete is more like a reshaping operation in mdadm: it tries to
>> remove a device safely whilst retaining all of the redundancy
>> guarantees. You can't go down to one device with RAID-1 and still keep
>> the redundancy.
>>
>>     dev delete is really for managed device removal under non-failure
>> conditions, not for error recovery.
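
Right, so if I understand it correctly, the managed-removal case you
describe would look something like this under non-failure conditions
(device names here are just placeholders, not my actual layout):

  # add a replacement first, so RAID1 never has to drop below two devices
  btrfs device add /dev/sdc2 /mnt/disk
  # then remove the old device; its chunks get migrated to the others
  btrfs device delete /dev/sdb2 /mnt/disk

i.e. the delete only succeeds while every chunk can still keep two copies.
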
>>
>>> And I can't issue a rebalance either, since it will just report
>>> errors until the failing disk dies.
>>>
>>> What's even more interesting is that I can't mount just the working
>>> disk - i.e. if the other disk *has* failed and is inaccessible...
>>> though, I haven't tried physically removing it...
>>
>>
>>     Physically removing it is the way to go (or disabling it using echo
>> offline >/sys/block/sda/device/state). Once you've done that, you can
>> mount the degraded FS with -odegraded, then either add a new device
>> and balance to restore the RAID-1, or balance with
>> -{d,m}convert=single to drop the redundancy to single.
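
Ok, so for my setup that would be roughly the following (assuming sdb is
the failing disk and sda2 is the surviving btrfs device - correct me if
I have the sequence wrong):

  # stop the kernel from issuing I/O to the failing disk
  echo offline > /sys/block/sdb/device/state
  # mount the surviving device degraded
  mount -o degraded /dev/sda2 /mnt/disk

  # either: add a replacement and rebalance, as you describe, to restore RAID1
  btrfs device add /dev/sdc2 /mnt/disk
  btrfs balance start /mnt/disk

  # or: drop the redundancy to single copies on the remaining disk
  btrfs balance start -dconvert=single -mconvert=single /mnt/disk
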
>
>
>  It's like you _must_ add a disk in this context, otherwise the volume
> will become unmountable in the next mount cycle. The patch mentioned
> below has more details.

Which would mean that if the disk dies, you end up with an unusable
filesystem. (And in my case adding a disk might not help, since it would
try to read from the broken one until it completely fails again.)

>>> mdadm has fail and remove, I assume for this reason - perhaps it's
>>> something that should be added?
>>
>>
>>     I think there should be a btrfs dev drop, which is the fail-like
>> operation: tell the FS that a device is useless, and should be dropped
>> from the array, so the FS doesn't keep trying to write to it. That's
>> not implemented yet, though.
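
For comparison, the mdadm flow I had in mind is just (array and device
names purely illustrative):

  mdadm /dev/md0 --fail /dev/sdb2
  mdadm /dev/md0 --remove /dev/sdb2

i.e. mark the member as failed first so nothing keeps writing to it, and
then drop it from the array.
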
>
>
>
>  There is a patch set to handle this:
>     'Btrfs: introduce function to handle device offline'

I'll have a look

> Thanks, Anand
>
>>     Hugo.
>>
>