Re: [btrfs tools] ability to fail a device...

2015-09-09 Thread Ian Kumlien
On 9 September 2015 at 09:07, Ian Kumlien  wrote:
> On 9 September 2015 at 03:35, Anand Jain  wrote:
> >  There is a patch set to handle this..
> > 'Btrfs: introduce function to handle device offline'
>
> I'll have a look

So from my very quick look at the code that I could find (I can only
find patch set 3 for some reason), this would not fix it properly ;)

(It completely lost all its formatting, but:)
+       if ((rw_devices > 1) &&
+           (degrade_option || tolerated_fail > missing)) {
+               btrfs_sysfs_rm_device_link(fs_devices, dev, 0);
+               __btrfs_put_dev_offline(dev);
+               return;
+       }

I think this has to be an evaluation of whether there is a "complete
copy" on the device(s) we still have.
If there is, then we can populate any other device and the system
should still be viable
(this includes things like 'doing the math' to replace missing disks
in raid5 and 6, btw).
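
To make that concrete, here's a rough sketch in plain C of the kind of
per-chunk evaluation I mean (the struct, the profile enum and the
tolerance table are invented for illustration - this is not the patch
set's code): walk the chunks and ask whether each one can still be
reconstructed from the stripes that survive, instead of just counting
devices.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Invented, simplified view of one chunk's placement. */
enum profile { SINGLE, DUP, RAID0, RAID1, RAID10, RAID5, RAID6 };

struct chunk {
    enum profile profile;
    int missing_stripes;   /* stripes that sit on failed devices */
};

/* How many of a chunk's stripes may be lost with the chunk still
 * readable (raid5/6 "do the math" with parity to rebuild stripes). */
static int chunk_tolerated_loss(enum profile p)
{
    switch (p) {
    case RAID1:
    case RAID10:
    case RAID5:
        return 1;
    case RAID6:
        return 2;
    default:        /* single, dup, raid0: nothing may be lost */
        return 0;
    }
}

/* True if a complete copy of everything survives on the remaining
 * devices, i.e. the volume stays viable and could repopulate a
 * replacement device. */
static bool complete_copy_survives(const struct chunk *chunks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (chunks[i].missing_stripes >
            chunk_tolerated_loss(chunks[i].profile))
            return false;
    return true;
}

int main(void)
{
    /* My case: RAID1 chunks with one copy on the dead disk, plus a
     * single chunk that happens to sit on the surviving disk. */
    struct chunk fs[] = {
        { RAID1, 1 },    /* data/metadata/system, one copy missing */
        { SINGLE, 0 },   /* the single chunk is on the good disk */
    };

    printf("viable: %s\n",
           complete_copy_survives(fs, sizeof(fs) / sizeof(fs[0]))
           ? "yes" : "no");   /* prints "yes" */
    return 0;
}

Under a test like that my filesystem would count as viable, since the
single chunks sit on the disk that is still working.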

Do you have the patches somewhere? They don't seem to apply to 4.2
(been looking at line numbers)


Re: [btrfs tools] ability to fail a device...

2015-09-09 Thread Ian Kumlien
On 9 September 2015 at 03:35, Anand Jain  wrote:
> On 09/09/2015 03:34 AM, Hugo Mills wrote:
>>
>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>>>
>>> Hi,
>>>
>>> Currently i have a raid1 configuration on two disks where one of them
>>> is failing.
>>>
>>> But since:
>>> btrfs fi df /mnt/disk/
>>> Data, RAID1: total=858.00GiB, used=638.16GiB
>>> Data, single: total=1.00GiB, used=256.00KiB
>>> System, RAID1: total=32.00MiB, used=132.00KiB
>>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>>
>>> There should be no problem in failing one disk... Or so i thought!
>>>
>>> btrfs dev delete /dev/sdb2 /mnt/disk/
>>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>>> devices on raid1
>>
>>
>> dev delete is more like a reshaping operation in mdadm: it tries to
>> remove a device safely whilst retaining all of the redundancy
>> guarantees. You can't go down to one device with RAID-1 and still keep
>> the redundancy.
>>
>> dev delete is really for managed device removal under non-failure
>> conditions, not for error recovery.
>>
>>> And i can't issue rebalance either since it will tell me about errors
>>> until the failing disk dies.
>>>
>>> Whats even more interesting is that i can't mount just the working
>>> disk - ie if the other disk
>>> *has* failed and is inaccessible... though, i haven't tried physically
>>> removing it...
>>
>>
>> Physically removing it is the way to go (or disabling it using echo
>> offline >/sys/block/sda/device/state). Once you've done that, you can
>> mount the degraded FS with -odegraded, then either add a new device
>> and balance to restore the RAID-1, or balance with
>> -{d,m}convert=single to drop the redundancy to single.
>
>
>  It's like you _must_ add a disk in this context, otherwise the volume
> will be rendered unmountable in the next mount cycle. The below-mentioned
> patch has more details.

Which would mean that if the disk dies, you have an unusable volume.
(And in my case adding a disk might not help, since it would try to
read from the broken one until it completely fails again.)

>>> mdadm has fail and remove, I assume for this reason - perhaps it's
>>> something that should be added?
>>
>>
>> I think there should be a btrfs dev drop, which is the fail-like
>> operation: tell the FS that a device is useless, and should be dropped
>> from the array, so the FS doesn't keep trying to write to it. That's
>> not implemented yet, though.
>
>
>
>  There is a patch set to handle this..
> 'Btrfs: introduce function to handle device offline'

I'll have a look

> Thanks, Anand
>
>> Hugo.
>>
>


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Ian Kumlien
On 8 September 2015 at 21:43, Ian Kumlien  wrote:
> On 8 September 2015 at 21:34, Hugo Mills  wrote:
>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
[--8<--]

>>Physically removing it is the way to go (or disabling it using echo
>> offline >/sys/block/sda/device/state). Once you've done that, you can
>> mount the degraded FS with -odegraded, then either add a new device
>> and balance to restore the RAID-1, or balance with
>> -{d,m}convert=single to drop the redundancy to single.
>
> This did not work...

And removing the physical device is not the answer either... until I
did a read only mount ;)

Didn't expect it to fail with unable to open ctree like that...

[--8<--]


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Chris Murphy
On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  wrote:
> On 8 September 2015 at 21:55, Ian Kumlien  wrote:
>> On 8 September 2015 at 21:43, Ian Kumlien  wrote:
>>> On 8 September 2015 at 21:34, Hugo Mills  wrote:
>>>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>> [--8<--]
>>
>>>> Physically removing it is the way to go (or disabling it using echo
>>>> offline >/sys/block/sda/device/state). Once you've done that, you can
>>>> mount the degraded FS with -odegraded, then either add a new device
>>>> and balance to restore the RAID-1, or balance with
>>>> -{d,m}convert=single to drop the redundancy to single.
>>>
>>> This did not work...
>>
>> And removing the physical device is not the answer either... until I
>> did a read only mount ;)
>>
>> Didn't expect it to fail with unable to open ctree like that...
>
> Someone thought they were done too early, only one disk => read only
> mount. But, readonly mount => no balance.
>
> I think something is wrong
>
> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>
> btrfs dev delete missing /mnt/disk/
> ERROR: error removing the device 'missing' - Read-only file system
>
> Any mount without ro becomes:
> [  507.236652] BTRFS info (device sda2): allowing degraded mounts
> [  507.236655] BTRFS info (device sda2): disk space caching is enabled
> [  507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
> 705, corrupt 0, gen 0
> [  510.983321] BTRFS: too many missing devices, writeable mount is not allowed
> [  511.006241] BTRFS: open_ctree failed
>
> And one of them has to give! ;)


You've run into this:
https://bugzilla.kernel.org/show_bug.cgi?id=92641




-- 
Chris Murphy


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Ian Kumlien
On 8 September 2015 at 22:17, Chris Murphy  wrote:
> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien  wrote:
>> On 8 September 2015 at 22:08, Chris Murphy  wrote:
>>> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  wrote:
>>
>> [--8<--]
>>
>>>> Someone thought they were done too early, only one disk => read only
>>>> mount. But, readonly mount => no balance.
>>>>
>>>> I think something is wrong
>>>>
>>>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
>>>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>>>>
>>>> btrfs dev delete missing /mnt/disk/
>>>> ERROR: error removing the device 'missing' - Read-only file system
>>>>
>>>> Any mount without ro becomes:
>>>> [  507.236652] BTRFS info (device sda2): allowing degraded mounts
>>>> [  507.236655] BTRFS info (device sda2): disk space caching is enabled
>>>> [  507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
>>>> 705, corrupt 0, gen 0
>>>> [  510.983321] BTRFS: too many missing devices, writeable mount is not
>>>> allowed
>>>> [  511.006241] BTRFS: open_ctree failed
>>>>
>>>> And one of them has to give! ;)
>>>
>>>
>>> You've run into this:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=92641
>>
>> Ah, I thought it might not be known - I'm currently copying the files
>> since a read only mount is "good enough" for that
>>
>> -o degraded should allow readwrite *IF* the data can be written to
>> My question is also, would this keep me from "adding devices"?
>> I mean, it did seem like a catch 22 earlier, but that would really
>> make a mess of things...
>
> It is not possible to add a device to an ro filesystem, so effectively
> the fs read-writeability is broken in this case.

Wow, now that's quite a bug!

> --
> Chris Murphy


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Ian Kumlien
On 8 September 2015 at 21:34, Hugo Mills  wrote:
> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>> Hi,
>>
>> Currently i have a raid1 configuration on two disks where one of them
>> is failing.
>>
>> But since:
>> btrfs fi df /mnt/disk/
>> Data, RAID1: total=858.00GiB, used=638.16GiB
>> Data, single: total=1.00GiB, used=256.00KiB
>> System, RAID1: total=32.00MiB, used=132.00KiB
>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>
>> There should be no problem in failing one disk... Or so i thought!
>>
>> btrfs dev delete /dev/sdb2 /mnt/disk/
>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>> devices on raid1
>
>dev delete is more like a reshaping operation in mdadm: it tries to
> remove a device safely whilst retaining all of the redundancy
> guarantees. You can't go down to one device with RAID-1 and still keep
> the redundancy.
>
>dev delete is really for managed device removal under non-failure
> conditions, not for error recovery.
>
>> And i can't issue rebalance either since it will tell me about errors
>> until the failing disk dies.
>>
>> Whats even more interesting is that i can't mount just the working
>> disk - ie if the other disk
>> *has* failed and is inaccessible... though, i haven't tried physically
>> removing it...
>
>Physically removing it is the way to go (or disabling it using echo
> offline >/sys/block/sda/device/state). Once you've done that, you can
> mount the degraded FS with -odegraded, then either add a new device
> and balance to restore the RAID-1, or balance with
> -{d,m}convert=single to drop the redundancy to single.

This did not work...

[ 1742.368079] BTRFS info (device sda2): The free space cache file
(280385028096) is invalid. skip it
[ 1789.052403] BTRFS: open /dev/sdb2 failed
[ 1789.064629] BTRFS info (device sda2): allowing degraded mounts
[ 1789.064632] BTRFS info (device sda2): disk space caching is enabled
[ 1789.092286] BTRFS: bdev /dev/sdb2 errs: wr 2036894, rd 2031380,
flush 705, corrupt 0, gen 0
[ 1792.625275] BTRFS: too many missing devices, writeable mount is not allowed
[ 1792.644407] BTRFS: open_ctree failed

>> mdadm has fail and remove, I assume for this reason - perhaps it's
>> something that should be added?
>
>I think there should be a btrfs dev drop, which is the fail-like
> operation: tell the FS that a device is useless, and should be dropped
> from the array, so the FS doesn't keep trying to write to it. That's
> not implemented yet, though.

Damn it =)

>Hugo.
>
> --
> Hugo Mills | Alert status mauve ocelot: Slight chance of
> hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
> http://carfax.org.uk/  |
> PGP: E2AB1DE4  |


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Hugo Mills
On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
> Hi,
> 
> Currently i have a raid1 configuration on two disks where one of them
> is failing.
> 
> But since:
> btrfs fi df /mnt/disk/
> Data, RAID1: total=858.00GiB, used=638.16GiB
> Data, single: total=1.00GiB, used=256.00KiB
> System, RAID1: total=32.00MiB, used=132.00KiB
> Metadata, RAID1: total=4.00GiB, used=1.21GiB
> GlobalReserve, single: total=412.00MiB, used=0.00B
> 
> There should be no problem in failing one disk... Or so i thought!
> 
> btrfs dev delete /dev/sdb2 /mnt/disk/
> ERROR: error removing the device '/dev/sdb2' - unable to go below two
> devices on raid1

   dev delete is more like a reshaping operation in mdadm: it tries to
remove a device safely whilst retaining all of the redundancy
guarantees. You can't go down to one device with RAID-1 and still keep
the redundancy.

   dev delete is really for managed device removal under non-failure
conditions, not for error recovery.

> And i can't issue rebalance either since it will tell me about errors
> until the failing disk dies.
> 
> Whats even more interesting is that i can't mount just the working
> disk - ie if the other disk
> *has* failed and is inaccessible... though, i haven't tried physically
> removing it...

   Physically removing it is the way to go (or disabling it using echo
offline >/sys/block/sda/device/state). Once you've done that, you can
mount the degraded FS with -odegraded, then either add a new device
and balance to restore the RAID-1, or balance with
-{d,m}convert=single to drop the redundancy to single.

> mdadm has fail and remove, I assume for this reason - perhaps it's
> something that should be added?

   I think there should be a btrfs dev drop, which is the fail-like
operation: tell the FS that a device is useless, and should be dropped
from the array, so the FS doesn't keep trying to write to it. That's
not implemented yet, though.

   Hugo.

-- 
Hugo Mills | Alert status mauve ocelot: Slight chance of
hugo@... carfax.org.uk | brimstone. Be prepared to make a nice cup of tea.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Ian Kumlien
On 8 September 2015 at 21:55, Ian Kumlien  wrote:
> On 8 September 2015 at 21:43, Ian Kumlien  wrote:
>> On 8 September 2015 at 21:34, Hugo Mills  wrote:
>>> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
> [--8<--]
>
>>>Physically removing it is the way to go (or disabling it using echo
>>> offline >/sys/block/sda/device/state). Once you've done that, you can
>>> mount the degraded FS with -odegraded, then either add a new device
>>> and balance to restore the RAID-1, or balance with
>>> -{d,m}convert=single to drop the redundancy to single.
>>
>> This did not work...
>
> And removing the physical device is not the answer either... until I
> did a read only mount ;)
>
> Didn't expect it to fail with unable to open ctree like that...

Someone thought they were done too early, only one disk => read only
mount. But, readonly mount => no balance.

I think something is wrong

btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
ERROR: error during balancing '/mnt/disk/' - Read-only file system

btrfs dev delete missing /mnt/disk/
ERROR: error removing the device 'missing' - Read-only file system

Any mount without ro becomes:
[  507.236652] BTRFS info (device sda2): allowing degraded mounts
[  507.236655] BTRFS info (device sda2): disk space caching is enabled
[  507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
705, corrupt 0, gen 0
[  510.983321] BTRFS: too many missing devices, writeable mount is not allowed
[  511.006241] BTRFS: open_ctree failed

And one of them has to give! ;)

> [--8<--]


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Ian Kumlien
On 8 September 2015 at 22:08, Chris Murphy  wrote:
> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  wrote:

[--8<--]

>> Someone thought they were done too early, only one disk => read only
>> mount. But, readonly mount => no balance.
>>
>> I think something is wrong
>>
>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>>
>> btrfs dev delete missing /mnt/disk/
>> ERROR: error removing the device 'missing' - Read-only file system
>>
>> Any mount without ro becomes:
>> [  507.236652] BTRFS info (device sda2): allowing degraded mounts
>> [  507.236655] BTRFS info (device sda2): disk space caching is enabled
>> [  507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
>> 705, corrupt 0, gen 0
>> [  510.983321] BTRFS: too many missing devices, writeable mount is not 
>> allowed
>> [  511.006241] BTRFS: open_ctree failed
>>
>> And one of them has to give! ;)
>
>
> You've run into this:
> https://bugzilla.kernel.org/show_bug.cgi?id=92641

Ah, I thought it might not be known - I'm currently copying the files
since a read only mount is "good enough" for that

-o degraded should allow readwrite *IF* the data can be written to
My question is also, would this keep me from "adding devices"?
I mean, it did seem like a catch 22 earlier, but that would really
make a mess of things...

> --
> Chris Murphy


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Chris Murphy
On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien  wrote:
> On 8 September 2015 at 22:08, Chris Murphy  wrote:
>> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  wrote:
>
> [--8<--]
>
>>> Someone thought they were done too early, only one disk => read only
>>> mount. But, readonly mount => no balance.
>>>
>>> I think something is wrong
>>>
>>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
>>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
>>>
>>> btrfs dev delete missing /mnt/disk/
>>> ERROR: error removing the device 'missing' - Read-only file system
>>>
>>> Any mount without ro becomes:
>>> [  507.236652] BTRFS info (device sda2): allowing degraded mounts
>>> [  507.236655] BTRFS info (device sda2): disk space caching is enabled
>>> [  507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
>>> 705, corrupt 0, gen 0
>>> [  510.983321] BTRFS: too many missing devices, writeable mount is not 
>>> allowed
>>> [  511.006241] BTRFS: open_ctree failed
>>>
>>> And one of them has to give! ;)
>>
>>
>> You've run into this:
>> https://bugzilla.kernel.org/show_bug.cgi?id=92641
>
> Ah, I thought it might not be known - I'm currently copying the files
> since a read only mount is "good enough" for that
>
> -o degraded should allow readwrite *IF* the data can be written to
> My question is also, would this keep me from "adding devices"?
> I mean, it did seem like a catch 22 earlier, but that would really
> make a mess of things...

It is not possible to add a device to an ro filesystem, so effectively
the fs read-writeability is broken in this case.

-- 
Chris Murphy


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Hugo Mills
On Tue, Sep 08, 2015 at 02:17:55PM -0600, Chris Murphy wrote:
> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien  wrote:
> > On 8 September 2015 at 22:08, Chris Murphy  wrote:
> >> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  wrote:
> >
> > [--8<--]
> >
> >>> Someone thought they were done too early, only one disk => read only
> >>> mount. But, readonly mount => no balance.
> >>>
> >>> I think something is wrong
> >>>
> >>> btrfs balance start -dconvert=single -mconvert=single /mnt/disk/
> >>> ERROR: error during balancing '/mnt/disk/' - Read-only file system
> >>>
> >>> btrfs dev delete missing /mnt/disk/
> >>> ERROR: error removing the device 'missing' - Read-only file system
> >>>
> >>> Any mount without ro becomes:
> >>> [  507.236652] BTRFS info (device sda2): allowing degraded mounts
> >>> [  507.236655] BTRFS info (device sda2): disk space caching is enabled
> >>> [  507.325365] BTRFS: bdev (null) errs: wr 2036894, rd 2031380, flush
> >>> 705, corrupt 0, gen 0
> >>> [  510.983321] BTRFS: too many missing devices, writeable mount is not 
> >>> allowed
> >>> [  511.006241] BTRFS: open_ctree failed
> >>>
> >>> And one of them has to give! ;)
> >>
> >>
> >> You've run into this:
> >> https://bugzilla.kernel.org/show_bug.cgi?id=92641
> >
> > Ah, I thought it might not be known - I'm currently copying the files
> > since a read only mount is "good enough" for that
> >
> > -o degraded should allow readwrite *IF* the data can be written to
> > My question is also, would this keep me from "adding devices"?
> > I mean, it did seem like a catch 22 earlier, but that would really
> > make a mess of things...
> 
> It is not possible to add a device to an ro filesystem, so effectively
> the fs read-writeability is broken in this case.

   I thought this particular issue had already been dealt with in 4.2?
(i.e. you can still mount an FS RW if it's degraded, but there are
still some single chunks on it).

   Ian: If you can still mount the FS read/write with both devices in
it, then it might be worth trying to balance away the problematic
single chunks with:

btrfs bal start -dprofiles=single -mprofiles=single /mountpoint

   Then unmount, pull the dead drive, and remount -odegraded.

   Hugo.

-- 
Hugo Mills | The early bird gets the worm, but the second mouse
hugo@... carfax.org.uk | gets the cheese.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Ian Kumlien
On 8 September 2015 at 22:28, Hugo Mills  wrote:
> On Tue, Sep 08, 2015 at 02:17:55PM -0600, Chris Murphy wrote:
>> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien  wrote:
>> > On 8 September 2015 at 22:08, Chris Murphy  wrote:
>> >> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  wrote:

[--8<--]

>> > -o degraded should allow readwrite *IF* the data can be written to
>> > My question is also, would this keep me from "adding devices"?
>> > I mean, it did seem like a catch 22 earlier, but that would really
>> > make a mess of things...
>>
>> It is not possible to add a device to an ro filesystem, so effectively
>> the fs read-writeability is broken in this case.
>
>I thought this particular issue had already been dealt with in 4.2?
> (i.e. you can still mount an FS RW if it's degraded, but there are
> still some single chunks on it).

Single chunks are only on sda - not on sdb...

There should be no problem...

>Ian: If you can still mount the FS read/write with both devices in
> it, then it might be worth trying to balance away the problematic
> single chunks with:
>
> btrfs bal start -dprofiles=single -mprofiles=single /mountpoint
>
>Then unmount, pull the dead drive, and remount -odegraded.

It never completes, too many errors and eventually the disk disappears
until the machine is turned off and on again... (normal disk reset
doesn't work)


Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Hugo Mills
On Tue, Sep 08, 2015 at 10:33:54PM +0200, Ian Kumlien wrote:
> On 8 September 2015 at 22:28, Hugo Mills  wrote:
> > On Tue, Sep 08, 2015 at 02:17:55PM -0600, Chris Murphy wrote:
> >> On Tue, Sep 8, 2015 at 2:13 PM, Ian Kumlien  wrote:
> >> > On 8 September 2015 at 22:08, Chris Murphy  
> >> > wrote:
> >> >> On Tue, Sep 8, 2015 at 2:00 PM, Ian Kumlien  
> >> >> wrote:
> 
> [--8<--]
> 
> >> > -o degraded should allow readwrite *IF* the data can be written to
> >> > My question is also, would this keep me from "adding devices"?
> >> > I mean, it did seem like a catch 22 earlier, but that would really
> >> > make a mess of things...
> >>
> >> It is not possible to add a device to an ro filesystem, so effectively
> >> the fs read-writeability is broken in this case.
> >
> >I thought this particular issue had already been dealt with in 4.2?
> > (i.e. you can still mount an FS RW if it's degraded, but there are
> > still some single chunks on it).
> 
> Single chunks are only on sda - not on sdb...
> 
> There should be no problem...

   The check is more primitive than that at the moment, sadly. It just
checks that the number of missing devices is smaller than or equal to
the acceptable device loss for each RAID profile present on the FS.
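
   Roughly, it amounts to something like this (a rough C sketch with
invented names, not the kernel's actual code): take the lowest tolerated
device loss across the profiles present and compare it with the number
of missing devices. A stray single chunk drags that tolerance to zero,
which is why the writable mount is refused even though those chunks
live entirely on the surviving disk.

#include <stdio.h>

enum profile { SINGLE, DUP, RAID0, RAID1, RAID10, RAID5, RAID6 };

/* Whole-device failures each profile is considered able to absorb. */
static int tolerated_failures(enum profile p)
{
    switch (p) {
    case RAID1:
    case RAID10:
    case RAID5:
        return 1;
    case RAID6:
        return 2;
    default:        /* single, dup, raid0 */
        return 0;
    }
}

/* The coarse, filesystem-wide test: every block-group profile present
 * must tolerate at least `missing` failed devices, otherwise the
 * writable (degraded) mount is refused. */
static int writable_degraded_mount_allowed(const enum profile *present,
                                           int nprofiles, int missing)
{
    for (int i = 0; i < nprofiles; i++)
        if (missing > tolerated_failures(present[i]))
            return 0;
    return 1;
}

int main(void)
{
    /* Ian's FS: RAID1 chunks plus some single chunks, one device missing. */
    enum profile present[] = { RAID1, SINGLE };

    printf("writable degraded mount allowed: %s\n",
           writable_degraded_mount_allowed(present, 2, 1) ? "yes" : "no");
    return 0;    /* prints "no": the single chunks drag the tolerance to 0 */
}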

> >Ian: If you can still mount the FS read/write with both devices in
> > it, then it might be worth trying to balance away the problematic
> > single chunks with:
> >
> > btrfs bal start -dprofiles=single -mprofiles=single /mountpoint
> >
> >Then unmount, pull the dead drive, and remount -odegraded.
> 
> It never completes, too many errors and eventually the disk disappears
> until the machine is turned off and on again... (normal disk reset
> doesn't work)

   The profiles= parameters should limit the balance to just the three
single chunks, and will remove them (because they're empty). It
shouldn't hit the metadata too hard, even if it's raising lots of
errors.

   Hugo.

-- 
Hugo Mills | The early bird gets the worm, but the second mouse
hugo@... carfax.org.uk | gets the cheese.
http://carfax.org.uk/  |
PGP: E2AB1DE4  |




Re: [btrfs tools] ability to fail a device...

2015-09-08 Thread Anand Jain



On 09/09/2015 03:34 AM, Hugo Mills wrote:
> On Tue, Sep 08, 2015 at 09:18:05PM +0200, Ian Kumlien wrote:
>> Hi,
>>
>> Currently i have a raid1 configuration on two disks where one of them
>> is failing.
>>
>> But since:
>> btrfs fi df /mnt/disk/
>> Data, RAID1: total=858.00GiB, used=638.16GiB
>> Data, single: total=1.00GiB, used=256.00KiB
>> System, RAID1: total=32.00MiB, used=132.00KiB
>> Metadata, RAID1: total=4.00GiB, used=1.21GiB
>> GlobalReserve, single: total=412.00MiB, used=0.00B
>>
>> There should be no problem in failing one disk... Or so i thought!
>>
>> btrfs dev delete /dev/sdb2 /mnt/disk/
>> ERROR: error removing the device '/dev/sdb2' - unable to go below two
>> devices on raid1
>
> dev delete is more like a reshaping operation in mdadm: it tries to
> remove a device safely whilst retaining all of the redundancy
> guarantees. You can't go down to one device with RAID-1 and still keep
> the redundancy.
>
> dev delete is really for managed device removal under non-failure
> conditions, not for error recovery.
>
>> And i can't issue rebalance either since it will tell me about errors
>> until the failing disk dies.
>>
>> Whats even more interesting is that i can't mount just the working
>> disk - ie if the other disk
>> *has* failed and is inaccessible... though, i haven't tried physically
>> removing it...
>
> Physically removing it is the way to go (or disabling it using echo
> offline >/sys/block/sda/device/state). Once you've done that, you can
> mount the degraded FS with -odegraded, then either add a new device
> and balance to restore the RAID-1, or balance with
> -{d,m}convert=single to drop the redundancy to single.

 It's like you _must_ add a disk in this context, otherwise the volume
will be rendered unmountable in the next mount cycle. The below-mentioned
patch has more details.

>> mdadm has fail and remove, I assume for this reason - perhaps it's
>> something that should be added?
>
> I think there should be a btrfs dev drop, which is the fail-like
> operation: tell the FS that a device is useless, and should be dropped
> from the array, so the FS doesn't keep trying to write to it. That's
> not implemented yet, though.

 There is a patch set to handle this..
'Btrfs: introduce function to handle device offline'

Thanks, Anand

> Hugo.

