On 28/06/16 03:46, Chris Murphy wrote:
> On Mon, Jun 27, 2016 at 11:29 AM, Chris Murphy <li...@colorremedies.com> wrote:
>>
>> Next is to decide to what degree you want to salvage this volume and
>> keep using Btrfs raid56 despite the risks
>
> Forgot to complete this thought. So if you get a backup, and decide
> you want to fix it, I would see if you can cancel the replace using
> "btrfs replace cancel <mp>" and confirm that it stops. And now is the
> risky part: whether to try "btrfs add" and then "btrfs remove", or to
> remove the bad drive, reboot, and see if it will mount with -o
> degraded, and then use add and remove (in which case you'll use
> 'remove missing').
>
> With the first, you risk Btrfs still using the flaky bad drive.
>
> With the second, you risk whether a degraded mount will work at all,
> and whether any other drive in the array hits a problem while
> degraded (like an unrecoverable read error from a single sector).
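
For anyone finding this thread later, those two options translate to
roughly the following (the mountpoint /mnt and the device names are
placeholders, not taken from this thread - adjust to suit):

    # Option 1 - keep the flaky drive attached:
    # stop the in-flight replace first
    btrfs replace cancel /mnt
    # add the new device, then drop the bad one
    btrfs device add /dev/sdNEW /mnt
    btrfs device delete /dev/sdBAD /mnt

    # Option 2 - pull the bad drive, reboot, rebuild degraded:
    # mount degraded via one of the remaining good drives
    mount -o degraded /dev/sdGOOD /mnt
    btrfs device add /dev/sdNEW /mnt
    btrfs device delete missing /mnt
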
This is the exact set of circumstances that caused my corrupt array. I
was using RAID6, yet it still corrupted large portions of things. In
theory, due to having double parity, it should have survived even if a
disk did go bad - but there we are. I first started a replace, noted
how slow it was going, cancelled the replace, then did an add / delete.
The system crashed and it was all over.

Just as another data point, I've been flogging the guts out of the
array with mdadm RAID6, doing a reshape of it - and no read errors,
system crashes or other problems in over 48 hours.

-- 
Steven Haigh

Email: net...@crc.id.au
Web: https://www.crc.id.au
Phone: (03) 9001 6090 - 0412 935 897