On May 28, 2014, at 12:39 PM, Justin Brown <justin.br...@fandingo.org> wrote:

> Chris,
> 
> Thanks for the tip. I was able to mount the drive as degraded and
> recover it. Then, I deleted the faulty drive, leaving me with the
> following array:
> 
> 
> Label: media  uuid: 7b7afc82-f77c-44c0-b315-669ebd82f0c5
>     Total devices 6 FS bytes used 2.40TiB
>     devid    1 size 931.51GiB used 919.88GiB path /dev/mapper/SAMSUNG_HD103SI_499431FS734755p1
>     devid    2 size 931.51GiB used 919.38GiB path /dev/dm-8
>     devid    3 size 1.82TiB used 1.19TiB path /dev/dm-6
>     devid    4 size 931.51GiB used 919.88GiB path /dev/dm-5
>     devid    5 size 0.00 used 918.38GiB path /dev/dm-11
>     devid    6 size 1.82TiB used 3.88GiB path /dev/dm-9
> 
> /dev/dm-11 is the failed drive. I take it that size 0 is a good sign.
> I'm not really sure where to go from here. I tried rebooting the
> system with the failed drive attached, and Btrfs re-adds it to the
> array. Should I physically remove the drive now? Is a balance
> recommended?

I'm going to guess at what happened. You had a 5-device raid10. devid 5 is the 
failed device, but at the time you added the new device, devid 6, it was not 
considered failed by btrfs. Your first btrfs fi show did not show size 0 for 
devid 5. So I think btrfs made you a 6-device raid10 volume.

But now devid 5 has failed and shows up as size 0. The reason you still have to 
mount degraded is that you now have a 6-device raid10 with one failed device. 
And you can't remove the failed device because you've mounted degraded. So it 
was actually a mistake to add the new device first, but it's an easy mistake to 
make, because right now btrfs tolerates a lot of error conditions that it 
probably should give up on and outright fail the device.

So I think you might have to get a 7th device and fix this with btrfs replace 
start. You can delete devices later, once you're no longer mounted degraded. Or 
you can just take a backup now while you can still mount degraded, then blow 
away the btrfs volume and start over.
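
As a rough sketch, assuming you do get a 7th device, the replace path would 
look something like this. The /dev/sdb and /dev/sdg device names and the 
/mnt/media mount point are just placeholders; substitute your own:

    # mount degraded, since devid 5 is failed
    mount -o degraded /dev/sdb /mnt/media

    # explicitly replace failed devid 5 with the new 7th device
    btrfs replace start 5 /dev/sdg /mnt/media

    # check progress; this can run for hours
    btrfs replace status /mnt/media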

If you have current backups and are willing to lose the data on this volume, 
you can try the following:

1. Power off, remove the failed drive, boot, and do a normal mount. That 
probably won't work, but it's worth a shot. If it doesn't work, try mount -o 
degraded. [That might not work either, in which case stop here; I think you'll 
need to go with a 7th device and use 'btrfs replace start 5 /dev/newdevice7 
/mp'. That will explicitly replace failed devid 5 with the new device.] The 
commands for these steps are sketched after step 3.

2. Assuming mount -o degraded works, take a btrfs fi show. There should be a 
missing device listed. Now try btrfs device delete missing /mp and see what 
happens. If it at least doesn't complain, that means it's working, and it might 
take hours to replicate the data that was on the missing device onto the new 
one. So I'd leave it alone until iotop or something like that tells you it's 
not busy anymore.

3. Unmount the file system. Try to mount normally (not degraded).
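
For reference, the whole sequence might look something like this. Again, 
/dev/sdb and /mnt/media are placeholders for one of your remaining devices and 
your mount point:

    # step 1: try a normal mount first, then degraded if that fails
    mount /dev/sdb /mnt/media
    mount -o degraded /dev/sdb /mnt/media

    # step 2: confirm a device shows as missing, then remove it; expect this
    # to run for hours while data is replicated onto the remaining devices
    btrfs fi show
    btrfs device delete missing /mnt/media

    # step 3: unmount, then try a normal (not degraded) mount again
    umount /mnt/media
    mount /dev/sdb /mnt/media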



Chris Murphy