Re: Recommendation on raid5 drive error resolution

Chris Murphy Sun, 28 Aug 2016 10:15:51 -0700

On Thu, Aug 25, 2016 at 1:23 AM, Gareth Pye <gar...@cerberos.id.au> wrote:
> So I've been living on the reckless-side (meta RAID6, data RAID5) and
> I have a drive or two that isn't playing nicely any more.
>
> dmesg of the system running for a few minutes: http://pastebin.com/9pHBRQVe
>
> Everything of value is backed up, but I'd rather keep data than
> download it all again. When I only saw one disk having troubles I was
> concerned. Now I notice both sda and sdc having issues I'm thinking I
> might be about to have a bad time.
>
> What else should I provide?



[   72.555921] BTRFS info (device sda7): bdev /dev/sdc errs: wr 0, rd 9091, flush 0, corrupt 0, gen 0
[   72.555941] BTRFS info (device sda7): bdev /dev/sdh errs: wr 0, rd 74, flush 0, corrupt 0, gen 0

Two devices with read errors is bad. If the unreadable sectors overlap
(land in the same stripe on both drives), it's basically a dead raid5.
It also means you *CANNOT* remove either drive. So now you have a
problem, and I highly advise that you refresh your backup, because this
is a really fragile state for any raid5.

What's the result from these two commands for every drive in this array?

smartctl -l scterc <dev>
cat /sys/block/sdX/device/timeout

The SCTERC value must be less than the timeout. Mind the units:
smartctl reports SCT ERC in tenths of a second, while the kernel
timeout is in seconds. This really must be the first thing you do,
even before starting your backup, because a misconfiguration here has
a very good chance of keeping the backup from completing. Note that
neither setting is persistent, so both need to be checked again after
a reboot.
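
To make the comparison concrete, here is a minimal sketch of the check
in POSIX sh. The function name erc_ok is made up for illustration, and
treating a zero or non-numeric value (for example a drive that reports
ERC as disabled) as a failure is my assumption, not something smartctl
enforces.

```shell
#!/bin/sh
# erc_ok ERC_DECISECONDS TIMEOUT_SECONDS   (hypothetical helper)
# Succeeds only when the drive gives up on a bad sector before the
# kernel's command timer expires.
erc_ok() {
    erc_ds=$1      # SCT ERC value from smartctl, in units of 100 ms
    timeout_s=$2   # /sys/block/sdX/device/timeout, in seconds
    # Assumption: 0 or a non-numeric value counts as "not OK".
    # Errors from non-numeric input are silenced and read as failure.
    [ "$erc_ds" -gt 0 ] 2>/dev/null &&
        [ "$erc_ds" -lt $((timeout_s * 10)) ] 2>/dev/null
}
```

For example, "erc_ok 70 30" succeeds (7 seconds against a 30 second
timeout), while "erc_ok 1200 30" fails (120 seconds against 30). When
the check fails, the usual options are to lower the drive's ERC where
it is supported, e.g. "smartctl -l scterc,70,70 <dev>", or to raise the
kernel timeout by writing a larger number to
/sys/block/sdX/device/timeout. The numbers here are only illustrative.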


-- 
Chris Murphy