On 2017-03-05 at 16:02, Gregory Seidman wrote:

> I have a disk that is reporting SMART errors. It is an active disk in
> a (kernel, not hardware) RAID1 configuration. I also have a hot spare
> in the RAID1, and md hasn't decided it should fail the disk and
> switch to the hot spare. Should I proactively tell md to fail the
> disk (and let the hot spare take over), or should I just wait until
> md notices a problem?
So, you're saying you have a two-disk RAID-1 array with a third disk as a hot spare?

Under those circumstances, I would be inclined to leave it alone until either md fails the disk out or I start noticing visible symptoms, but I'm not an expert and I'm not sure what the best practice is. Certainly the paranoid, better-safe-than-sorry approach would be to fail it out, let the hot spare take over, and then swap in a cold spare as the new hot spare.

For my own main array (RAID-5 with no hot spares), I don't necessarily replace a disk as soon as I start noticing SMART errors, but I do start monitoring the situation more closely. As soon as I see other indications (most prominently read- or write-related notices in dmesg), I arrange for a replacement, fail out the disk, and swap in the new one. (The only reason I have no hot spare is that there are no unused SATA ports in the system.)

I initially expected that I would never need to fail out a disk manually, but the last time I saw drive errors, md did not automatically fail the disk out of the array, even though it was exhibiting read issues so severe that the entire UI hung for 15 to 60 seconds on any read attempt against the failed portions of the disk.

-- 
   The Wanderer

The reasonable man adapts himself to the world; the unreasonable one
persists in trying to adapt the world to himself. Therefore all
progress depends on the unreasonable man.         -- George Bernard Shaw