On Sun, Jun 5, 2016 at 4:45 AM, Mladen Milinkovic <max...@smoothware.net> wrote:
> On 06/03/2016 04:05 PM, Chris Murphy wrote:
>> Make certain the kernel command timer value is greater than the
>> drive's error recovery timeout. The former is found in sysfs, per
>> block device; the latter can be read and set with smartctl. The wrong
>> configuration is common (it's actually the default) when using
>> consumer drives, and inevitably leads to problems, even the loss of
>> the entire array. It really is a terrible default.
>
> Since this is the first time I've heard of this, I did some googling.
>
> Here's a nice article about these timeouts:
> http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
>
> And some udev rules that should apply this automatically:
> http://comments.gmane.org/gmane.linux.raid/48193
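
For anyone wanting to check their own setup, a minimal sketch of the two values involved (assuming /dev/sda as a placeholder device; the 180-second figure is the value the article above recommends, not a universal constant):

```shell
# Read the drive's SCT ERC (error recovery control) setting, in deciseconds.
# "SCT Error Recovery Control command not supported" means it can't be capped.
smartctl -l scterc /dev/sda

# Read the kernel command timer for the same device, in seconds (default 30):
cat /sys/block/sda/device/timeout

# If the drive supports SCT ERC, cap recovery at 7.0 seconds (70 deciseconds)
# so the drive gives up and reports the bad sector before the kernel resets it:
smartctl -l scterc,70,70 /dev/sda

# If the drive does NOT support SCT ERC (common on consumer drives),
# raise the kernel timer well above the drive's worst-case recovery instead:
echo 180 > /sys/block/sda/device/timeout
```

Note that neither setting persists across a reboot (SCT ERC typically resets on power cycle too), which is exactly what the udev rules linked above automate.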

Yes, it's a constant problem that pops up on the linux-raid list.
Sometimes the list is quiet on this issue, but it seems to come up
about once a week. From last week...

http://www.spinics.net/lists/raid/msg52447.html

And you wouldn't know it, because the subject is "raid 5 crashed", so
you wouldn't think: bad sectors are accumulating because they're not
getting fixed up, and they're not getting fixed up because the kernel
command timer is resetting the link, preventing the drive from
reporting the read error and the affected sector's LBA. It starts with
that; then you get a single disk failure, and during the rebuild you
hit the bad sector on an otherwise good drive. In effect that's like a
second drive failure, and the raid5 implodes. It's sometimes fixable,
but really tedious.
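
To make the setting survive reboots, a rule along the lines of the udev approach linked earlier can work. This is only a sketch of the fallback case (raising the kernel command timer for every whole SATA/SCSI disk), not a drop-in copy of those rules, and the filename is hypothetical:

```
# /etc/udev/rules.d/60-raid-timeout.rules (example filename)
# Raise the kernel command timer to 180 s on whole disks (not partitions):
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ENV{DEVTYPE}=="disk", ATTR{device/timeout}="180"
```

A more careful rule set, like the one in the linked thread, would first try `smartctl -l scterc,70,70` and only fall back to raising the kernel timer for drives that don't support SCT ERC.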



-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
