On Sun, Jun 5, 2016 at 4:45 AM, Mladen Milinkovic <max...@smoothware.net> wrote:
> On 06/03/2016 04:05 PM, Chris Murphy wrote:
>> Make certain the kernel command timer value is greater than the drive's
>> error recovery timeout. The former is found in sysfs, per block device;
>> the latter can be read and set with smartctl. A wrong configuration is
>> common (it's actually the default) when using consumer drives, and it
>> inevitably leads to problems, up to the loss of the entire array. It
>> really is a terrible default.
>
> Since this is the first time I've heard of this, I did some googling.
>
> Here's a nice article about these timeouts:
> http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
>
> And some udev rules that should apply this automatically:
> http://comments.gmane.org/gmane.linux.raid/48193
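For reference, checking and reconciling the two timeouts might look roughly like this. This is only a sketch: the device name /dev/sda, the 10-second margin, and the rule filename in the comment are assumptions, not anything from the linked rules.

```shell
#!/bin/sh
# Sketch: reconcile the drive's SCT ERC timeout with the kernel command timer.
# Assumes /dev/sda; repeat for every member device of the array.

# Drive-side error recovery timeout (SCT ERC), reported in tenths of a second.
# A value of 70 means the drive gives up and reports a read error after 7 s:
#   smartctl -l scterc /dev/sda          # query current ERC settings
#   smartctl -l scterc,70,70 /dev/sda    # set read/write ERC to 7 s

# Kernel-side command timer, in seconds:
#   cat /sys/block/sda/device/timeout

# The kernel timer must exceed the drive's recovery time, with some margin.
# Given an ERC value in deciseconds, derive a kernel timeout in seconds:
erc_deciseconds=70                                # example scterc value
kernel_timeout=$(( erc_deciseconds / 10 + 10 ))   # recovery time + 10 s margin
echo "$kernel_timeout"

# If the drive does not support SCT ERC at all, raise the kernel timer well
# above worst-case consumer-drive recovery instead, e.g.:
#   echo 180 > /sys/block/sda/device/timeout
# A udev rule (hypothetical filename) can apply that at boot, along the lines of:
#   # /etc/udev/rules.d/60-raid-timeout.rules
#   ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd[a-z]", ATTR{device/timeout}="180"
```

The sysfs setting does not persist across reboots, which is why the udev-rule approach in the second link above exists.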
Yes, it's a constant problem that pops up on the linux-raid list. Sometimes the list is quiet on the issue, but it really seems to come up about once a week. From last week:

http://www.spinics.net/lists/raid/msg52447.html

You wouldn't know it from the subject, "raid 5 crashed", but the cause is the same: bad sectors accumulate because they're not getting fixed up, and they're not getting fixed up because the kernel command timer resets the link before the drive can report a read error and the LBA of the affected sector. It starts with that; then you get a single disk failure, and during the rebuild you hit a bad sector on an otherwise good drive. In effect that's like a second drive failure, and the raid5 implodes. It's fixable, sometimes, but really tedious.

-- 
Chris Murphy
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html