On 2016-06-05 22:40, James Johnston wrote:
On 06/06/2016 at 01:47, Chris Murphy wrote:
On Sun, Jun 5, 2016 at 4:45 AM, Mladen Milinkovic <max...@smoothware.net> wrote:
On 06/03/2016 04:05 PM, Chris Murphy wrote:
Make certain the kernel command timer value is greater than the drive
error recovery timeout. The former is found in sysfs, per block
device; the latter can be read and set with smartctl. Wrong
configuration is common (it's actually the default) when using
consumer drives, and inevitably leads to problems, even the loss of
the entire array. It really is a terrible default.
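For illustration, the check described above could be sketched roughly as follows. This is only a sketch under assumptions: the function names are made up for this example, the kernel timer is assumed to live at /sys/block/<dev>/device/timeout (in seconds), and the drive value is assumed to be taken from the text printed by `smartctl -l scterc /dev/<dev>`, which reports tenths of a second.

```python
import re
from pathlib import Path


def read_kernel_timeout(dev, sysfs_root="/sys/block"):
    """Return the kernel command timer for a block device, in seconds."""
    path = Path(sysfs_root) / dev / "device" / "timeout"
    return int(path.read_text().strip())


def parse_scterc_read(smartctl_output):
    """Return the read ERC value in deciseconds, or None if no number is shown.

    Assumes output lines shaped like 'Read:     70 (7.0 seconds)'.
    'Read: Disabled' or an unsupported drive yields None.
    """
    match = re.search(r"^\s*Read:\s+(\d+)", smartctl_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def kernel_timer_is_longer(kernel_timeout_s, erc_deciseconds):
    """True only if the kernel waits longer than the drive's recovery limit."""
    if erc_deciseconds is None:
        # No known bound on drive recovery time, so this cannot be confirmed.
        return False
    return kernel_timeout_s * 10 > erc_deciseconds
```

With the default 30 second kernel timer and a drive reporting 70 (7.0 seconds), `kernel_timer_is_longer(30, 70)` is true; for a drive with ERC disabled or unsupported it is false, which is the misconfigured default described above.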
Since this is the first time I've heard of this, I did some googling.
Here's a nice article about these timeouts:
http://strugglers.net/~andy/blog/2015/11/09/linux-software-raid-and-drive-timeouts/comment-page-1/
And some udev rules that should apply this automatically:
http://comments.gmane.org/gmane.linux.raid/48193
Yes it's a constant problem that pops up on the linux-raid list.
Sometimes the list is quiet on this issue but it really seems like
it's once a week. From last week...
http://www.spinics.net/lists/raid/msg52447.html
It seems like it would be useful if the distributions or the kernel could
automatically set the kernel timeout to an appropriate value. If the TLER can
indeed be queried via smartctl, then it would be easy to automatically read it
and then calculate a suitable timeout. A RAID-oriented drive would end up
keeping the current 30 seconds, while if the TLER can't be queried successfully
or the drive just doesn't support it, then assume a consumer drive and set the
timeout to 180 seconds.
That way, zero user configuration would be needed in the common case. Or is it
not that simple?
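The selection rule proposed here might look something like the sketch below. The function name and parameters are hypothetical, the 30 and 180 second figures are the ones mentioned above, and treating a drive whose ERC is disabled the same as one that doesn't support it is an assumption of this example.

```python
DEFAULT_TIMEOUT_S = 30    # current kernel default mentioned above
CONSUMER_TIMEOUT_S = 180  # value proposed above for drives without TLER


def choose_kernel_timeout(erc_deciseconds,
                          default_s=DEFAULT_TIMEOUT_S,
                          fallback_s=CONSUMER_TIMEOUT_S):
    """Pick a kernel command timeout (seconds) from the drive's ERC value.

    erc_deciseconds is the drive's error recovery limit in tenths of a
    second, or None if it could not be queried, is disabled, or is
    unsupported (assumption: all three are treated as a consumer drive).
    """
    if erc_deciseconds is not None and erc_deciseconds < default_s * 10:
        # Drive gives up well before the kernel does: keep the default.
        return default_s
    # Unknown or very long recovery time: use the longer timeout.
    return fallback_s
```

So a drive reporting 70 (7 seconds) keeps 30, and a drive with no usable value gets 180.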
Strictly speaking, it's policy, and therefore shouldn't be in the
kernel. It's not hard to write a script to handle this, though: both
hdparm and smartctl can set the SCT ERC value, and will report an error
if it fails, so you can try to set the value you want (I personally
would go with 10 seconds instead of 7), and if that fails, bump the
kernel command timeout.
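A rough sketch of such a script, using smartctl only. The assumptions here: a failed attempt shows up as a nonzero smartctl exit status, the kernel timer is at /sys/block/<dev>/device/timeout, and 180 seconds is used as the fallback because that is the figure mentioned earlier in the thread. The function names are made up for the example, and it needs to run as root to have any effect.

```python
import subprocess
from pathlib import Path

ERC_DECISECONDS = 100     # 10 seconds; smartctl takes tenths of a second
FALLBACK_TIMEOUT_S = 180  # kernel command timeout used if ERC can't be set


def try_set_erc(dev, run=subprocess.run):
    """Try to set SCT ERC for reads and writes; True if smartctl exits 0."""
    cmd = ["smartctl", "-l",
           f"scterc,{ERC_DECISECONDS},{ERC_DECISECONDS}", f"/dev/{dev}"]
    result = run(cmd, capture_output=True, text=True)
    # Assumption: any nonzero exit status means the value was not applied.
    return result.returncode == 0


def configure_drive(dev, sysfs_root="/sys/block", run=subprocess.run):
    """Set ERC if possible, otherwise raise the kernel command timeout."""
    if try_set_erc(dev, run):
        return "erc"
    timeout_file = Path(sysfs_root) / dev / "device" / "timeout"
    timeout_file.write_text(f"{FALLBACK_TIMEOUT_S}\n")
    return "kernel-timeout"
```

Calling `configure_drive("sda")` for each array member at boot would cover both cases; the `run` and `sysfs_root` parameters are only there so the logic can be tried without touching real hardware.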