On Fri, Jun 3, 2016 at 6:48 PM, Nicholas D Steeves <nstee...@gmail.com> wrote:
> On 3 June 2016 at 11:33, Austin S. Hemmelgarn <ahferro...@gmail.com> wrote:
>> On 2016-06-03 10:11, Martin wrote:
>>>> Make certain the kernel command timer value is greater than the drive's
>>>> error recovery timeout. The former is found in sysfs, per block
>>>> device, the latter can be read and set with smartctl. Wrong
>>>> configuration is common (it's actually the default) when using
>>>> consumer drives, and inevitably leads to problems, even the loss of
>>>> the entire array. It really is a terrible default.
>>> Are nearline SAS drives considered consumer drives?
>> If it's a SAS drive, then no, especially when you start talking about things
>> marketed as 'nearline'.  Additionally, SCT ERC is entirely a SATA thing, I
>> forget what the equivalent in SCSI (and by extension SAS) terms is, but I'm
>> pretty sure that the kernel handles things differently there.
> For the purposes of BTRFS RAID1: For drives that ship with SCT ERC of
> 7sec, is the default kernel command timeout of 30sec appropriate, or
> should it be reduced?

It's fine. But it depends on your use case: if it can tolerate a rare
hang of more than 7 but less than 30 seconds, and you're prepared to
start investigating the cause, then I'd leave it alone. If the use case
prefers resetting the drive when it stops responding, then you'd go
with something shorter.
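
As a minimal sketch of checking or shortening the command timer, assuming
the usual sysfs location /sys/block/<dev>/device/timeout (value in
seconds): the helper names and the sysfs_root parameter below are made up
for illustration, and writing the real file needs root.

```python
from pathlib import Path


def timer_path(device: str, sysfs_root: str = "/sys/block") -> Path:
    # Assumed location of the per-device SCSI command timer, in seconds.
    return Path(sysfs_root) / device / "device" / "timeout"


def read_command_timer(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the kernel command timer for a block device, in seconds."""
    return int(timer_path(device, sysfs_root).read_text().strip())


def write_command_timer(device: str, seconds: int,
                        sysfs_root: str = "/sys/block") -> None:
    """Set the kernel command timer; not persistent across reboots."""
    if seconds <= 0:
        raise ValueError("command timer must be a positive number of seconds")
    timer_path(device, sysfs_root).write_text(f"{seconds}\n")
```

For example, read_command_timer("sda") on a stock system would normally
show the 30 second default discussed above.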

I'm fairly certain a SAS drive's command queue doesn't get obliterated by
such a link reset, just the hung command, whereas on SATA drives all
information in the queue is lost. So resets on SATA carry a much bigger
penalty, if my understanding is correct.

>  For SATA drives that do not support SCT ERC, is
> it true that 120sec is a sane value?  I forget where I got this value
> of 120sec;

It's a good question. It's not well documented and is not defined in the
SATA spec, so it's probably make/model specific. The linux-raid@ list
probably has the most information on this, just because their users get
nailed by this problem often. The recommendation does seem to vary
from around 120 to 180 seconds. That is of course a maximum; the drive
could give up much sooner. But what you don't want is for the drive to
be in recovery for a bad sector when the command timer triggers a link
reset, losing everything the drive was doing. All of that is replaceable
except one thing: which sector was having the problem. Right now the
drive doesn't report slow sectors. It only reports failed reads, and
it's the failed read error that includes the sector, so that the raid
mechanism can figure out what data is missing, reconstruct it from
mirror or parity, and then fix the bad sector by writing to it.
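
To make the rule concrete, here is a rough sketch that takes the text
printed by "smartctl -l scterc /dev/sdX" and picks a command timer that
stays above the drive's own recovery time. The output layout it parses
(a "Read:" line in tenths of a second) is an assumption about smartctl's
formatting, the function names are invented, and the 180 second fallback
is just the upper end of the range mentioned above.

```python
import math
import re
from typing import Optional


def parse_scterc_read_seconds(smartctl_output: str) -> Optional[float]:
    """Return the SCT ERC read limit in seconds, or None when it is
    disabled or unsupported (assumed smartctl output layout)."""
    match = re.search(r"^\s*Read:\s+(\d+)\s+\(", smartctl_output, re.MULTILINE)
    if match is None:
        return None
    return int(match.group(1)) / 10.0  # value is in tenths of a second


def suggest_command_timer(erc_seconds: Optional[float],
                          current_timer: int = 30,
                          fallback: int = 180) -> int:
    """Pick a kernel command timer (seconds) greater than the drive's
    error recovery time."""
    if erc_seconds is None:
        # No usable ERC limit: allow for a long internal recovery.
        return max(current_timer, fallback)
    if current_timer > erc_seconds:
        return current_timer  # already longer than the drive's limit
    return math.floor(erc_seconds) + 1
```

With a 7 second ERC and the 30 second default, this leaves the timer
alone; with ERC disabled or unsupported it raises it to the fallback.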

> it might have been this list, it might have been an mdadm
> bug report.  Also, in terms of tuning, I've been unable to find
> whether the ideal kernel timeout value changes depending on RAID
> type...is that a factor in selecting a sane kernel timeout value?

No. It's strictly a value to make certain you get read errors from the
drive rather than link resets.

And that's why I think it's a bad default, because it totally thwarts
attempts by manufacturers to recover marginal sectors, even in the
single disk case.

Chris Murphy