On 2016-06-03 21:48, Chris Murphy wrote:
On Fri, Jun 3, 2016 at 6:48 PM, Nicholas D Steeves <nstee...@gmail.com> wrote:
On 3 June 2016 at 11:33, Austin S. Hemmelgarn <ahferro...@gmail.com> wrote:
On 2016-06-03 10:11, Martin wrote:

Make certain the kernel command timer value is greater than the drive
error recovery timeout. The former is found in sysfs, per block
device; the latter can be read and set with smartctl. Wrong
configuration is common (it's actually the default) when using
consumer drives, and inevitably leads to problems, even the loss of
the entire array. It really is a terrible default.
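
As a rough illustration of the check described above, here is a minimal Python sketch. It assumes the command timer is exposed at /sys/block/<dev>/device/timeout (in seconds) and that the drive's SCT ERC value has already been read with "smartctl -l scterc /dev/<dev>", which reports it in tenths of a second. The function names and the sysfs_root parameter are made up for this example.

```python
from pathlib import Path


def read_command_timeout(device: str, sysfs_root: str = "/sys/block") -> int:
    """Return the kernel command timer for a block device, in seconds."""
    # sysfs_root is a parameter only so the function can be tried against
    # a scratch directory instead of the real sysfs tree.
    return int(Path(sysfs_root, device, "device", "timeout").read_text().strip())


def timer_exceeds_erc(command_timeout_s: int, erc_deciseconds: int) -> bool:
    """True when the kernel waits longer than the drive's recovery limit.

    SCT ERC is expressed in tenths of a second, so 70 means 7.0 seconds.
    """
    return command_timeout_s > erc_deciseconds / 10


# Example use on a real system (device name is an assumption):
#   timeout = read_command_timeout("sda")
#   print(timer_exceeds_erc(timeout, 70))
```

With the 30 second default timer and a drive set to 7 seconds (70), the check passes; with a drive that may spend 120 seconds in recovery it does not.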


Are nearline SAS drives considered consumer drives?

If it's a SAS drive, then no, especially when you start talking about things
marketed as 'nearline'.  Additionally, SCT ERC is entirely a SATA thing. I
forget what the equivalent in SCSI (and by extension SAS) terms is, but I'm
pretty sure that the kernel handles things differently there.

For the purposes of BTRFS RAID1: For drives that ship with SCT ERC of
7sec, is the default kernel command timeout of 30sec appropriate, or
should it be reduced?

It's fine. But it depends on your use case: if it can tolerate a rare
hang of between 7 and 30 seconds, and you're prepared to start
investigating the cause, then I'd leave it alone. If the use case
prefers resetting the drive when it stops responding, then you'd go
with something shorter.

I'm fairly certain SAS's command queue doesn't get obliterated with
such a link reset, just the hung command, whereas on SATA drives all
information in the queue is lost. So resets on SATA are a much bigger
penalty, if I have the correct understanding.
There's also more involved with an ATA link reset because AHCI controllers aren't MP safe, so there's a global lock that has to be held while talking to them. Because of this, a link reset on an ATA drive (be it SATA or PATA) will cause performance degradation for all other devices on that controller as well until the reset is complete.


For SATA drives that do not support SCT ERC, is
it true that 120sec is a sane value?  I forget where I got this value
of 120sec;

It's a good question. It's not well documented and is not defined in the
SATA spec, so it's probably make/model specific. The linux-raid@ list
probably has the most information on this just because their users get
nailed by this problem often. And the recommendation does seem to vary
from around 120 to 180 seconds. That is of course a maximum. The drive
could give up much sooner. But what you don't want is for the drive to be
in recovery for a bad sector, and the command timer does a link reset,
losing all of what the drive was doing: all of which is replaceable
except really one thing, which is what sector was having the problem.
And right now there's no report from the drive for slow sectors. It only
reports failed reads, and it's that failed read error that includes
the sector, so that the raid mechanism can figure out what data is
missing, reconstruct from mirror or parity, and then fix the bad
sector by writing to it.
FWIW, I usually go with 150 on the Seagate 'Desktop' drives I use. I've seen some cheap Hitachi and Toshiba disks that need it as high as 300 though to work right.
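
The rule of thumb discussed above could be sketched roughly as follows, in Python. This is only an illustration under stated assumptions: the 30 second default and the 180 second fallback are taken from the figures mentioned in this thread, the function names are invented for the example, and writing to /sys/block/<dev>/device/timeout requires root and does not persist across reboots.

```python
from pathlib import Path
from typing import Optional


def choose_command_timeout(erc_deciseconds: Optional[int],
                           default_s: int = 30,
                           fallback_s: int = 180) -> int:
    """Pick a kernel command timer value, in seconds.

    erc_deciseconds is the drive's SCT ERC setting in tenths of a second,
    or None when the drive does not support SCT ERC (or has it disabled).
    """
    # A drive that gives up before the default timer expires is fine as-is.
    if erc_deciseconds is not None and erc_deciseconds / 10 < default_s:
        return default_s
    # Otherwise wait long enough for the drive to report a read error
    # itself instead of being reset in the middle of recovery.
    return fallback_s


def write_command_timeout(device: str, seconds: int,
                          sysfs_root: str = "/sys/block") -> None:
    """Write the timer value to the device's sysfs timeout file."""
    Path(sysfs_root, device, "device", "timeout").write_text(f"{seconds}\n")


print(choose_command_timeout(70))    # drive with 7.0 s ERC -> 30
print(choose_command_timeout(None))  # no SCT ERC support   -> 180
```

The fallback is a parameter because, as noted above, the right value is make/model specific; 150 or even 300 may be needed for some drives.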

it might have been this list, it might have been an mdadm
bug report.  Also, in terms of tuning, I've been unable to find
whether the ideal kernel timeout value changes depending on RAID
type...is that a factor in selecting a sane kernel timeout value?

No. It's strictly a value to make certain you get read errors from the
drive rather than link resets.
You have to factor in how the controller handles things too. Some of them will retry just like a desktop drive, and you need to account for that.

And that's why I think it's a bad default, because it totally thwarts
attempts by manufacturers to recover marginal sectors, even in the
single disk case.
That's debatable. By attempting to recover the bad sector, they're slowing down the whole system. The likelihood of recovering a bad sector functionally falls off linearly the longer you try, and not having the ability to choose when to report an error is the bigger issue here.
