Re: Western Digital hard disks and ATA timeouts
Søren Schmidt wrote:
> On 7 Nov, 2008, at 20:12, Peter Wemm wrote:
>> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
>> [..]
>>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl.
>>>
>>> Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl.
>>
>> The 5 second timeout has been a problem for quite a while actually. I've had a number of instances where I've had to increase it to 20 or 30 seconds when recovering from marginal drives. The longest successful recovery attempt I've seen was 26 seconds, I believe on a Maxtor drive a few years ago. (successful == the drive spent 26 seconds but eventually successfully read the sector). Even the IBM death star drives could take much longer than 5 seconds to do a recovery 5 years ago.
>>
>> 5 seconds has never been a good default. I think the timeout should be increased to at least 30 seconds. My windows box has a timeout that goes for several minutes.
>>
>> If there is concern about FreeBSD appearing to hang, I could imagine that a console warning message could be printed after 5 seconds. But just say drive has not yet responded. But give it more time.
>>
>> In this day and age we're generally not playing games with udma33 vs 66, notched cables, poor CRC support etc. SATA seems to have eliminated all that. Hmm, it might make sense to increase the timeout on SATA connections to 2 or 3 minutes by default.
>
> Actually I do have a patch around that logs the timeout on the console after the normal timeout (5secs), then just goes on to wait for double the timeout and log again etc etc, final timeout was IIRC 60 secs but could be anything.
I have a disk which I am finally getting rid of that produces READ_DMA and WRITE_DMA errors at a pretty high rate. I did enable the extra ATA error reporting and it doesn't seem to indicate any sort of actual errors, just extra long timeouts. At one time, I did change the system to extend the timeout, but I did not see any real improvement at 30 seconds. I suspect that an even more extended timeout would be necessary to solve the problem.

I am removing the disk this week. Does anyone want a disk that produces DMA timeouts at a regular rate? Would it help actually solve this problem? Please let me know if you want such a beast and I will ship it to you.

/Joe
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: Western Digital hard disks and ATA timeouts
> Note that Western Digital's RAID edition drives claim to take up to 7 seconds to reallocate sectors, using something they call TLER, which force-limits the amount of time the drive can spend reallocating. TLER cannot be disabled:

TLER can be enabled/disabled on recent WD drives (SE16/RE2/GP). SE16/GP come with TLER off, RE2 with TLER on.

Google WDTLER utility. It can apparently be obtained from WD by asking them nicely. Or, yet again, google is your friend. Here's one example - http://www.hardforum.com/archive/index.php/t-1191548.html

--Artem
Re: Western Digital hard disks and ATA timeouts
On Fri, Nov 07, 2008 at 12:08:01AM -0800, Artem Belevich wrote:
>> Note that Western Digital's RAID edition drives claim to take up to 7 seconds to reallocate sectors, using something they call TLER, which force-limits the amount of time the drive can spend reallocating. TLER cannot be disabled:
>
> TLER can be enabled/disabled on recent WD drives (SE16/RE2/GP). SE16/GP come with TLER off, RE2 with TLER on.
>
> Google WDTLER utility. It can apparently be obtained from WD by asking them nicely. Or, yet again, google is your friend. Here's one example - http://www.hardforum.com/archive/index.php/t-1191548.html

Thanks for the information. Nice to know one of their FAQ entries is false.

Also, note that SE16/RE2/GP is not specific enough; I have SE16 drives from 2005, and I highly doubt those have TLER capability due to their age.

Also, there's a Wikipedia article on this whole fiasco.

http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

It also appears Samsung drives have a similar feature called CCTL, which uses a value of 7 or 8 seconds:

http://www.samsung.com/global/business/hdd/learningresource/whitepapers/LearningResource_CCTL.html

But regardless of TLER being toggleable, FreeBSD's ATA command timeout of 5 seconds is too aggressive, and should be increased. Likewise, the value should be a sysctl, so those who do want such aggressive values can use it at the community's -- or their own -- behest.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
Re: Western Digital hard disks and ATA timeouts
> But regardless of TLER being toggleable, FreeBSD's ATA command timeout of 5 seconds is too aggressive, and should be increased. Likewise, the value should be a sysctl, so those who do want such aggressive values

Once it migrates from a constant to a sysctl variable, could the kernel maybe also sniff the drives and automatically set an appropriate value? (Just an idea :-)

Cheers,
Julian
--
Julian Stacey: BSDUnixLinux C Prog Admin SysEng Consult Munich www.berklix.com
Mail plain ASCII text. HTML Base64 text are spam. www.asciiribbon.org
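A rough illustration of the "set an appropriate value automatically" idea, written as a user-space Python sketch rather than kernel code. The recovery limits are the figures quoted in this thread (7 seconds for WD TLER, 7 or 8 seconds for Samsung CCTL, up to 2 minutes for WD desktop deep recovery). The class names, the 2-second margin and the pick_timeout() helper are hypothetical and exist only for this example; the 30-second fallback is the value suggested elsewhere in the thread. How a kernel would actually detect the drive class is not covered here.

```python
# Worst-case error-recovery times quoted in this thread, in seconds.
# The dictionary keys are made-up labels for this sketch, not real identifiers.
RECOVERY_LIMIT_S = {
    "wd-tler": 7,        # WD RAID edition drives with TLER enabled
    "samsung-cctl": 8,   # Samsung CCTL, quoted as "7 or 8 seconds"
    "wd-desktop": 120,   # WD desktop deep recovery, "up to 2 minutes"
}

CURRENT_TIMEOUT_S = 5    # the hard-set ATA command timeout discussed here


def pick_timeout(drive_class, fallback=30, margin=2):
    """Return a command timeout in seconds for a drive class.

    Known classes get their quoted recovery limit plus a small margin
    (the margin is an assumption); unknown classes get the fallback.
    """
    limit = RECOVERY_LIMIT_S.get(drive_class)
    if limit is None:
        return fallback
    return limit + margin


if __name__ == "__main__":
    for name, limit in RECOVERY_LIMIT_S.items():
        too_short = limit > CURRENT_TIMEOUT_S
        print(f"{name}: recovery up to {limit}s, "
              f"5s timeout too short: {too_short}, "
              f"suggested timeout: {pick_timeout(name)}s")
```

Every recovery limit quoted in the thread is longer than 5 seconds, so with these numbers even a TLER-limited drive can run past the current timeout.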
Re: Western Digital hard disks and ATA timeouts
On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
[..]
> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl.
>
> Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl.

The 5 second timeout has been a problem for quite a while actually. I've had a number of instances where I've had to increase it to 20 or 30 seconds when recovering from marginal drives. The longest successful recovery attempt I've seen was 26 seconds, I believe on a Maxtor drive a few years ago. (successful == the drive spent 26 seconds but eventually successfully read the sector). Even the IBM death star drives could take much longer than 5 seconds to do a recovery 5 years ago.

5 seconds has never been a good default. I think the timeout should be increased to at least 30 seconds. My windows box has a timeout that goes for several minutes.

If there is concern about FreeBSD appearing to hang, I could imagine that a console warning message could be printed after 5 seconds. But just say drive has not yet responded. But give it more time.

In this day and age we're generally not playing games with udma33 vs 66, notched cables, poor CRC support etc. SATA seems to have eliminated all that. Hmm, it might make sense to increase the timeout on SATA connections to 2 or 3 minutes by default.

--
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; KI6FJV
All of this is for nothing if we don't go to the stars - JMS/B5
If Java had true garbage collection, most programs would delete themselves upon execution. -- Robert Sewell
Re: Western Digital hard disks and ATA timeouts
I can confirm that. Many FreeNAS users have had problems with their HDDs (e.g. with APM, when waking disks to access them after they had gone to sleep). Increasing the timeout solves the problem in most cases. I think increasing the default value BUT allowing the user to set a preferred value via sysctl would be the best solution.

I don't understand why adding such a sysctl interface is such a problem for some people. Anyone who wants to set a value other than the default MUST KNOW what he is doing and live with the consequences. There are so many other kernel/system variables that can harm the system.

Regards
Volker

http://dict.leo.org/ende?lp=endep=thMx..search=implications

Peter Wemm wrote:
> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> [..]
>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl.
>>
>> Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl.
>
> The 5 second timeout has been a problem for quite a while actually. I've had a number of instances where I've had to increase it to 20 or 30 seconds when recovering from marginal drives. The longest successful recovery attempt I've seen was 26 seconds, I believe on a Maxtor drive a few years ago. (successful == the drive spent 26 seconds but eventually successfully read the sector). Even the IBM death star drives could take much longer than 5 seconds to do a recovery 5 years ago.
>
> 5 seconds has never been a good default. I think the timeout should be increased to at least 30 seconds. My windows box has a timeout that goes for several minutes.
>
> If there is concern about FreeBSD appearing to hang, I could imagine that a console warning message could be printed after 5 seconds. But just say drive has not yet responded. But give it more time.
>
> In this day and age we're generally not playing games with udma33 vs 66, notched cables, poor CRC support etc. SATA seems to have eliminated all that. Hmm, it might make sense to increase the timeout on SATA connections to 2 or 3 minutes by default.
Re: Western Digital hard disks and ATA timeouts
On 7 Nov, 2008, at 20:12, Peter Wemm wrote:
> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> [..]
>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl.
>>
>> Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl.
>
> The 5 second timeout has been a problem for quite a while actually. I've had a number of instances where I've had to increase it to 20 or 30 seconds when recovering from marginal drives. The longest successful recovery attempt I've seen was 26 seconds, I believe on a Maxtor drive a few years ago. (successful == the drive spent 26 seconds but eventually successfully read the sector). Even the IBM death star drives could take much longer than 5 seconds to do a recovery 5 years ago.
>
> 5 seconds has never been a good default. I think the timeout should be increased to at least 30 seconds. My windows box has a timeout that goes for several minutes.
>
> If there is concern about FreeBSD appearing to hang, I could imagine that a console warning message could be printed after 5 seconds. But just say drive has not yet responded. But give it more time.
>
> In this day and age we're generally not playing games with udma33 vs 66, notched cables, poor CRC support etc. SATA seems to have eliminated all that. Hmm, it might make sense to increase the timeout on SATA connections to 2 or 3 minutes by default.

Actually I do have a patch around that logs the timeout on the console after the normal timeout (5secs), then just goes on to wait for double the timeout and log again etc etc, final timeout was IIRC 60 secs but could be anything.

-Søren
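The escalation described in the message above (log at the normal 5-second timeout, wait for double, log again, give up at a final timeout of around 60 seconds) can be modelled in a few lines. This is a user-space Python sketch of the schedule only, not the actual patch, which is not shown in this thread. It assumes that "double the timeout" means the deadline measured from command start doubles each round (5, 10, 20, 40 seconds) and is capped at the final value; the function names and the warning text are made up for this example.

```python
def escalation_deadlines(initial=5.0, final=60.0):
    """Return the deadlines (seconds since command start) to check.

    Each deadline doubles the previous one; the last entry is always
    the final, hard timeout.
    """
    if initial <= 0 or final < initial:
        raise ValueError("need 0 < initial <= final")
    deadlines = []
    t = initial
    while t < final:
        deadlines.append(t)
        t *= 2
    deadlines.append(final)
    return deadlines


def run_command(completion_time, initial=5.0, final=60.0):
    """Model one command; return (succeeded, warnings_logged).

    completion_time is how long the drive takes to answer, in seconds,
    or None if it never answers.
    """
    warnings = []
    for deadline in escalation_deadlines(initial, final):
        if completion_time is not None and completion_time <= deadline:
            return True, warnings
        if deadline < final:
            # Soft timeout: log it and keep waiting instead of failing.
            warnings.append(f"no response after {deadline:g}s, still waiting")
    # Final timeout reached without an answer: the command fails.
    return False, warnings


if __name__ == "__main__":
    print(escalation_deadlines())
    # The 26-second recovery mentioned earlier in the thread.
    print(run_command(26))
    # The same recovery with a hard 5-second timeout and no escalation.
    print(run_command(26, initial=5.0, final=5.0))
```

Under these assumptions the 26-second recovery succeeds after three console warnings, while the same command fails outright when the only deadline is 5 seconds.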
Re: Western Digital hard disks and ATA timeouts
As I'm writing this I'm trying to rescue the contents of another computer's disk. Something about the seek heads, or something related to them, is physically half-broken, so the disk might need up to 10 retries just to read a sector; once read, however, it's usually no problem. I'm using myrescue (running on 6.2, so I don't know if it's included in the current ports, but if anyone wants to run it on FreeBSD I've done the gruntwork for porting), so all the timeouts are not a really big issue as it'll try to read that sector again later, but had I had the sysctl I would've been a tad happier right now.

As for the default being a small value, I personally think it's better to throw out some messages/errors early on, before the disk reaches a catastrophic state. (At least on 6.2 the kernel will put out a message for each retry without giving faults; maybe more retries before throwing an error?) By catastrophic state I'm referring to that oh-so-famous Google paper that said that once a disk has started showing errors it doesn't have long to live, and I do trust that conclusion, as I've been warned by these messages 2 times but ignored them until the disk went really bad.

The main thing I'm trying to get across is that early warnings and small problems are a helluva lot better than big disasters. Think of it like the oil meter on your car: it's not like you're gonna go out and drive 100s of km in the wilderness if you know that the car is in a bad state.

(Now if only SMART info was reliable!)

/ Jonas

2008/11/7 Peter Wemm [EMAIL PROTECTED]:
> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> [..]
>> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl.
>>
>> Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl.
>
> The 5 second timeout has been a problem for quite a while actually. I've had a number of instances where I've had to increase it to 20 or 30 seconds when recovering from marginal drives. The longest successful recovery attempt I've seen was 26 seconds, I believe on a Maxtor drive a few years ago. (successful == the drive spent 26 seconds but eventually successfully read the sector). Even the IBM death star drives could take much longer than 5 seconds to do a recovery 5 years ago.
>
> 5 seconds has never been a good default. I think the timeout should be increased to at least 30 seconds. My windows box has a timeout that goes for several minutes.
>
> If there is concern about FreeBSD appearing to hang, I could imagine that a console warning message could be printed after 5 seconds. But just say drive has not yet responded. But give it more time.
>
> In this day and age we're generally not playing games with udma33 vs 66, notched cables, poor CRC support etc. SATA seems to have eliminated all that. Hmm, it might make sense to increase the timeout on SATA connections to 2 or 3 minutes by default.
>
> --
> Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; KI6FJV
> All of this is for nothing if we don't go to the stars - JMS/B5
> If Java had true garbage collection, most programs would delete themselves upon execution. -- Robert Sewell
Western Digital hard disks and ATA timeouts
A user and I on a broadband forum were discussing the possibility of diminishing quality of hard disks (particularly 1TB models) in recent days (specifically October).

The user continually referenced something called "deep recovery cycle", backed with claims from Newegg reviewers (who often know very little or nothing at all -- grain of salt concept applies), which makes Western Digital's desktop hard disks unfit for RAID or server usage. I claimed shenanigans until the user pointed me to the following document on Western Digital's site:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397

The feature described apparently causes the hard disk to enter some form of aggressive sector scan/sector remapping loop, which can take up to 2 minutes to complete, during which time the hard disk is basically unusable. (I imagine ATA commands sent to the disk will simply time out or stall indefinitely, which would result in all sorts of timeout errors).

Note that Western Digital's RAID edition drives claim to take up to 7 seconds to reallocate sectors, using something they call TLER, which force-limits the amount of time the drive can spend reallocating. TLER cannot be disabled:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1478

What baffles me is why Western Digital thinks that 2 minutes of the drive being unusable is acceptable but only for desktops. Any FreeBSD desktop will start reporting ATA timeouts if the drive wedges for more than 5 seconds -- two minutes would just spew errors and hard-lock the system.

What also baffles me is why Western Digital thinks the term RAID always means a hardware RAID controller is involved as a buffer between the OS and the disks. Bzzzt, bad assumption on their part.

So why do we care?

As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and is not adjustable without editing the ATA code yourself and increasing the value. The FreeNAS folks have made patches available to turn the timeout value into a sysctl.

Soren and/or others, please increase this timeout value. Five seconds has now been deemed too aggressive a default. And please consider migrating the timeout value into a sysctl.

P.S. -- I do not consider any of this reason to avoid Western Digital drives. But I would warn users to be a little more cautious before reporting ATA timeouts when newer (circa 2007 and later) WD drives are in use.

--
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |