Re: Western Digital hard disks and ATA timeouts

2008-11-09 Thread Joe Kelsey

Søren Schmidt wrote:
> On 7Nov, 2008, at 20:12 , Peter Wemm wrote:
> > On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> > [..]
> > > As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
> > > is not adjustable without editing the ATA code yourself and increasing
> > > the value.  The FreeNAS folks have made patches available to turn the
> > > timeout value into a sysctl.
> > >
> > > Soren and/or others, please increase this timeout value.  Five seconds
> > > has now been deemed too aggressive a default.  And please consider
> > > migrating the timeout value into a sysctl.
> >
> > The 5 second timeout has been a problem for quite a while actually.
> > I've had a number of instances where I've had to increase it to 20 or
> > 30 seconds when recovering from marginal drives.  The longest
> > successful recovery attempt I've seen was 26 seconds, I believe on a
> > Maxtor drive a few years ago.   (successful == the drive spent 26
> > seconds but eventually successfully read the sector).  Even the IBM
> > death star drives could take much longer than 5 seconds to do a
> > recovery 5 years ago.  5 seconds has never been a good default.
> >
> > I think the timeout should be increased to at least 30 seconds.  My
> > windows box has a timeout that goes for several minutes.
> >
> > If there is concern about FreeBSD appearing to hang, I could imagine
> > that a console warning message could be printed after 5 seconds.  But
> > just say drive has not yet responded.  But give it more time.
> >
> > In this day and age we're generally not playing games with udma33 vs
> > 66, notched cables, poor CRC support etc.  SATA seems to have
> > eliminated all that.  Hmm, it might make sense to increase the timeout
> > on SATA connections to 2 or 3 minutes by default.
>
> Actually I do have a patch around that logs the timeout on the console
> after the normal timeout (5secs), then just goes on to wait for double
> the timeout and log again etc etc, final timeout was IIRC 60 secs but
> could be anything.

I have a disk, which I am finally getting rid of, that produces READ_DMA
and WRITE_DMA errors at a pretty high rate.  I did enable the extra ATA
error reporting and it doesn't seem to indicate any sort of actual
errors, just extra long timeouts.


At one time, I did change the system to extend the timeout, but I did 
not see any real improvement at 30 seconds.  I suspect that an even more 
extended timeout would be necessary to solve the problem.


I am removing the disk this week.  Does anyone want a disk that produces 
DMA timeouts at a regular rate?  Would it help actually solve this problem?


Please let me know if you want such a beast and I will ship it to you.

/Joe

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Artem Belevich
> Note that Western Digital's RAID edition drives claim to take up to 7
> seconds to reallocate sectors, using something they call TLER, which
> force-limits the amount of time the drive can spend reallocating.  TLER
> cannot be disabled:

TLER can be enabled/disabled on recent WD drives (SE16/RE2/GP). SE16/GP
come with TLER off, RE2 with TLER on. Google WDTLER utility.
It can apparently be obtained from WD by asking them nicely.
Or, yet again, google is your friend. Here's one example -
http://www.hardforum.com/archive/index.php/t-1191548.html

--Artem


Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Jeremy Chadwick
On Fri, Nov 07, 2008 at 12:08:01AM -0800, Artem Belevich wrote:
> > Note that Western Digital's RAID edition drives claim to take up to 7
> > seconds to reallocate sectors, using something they call TLER, which
> > force-limits the amount of time the drive can spend reallocating.  TLER
> > cannot be disabled:
>
> TLER can be enabled/disabled on recent WD drives (SE16/RE2/GP). SE16/GP
> come with TLER off, RE2 with TLER on. Google WDTLER utility.
> It can apparently be obtained from WD by asking them nicely.
> Or, yet again, google is your friend. Here's one example -
> http://www.hardforum.com/archive/index.php/t-1191548.html

Thanks for the information.  Nice to know one of their FAQ entries is
false.  Also, note that SE16/RE2/GP is not specific enough; I have
SE16 drives from 2005, and I highly doubt those have TLER capability
due to their age.

Also, there's a Wikipedia article on this whole fiasco.

http://en.wikipedia.org/wiki/Time-Limited_Error_Recovery

It also appears Samsung drives have a similar feature called CCTL,
which uses a value of 7 or 8 seconds:

http://www.samsung.com/global/business/hdd/learningresource/whitepapers/LearningResource_CCTL.html

But regardless of TLER being toggleable, FreeBSD's ATA command timeout
of 5 seconds is too aggressive, and should be increased.  Likewise, the
value should be a sysctl, so those who do want such aggressive values
can use it at the community's -- or their own -- behest.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Julian Stacey
> But regardless of TLER being toggleable, FreeBSD's ATA command timeout
> of 5 seconds is too aggressive, and should be increased.  Likewise, the
> value should be a sysctl, so those who do want such aggressive values

Once it migrates from a constant to a sysctl variable, could the kernel
maybe also sniff the drives and automatically set an appropriate value?
(Just an idea :-)

Cheers,
Julian
-- 
Julian Stacey: BSDUnixLinux C Prog Admin SysEng Consult Munich www.berklix.com
  Mail plain ASCII text.  HTML  Base64 text are spam. www.asciiribbon.org


Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Peter Wemm
On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
[..]
> As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
> is not adjustable without editing the ATA code yourself and increasing
> the value.  The FreeNAS folks have made patches available to turn the
> timeout value into a sysctl.
>
> Soren and/or others, please increase this timeout value.  Five seconds
> has now been deemed too aggressive a default.  And please consider
> migrating the timeout value into a sysctl.

The 5 second timeout has been a problem for quite a while actually.
I've had a number of instances where I've had to increase it to 20 or
30 seconds when recovering from marginal drives.  The longest
successful recovery attempt I've seen was 26 seconds, I believe on a
Maxtor drive a few years ago.   (successful == the drive spent 26
seconds but eventually successfully read the sector).  Even the IBM
death star drives could take much longer than 5 seconds to do a
recovery 5 years ago.  5 seconds has never been a good default.

I think the timeout should be increased to at least 30 seconds.  My
windows box has a timeout that goes for several minutes.

If there is concern about FreeBSD appearing to hang, I could imagine
that a console warning message could be printed after 5 seconds.  But
just say drive has not yet responded.  But give it more time.

In this day and age we're generally not playing games with udma33 vs
66, notched cables, poor CRC support etc.  SATA seems to have
eliminated all that.  Hmm, it might make sense to increase the timeout
on SATA connections to 2 or 3 minutes by default.
-- 
Peter Wemm - [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; KI6FJV
All of this is for nothing if we don't go to the stars - JMS/B5
If Java had true garbage collection, most programs would delete
themselves upon execution. -- Robert Sewell


Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Volker Theile
I can confirm that. Many FreeNAS users had problems with their HDDs
(e.g. with APM, waking disks to access them after they fell asleep).
Increasing timeouts solves the problem in most cases. I think increasing
the value BUT allowing the user to set it to a preferred value via
sysctl would be the best solution. I don't understand why adding such
a sysctl interface is such a problem for some people. If someone wants
to set any value other than the default one, HE MUST KNOW what he is
doing and live with the consequences. There are so many other
kernel/system variables that can harm the system.


Regards
Volker
  






Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Søren Schmidt

On 7Nov, 2008, at 20:12 , Peter Wemm wrote:

> On Thu, Nov 6, 2008 at 11:17 PM, Jeremy Chadwick [EMAIL PROTECTED] wrote:
> [..]
> > As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
> > is not adjustable without editing the ATA code yourself and increasing
> > the value.  The FreeNAS folks have made patches available to turn the
> > timeout value into a sysctl.
> >
> > Soren and/or others, please increase this timeout value.  Five seconds
> > has now been deemed too aggressive a default.  And please consider
> > migrating the timeout value into a sysctl.
>
> The 5 second timeout has been a problem for quite a while actually.
> I've had a number of instances where I've had to increase it to 20 or
> 30 seconds when recovering from marginal drives.  The longest
> successful recovery attempt I've seen was 26 seconds, I believe on a
> Maxtor drive a few years ago.   (successful == the drive spent 26
> seconds but eventually successfully read the sector).  Even the IBM
> death star drives could take much longer than 5 seconds to do a
> recovery 5 years ago.  5 seconds has never been a good default.
>
> I think the timeout should be increased to at least 30 seconds.  My
> windows box has a timeout that goes for several minutes.
>
> If there is concern about FreeBSD appearing to hang, I could imagine
> that a console warning message could be printed after 5 seconds.  But
> just say drive has not yet responded.  But give it more time.
>
> In this day and age we're generally not playing games with udma33 vs
> 66, notched cables, poor CRC support etc.  SATA seems to have
> eliminated all that.  Hmm, it might make sense to increase the timeout
> on SATA connections to 2 or 3 minutes by default.


Actually I do have a patch around that logs the timeout on the console  
after the normal timeout (5secs), then just goes on to wait for double  
the timeout and log again etc etc, final timeout was IIRC 60 secs but  
could be anything.
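
The escalating schedule described above (warn at the normal timeout, double
the wait, warn again, give up at a final limit) can be sketched roughly as
follows. This is only an illustration in Python, not the actual patch: the
function names, the message text, and the reading of "double the timeout" as
doubling the cumulative deadline are all assumptions.

```python
def escalating_deadlines(base=5, final=60):
    """Yield cumulative deadlines in seconds: base, 2*base, ... capped at final.

    Assumption: "wait for double the timeout" means the total elapsed
    time allowed doubles at each step until the final limit is reached.
    """
    deadline = base
    while deadline < final:
        yield deadline
        deadline *= 2
    yield final


def wait_for_command(completion_time, base=5, final=60, log=print):
    """Simulate waiting for a command that completes after completion_time seconds.

    Returns True if the command finishes before the final timeout and
    False otherwise. A warning (hypothetical wording) is logged at every
    intermediate deadline instead of failing the request outright.
    """
    for deadline in escalating_deadlines(base, final):
        if completion_time <= deadline:
            return True
        if deadline < final:
            log(f"ata: command still pending after {deadline}s, waiting longer")
    log(f"ata: command timed out after {final}s")
    return False
```

With the defaults, a drive that needs 26 seconds (the case Peter mentions)
would produce warnings at 5, 10 and 20 seconds and then succeed, while a
drive stuck for two minutes would still fail, but only after 60 seconds.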


-Søren


Re: Western Digital hard disks and ATA timeouts

2008-11-07 Thread Jonas Lund
As I'm writing this I'm trying to rescue the contents of another computer's disk.

Something about the seek heads, or something related to them, is
physically half-broken, so the disk might need up to 10 retries just to
read a sector; once read, however, it's usually no problem. I'm using
myrescue (running on 6.2, so I don't know if it's included in the
current ports, but if anyone wants to run it on FreeBSD I've done the
gruntwork for porting), so all the timeouts aren't a really big issue,
as it'll try to read that sector again later. But had I had
the sysctl I would've been a tad happier right now.
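
The "try that sector again later" approach can be sketched as a multi-pass
copy loop. This is a rough Python illustration of the general idea only, not
myrescue's actual implementation; `rescue`, `make_flaky_reader` and the
512-byte sector size are assumptions made for the example.

```python
def rescue(read_sector, total_sectors, max_passes=10):
    """Copy sectors in several passes, deferring the ones that fail.

    read_sector(lba) must return the sector data or raise OSError.
    Returns (recovered, unreadable): a dict of lba -> data and the list
    of sectors that still failed after max_passes attempts.
    """
    recovered = {}
    pending = list(range(total_sectors))
    for _ in range(max_passes):
        if not pending:
            break
        still_bad = []
        for lba in pending:
            try:
                recovered[lba] = read_sector(lba)
            except OSError:
                # Do not stall on a bad sector; come back to it next pass.
                still_bad.append(lba)
        pending = still_bad
    return recovered, pending


def make_flaky_reader(failures_needed):
    """Build a simulated disk: sector lba fails failures_needed[lba] times first."""
    attempts = {}

    def read_sector(lba):
        attempts[lba] = attempts.get(lba, 0) + 1
        if attempts[lba] <= failures_needed.get(lba, 0):
            raise OSError(f"read timeout on sector {lba}")
        return bytes([lba % 256]) * 512  # assumed 512-byte sectors

    return read_sector
```

Because a failed read is simply deferred, a short command timeout costs less
here than it does for a mounted filesystem, though a tunable timeout would
still let each attempt give the drive more time.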

As for the default being a small value, I personally think it's better
to throw out some messages/errors early on, before the disk reaches a
catastrophic state. (At least on 6.2 the kernel will put out a message
for each retry without giving faults; maybe allow more retries before
throwing an error?)

By catastrophic state I'm referring to that oh-so-famous Google paper
that did say that once a disk has started showing errors it doesn't
have long to live, and I do trust that conclusion, as I've been
warned by these messages 2 times but ignored them until the disk
went really bad.

The main thing I'm trying to get across is that early warnings and
small problems are a helluva lot better than big disasters. Think of it
like the oil meter on your car: it's not like you're gonna go out and
drive 100s of km in the wilderness if you know that the car is in a
bad state. (Now if only SMART info was reliable!)

/ Jonas



Western Digital hard disks and ATA timeouts

2008-11-06 Thread Jeremy Chadwick
A user and I on a broadband forum were discussing the possibility
of diminishing quality of hard disks (particularly 1TB models) in recent
days (specifically October).

The user continually referenced something called "deep recovery cycle",
backed with claims from Newegg reviewers (who often know very little or
nothing at all -- grain of salt concept applies), which makes Western
Digital's desktop hard disks unfit for RAID or server usage.

I claimed shenanigans until the user pointed me to the following
document on Western Digital's site:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1397

The feature described apparently causes the hard disk to enter some form
of aggressive sector scan/sector remapping loop, which can take up to 2
minutes to complete, during which time, the hard disk is basically
unusable.  (I imagine ATA commands sent to the disk will simply time out
or stall indefinitely, which would result in all sorts of timeout
errors).

Note that Western Digital's RAID edition drives claim to take up to 7
seconds to reallocate sectors, using something they call TLER, which
force-limits the amount of time the drive can spend reallocating.  TLER
cannot be disabled:

http://wdc.custhelp.com/cgi-bin/wdc.cfg/php/enduser/std_adp.php?p_faqid=1478

What baffles me is why Western Digital thinks that 2 minutes of the
drive being unusable is acceptable but only for desktops.  Any FreeBSD
desktop will start reporting ATA timeouts if the drive wedges for more
than 5 seconds -- two minutes would just spew errors and hard-lock the
system.

What also baffles me is why Western Digital thinks the term RAID
always means a hardware RAID controller is involved as a buffer between
the OS and the disks.  Bzzzt, bad assumption on their part.

So why do we care?

As stated, FreeBSD's ATA command timeout is hard-set to 5 seconds, and
is not adjustable without editing the ATA code yourself and increasing
the value.  The FreeNAS folks have made patches available to turn the
timeout value into a sysctl.

Soren and/or others, please increase this timeout value.  Five seconds
has now been deemed too aggressive a default.  And please consider
migrating the timeout value into a sysctl.

P.S. -- I do not consider any of this reason to avoid Western Digital
drives.  But I would warn users to be a little more cautious before
reporting ATA timeouts when newer (circa 2007 and later) WD drives
are in use.

-- 
| Jeremy Chadwick                                jdc at parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
