Re: Bad hard driver [SOLVED]

2011-02-10 Thread Michael Powell
Daniel Zhelev wrote:

[snip]
> 
> The last worrying thing is the
> 
>  200 Multi_Zone_Error_Rate   0x0008   200   189   000    Old_age   Offline      -       3
> 
> Which according to the Internet is some mysterious value that none knows
> what it stands for, so is 3 of that mystery good?
> 

Each cylinder track has a width. A head seek is nominally supposed to 
center the head exactly over the track's central axis. There is some slight 
tolerance for accidental offset, but the magnetic domain orientation should 
be most strongly concentrated, in a roughly Gaussian profile, at the center 
of each cylinder track.

Temperature changes between cold and 'steady state' operation cause very 
small changes in the size of mechanical moving parts. A little slop factor 
will happen when read/write happens while a drive is warming up from cold. 
As long as the number remains small and doesn't change often it's probably 
nothing to worry over. If it does change a lot constantly it may be an 
indicator of worn mechanical parts. Such a thing should correlate with a 
large value of power on hours. A drive near the end of its life may get 
wobbly head syndrome. :-)

The main consideration in both questions is that a small number which 
increments once in a blue moon is nothing to become overly concerned 
about. Rather, consider such values a long-term baseline and only become 
alarmed when they show a sudden, large deviation in rate of change from that 
baseline. When this occurs the numbers generally change quickly and 
dramatically, and keep changing from that point on. It is this pattern you 
look for as a possible "pre-failure" warning.
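
As a rough illustration of that baseline idea, here is a minimal sh sketch. 
The function name check_smart_trend, the step threshold of 5 and the sample 
readings are all made up for illustration; a sensible threshold depends on 
the baseline you have recorded for your own drive.

```shell
#!/bin/sh
# Hypothetical helper: compare the previously recorded raw value of a SMART
# attribute with the current one and flag a jump larger than max_step.
# The threshold is an assumption, not a vendor-defined limit.
check_smart_trend() {
    prev=$1
    cur=$2
    max_step=$3
    delta=$((cur - prev))
    if [ "$delta" -gt "$max_step" ]; then
        echo "ALERT: raw value rose by $delta (from $prev to $cur)"
    elif [ "$delta" -gt 0 ]; then
        echo "note: raw value rose by $delta (from $prev to $cur)"
    else
        echo "ok: no increase (still $cur)"
    fi
}

# Made-up readings: unchanged, a small step, and a sudden jump.
check_smart_trend 3 3 5
check_smart_trend 3 4 5
check_smart_trend 4 40 5
```

Feeding it the raw value saved from the previous check and the current one 
separates the occasional small increment from the sudden change described 
above.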

-Mike
[snip]


___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"


Re: Bad hard driver [SOLVED]

2011-02-10 Thread Bruce Cran
On Thu, 10 Feb 2011 11:59:37 +0200
Daniel Zhelev  wrote:

> The last worrying thing is the
> 
>  200 Multi_Zone_Error_Rate   0x0008   200   189   000    Old_age   Offline      -       3
> 
> Which according to the Internet is some mysterious value that none
> knows what it stands for, so is 3 of that mystery good?

Look at the first 3 results from
http://www.google.co.uk/search?q=Multi+Zone+Error+Rate

-- 
Bruce Cran


Re: Bad hard driver [SOLVED]

2011-02-10 Thread Daniel Zhelev
On Wed, Feb 9, 2011 at 10:44 PM, Michael Powell wrote:

> Chuck Swiger wrote:
>
> > On Feb 9, 2011, at 11:15 AM, Daniel Zhelev wrote:
> >> The following warning/error was logged by the smartd daemon:
> >>
> >> Device: /dev/ad7, 3 Offline uncorrectable sectors
> >
> > It means that the drive has detected errors in three sectors, and is
> > attempting to recover them without data loss to spare sectors, so far
> > without success.  It could also indicate that the drive has exhausted the
> > spare sectors, in which case all future errors will cause additional data
> > loss.
>
> As long as the remap region is not full the next write attempt to these
> sectors will clear this. It can be done by dd'ing zero to the entire drive,
> or formatting the entire drive as a shotgun approach. This entails a
> complete backup and restore cycle though. A little extreme, as this
> particular error is actually rather benign and eventually self-correcting
> as long as there is space in the remap area.
>
> Early in a drive's life this may be tolerable until the remap fills. Even
> if the remap area has space available, and these errors get cleared by the
> next write to the defective sectors I would still watch for more of these.
> If you get these errors cleared only to start to see more new ones it
> indicates media failure spreading across the platters. At such a point in a
> drive's life it only makes sense to replace it, as at some point the remap
> region fills and you will have lost data.
>
> > From the "SMART Self-test log", it seems like you are running short
> > self-tests every 24 hours, and periodically running extended tests on
> > some interval as well.  The smartctl FAQ recommends doing so at weekly
> > intervals; doing it daily is putting significant testing load onto the
> > drive.
> >
> >> I know about the how to -
> >> http://smartmontools.sourceforge.net/badblockhowto.html
> >>
> >> But how can I get the LBA?
> >> And is there some diagnostic tool for WD in ports?
> >
> > Doing a "dd if=/dev/ad7 of=/dev/null bs=64k" will read-scan the entire
> > drive, and ought to produce a warning in the logs indicating the LBA of
> > the bad sectors.  As for diagnostic tools, WD makes utilities for DOS and
> > Windows, not FreeBSD.  See:
> >
> >   http://support.wdc.com/product/download.asp?groupid=613&lang=en
> >
> > ...for something which you can run off of a boot floppy, USB pendrive,
> > etc.
>
> The quick test will tell you about bad sectors and then direct you to run
> the full surface scan which destroys data. But it will "fix" the drive.
> Back to the dump/restore cycle. I run this on any used drive about to be
> recycled back into use. It almost always finds something and "fixes" it.
>
> Excellent idea on how to find the LBA. If the exact sector addresses can be
> located a dd write to the specific sector(s) will clear the error
> condition, and should (in theory) be doable without losing data. I would
> still recommend an entire dump backup be done prior to trying anything.
>
> -Mike
>

Thanks all for the info,

I've fixed it with the shotgun method:

dd if=/dev/zero of=/dev/ad7 bs=64k
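
For comparison, the targeted write suggested earlier in the thread could be 
sketched roughly as below. This only prints the command instead of running 
it. It assumes 512-byte logical sectors, and both the helper name 
zero_sector_cmd and the LBA shown are placeholders, not values from my logs.

```shell
#!/bin/sh
# Print (do not execute) a dd command that would overwrite a single sector.
# Assumptions: 512-byte logical sectors and an LBA already confirmed from the
# logs. The device and the LBA used in the example call are placeholders.
zero_sector_cmd() {
    dev=$1
    lba=$2
    # With bs=512, seek=N skips N output blocks, i.e. starts at sector N.
    echo "dd if=/dev/zero of=$dev bs=512 count=1 seek=$lba"
}

zero_sector_cmd /dev/ad7 1953520000
```

Printing first gives a chance to double-check the device and sector before 
anything is overwritten, and a full backup beforehand still applies.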

Since the drive is new ( >6 months ) I won't go through the warranty procedure
yet.
The strange thing is that the

  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0

Has not changed, so the sectors were good?
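
To compare values like this before and after a step, the raw column can be 
pulled out with a small filter. This is only a sketch: smart_raw_value is a 
made-up helper name, and it assumes the usual ten-column attribute table 
that `smartctl -A` prints, with the raw value in the last column.

```shell
#!/bin/sh
# Extract the RAW_VALUE column (10th field) for one attribute ID from text
# laid out like the attribute table of `smartctl -A`.
smart_raw_value() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Sample rows copied from this thread; on a live system the input would be
# piped in from `smartctl -A /dev/ad7` instead.
sample='  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   189   000    Old_age   Offline      -       3'

printf '%s\n' "$sample" | smart_raw_value 5
printf '%s\n' "$sample" | smart_raw_value 200
```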
Also I've noticed that these sectors were at the end of the disk: the disk
is 932GB, and after fs installation the usable space is around 900GB, so the
bad sectors were beyond the usable space.
Also fsck didn't detect any fs corruption.

The last worrying thing is the

 200 Multi_Zone_Error_Rate   0x0008   200   189   000    Old_age   Offline      -       3

Which according to the Internet is some mysterious value that none knows
what it stands for, so is 3 of that mystery good?

//offtopic Spinrite is a very good tool. Fortunately I didn't have to use it
this time, but it is really useful. In my situation the drive failure is
not so critical (some temporary dumps on it), but if the whole machine went
down (this is the only non-mirrored disk) it would be much worse.