On Wed, May 19, 2010 at 11:58:14AM +0100, Vic wrote:
> 
> > Yes, if data is actually unrecoverable (as happened in my notebook's
> > hard disc: 3 bad sectors at the time of replacement, lost a video) the
> > drive will kick up a fuss-load of ATA errors, which will be reported all
> > over dmesg.
> 
> That's for broken sectors, not sector reallocation; if the data is
> actually gone from the drive, there's nothing you can do about it.

   The main route to discovering you've got a failed drive, or a
failed part of one, is when you can't read back the data that was
originally put on it. This will come to light either when a checksum
is computed and fails comparison, or when part of the hardware is
operating outside the parameters that are expected of it. When that
happens, you have already lost data.
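
   As a rough illustration of the checksum route, here is a minimal
Python sketch. It is not anything the drive does internally: the
function names and the sample data are made up for the example, and
SHA-256 is just one convenient choice of checksum. It shows that a
single flipped bit is enough to make the comparison fail, which is also
the point at which you find out the data is already gone.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, expected_hex: str) -> bool:
    """True if data still matches the checksum recorded earlier."""
    return sha256_of(data) == expected_hex


# Hypothetical payload; the checksum is recorded while the data is good.
original = b"some data that was written to the disk"
stored = sha256_of(original)

# Simulate a damaged read by flipping one bit in one byte.
buf = bytearray(original)
buf[3] ^= 0x01
damaged = bytes(buf)

print(verify(original, stored))  # intact copy matches
print(verify(damaged, stored))   # damaged copy does not
```

   Note that the check only tells you that the data has changed. It
can't give you the original back.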

> The purpose of SMART is to notice impending failures before they get to
> such a critical level,

   It's not very good at it, though. The famous Google paper on disk
failures quotes a model (under "related work", page 11) with only a
30% success rate based on SMART information. They also state (section
3.5.6, page 10) that 56% of failed drives show no failure indicators
at all in the four main SMART fields, and 36% of the failed drives
show no failure indicators in SMART _at all_.

   Detecting failures and fixing them before they occur is a nice
fairy-tale, but in the real world, it's just not going to happen
unless you're very lucky.

> and move data away from the failing areas into
> spare sectors. It's not perfect but, absent any significant external
> events (like dropped disks), it's pretty good.

   SMART is simply a reporting process (plus a self-test feature) --
the drive still does sector reallocations even if SMART itself is
turned off.
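
   To make the "reporting" part concrete, here is a small Python
sketch that picks a few counters out of an attribute table laid out the
way "smartctl -A" usually prints it. Treat the details as assumptions:
the sample table is hand-written for illustration (not taken from a
real drive), attribute names vary between vendors, and the three names
watched here are my own pick, not an exact match for the four fields in
the Google paper.

```python
# Hand-written sample in the usual "smartctl -A" column layout.
# Illustrative only; these are not readings from a real drive.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       3
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       0
"""

# Assumed attribute names; check what your own drive actually reports.
WATCHED = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
)


def parse_attributes(text):
    """Map attribute name -> integer raw value for each table row."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows start with a numeric ID and have at least 10 columns.
        if len(parts) < 10 or not parts[0].isdigit():
            continue
        try:
            attrs[parts[1]] = int(parts[9])
        except ValueError:
            continue  # raw value is not a plain integer; skip it
    return attrs


def nonzero_watched(attrs):
    """Return the watched counters that are above zero."""
    return {name: attrs[name] for name in WATCHED if attrs.get(name, 0) > 0}


attrs = parse_attributes(SAMPLE)
print(nonzero_watched(attrs))
```

   Given the figures above, an empty result from something like this
says very little: plenty of drives fail with all of these counters
still at zero.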

   Now, regarding the sector reallocation process: My understanding is
that the drive will reallocate a sector if it has trouble reading
it. So, if the drive electronics generate an internal error and
trigger a re-read, the drive _may_ attempt to move the sector, writing
the data that it read from the troublesome sector to a spare.

   I doubt that it will do this on the first problematic read, but
after enough consecutive retries that also hit a problem, it will
trigger this behaviour. I don't know how many internal retries are
needed; I would guess at 3-4. If there's damage to the sector
(physical or checksum), then the data that's read and rewritten may
not be the data that was originally put on the disk. In that case,
what ends up in the spare sector won't necessarily be 512 bytes of
zeroes, but it's not guaranteed to be identical to the original
either.
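
   Purely to illustrate the behaviour I'm guessing at, here is a toy
Python model. It is not how any real firmware works: the ToyDrive
class, its methods and the threshold of 3 consecutive bad reads are all
invented for the sketch. It shows a sector being copied to a spare
after repeated troublesome reads, with the copy being whatever the
drive managed to read rather than what was originally written.

```python
SECTOR_SIZE = 512
RETRY_THRESHOLD = 3  # hypothetical; the real figure is unknown to me


class ToyDrive:
    """Toy model of retry-then-reallocate. Not real firmware behaviour."""

    def __init__(self, spare_count):
        self.platter = {}    # lba -> data as originally written
        self.marginal = {}   # lba -> what a troubled read returns
        self.remapped = {}   # lba -> data now held in a spare sector
        self.bad_reads = {}  # lba -> count of consecutive bad reads
        self.spare_count = spare_count

    def write(self, lba, data):
        if len(data) != SECTOR_SIZE:
            raise ValueError("sector must be exactly 512 bytes")
        self.platter[lba] = data

    def damage(self, lba, offset):
        """Mark a sector as marginal, with one byte read back wrongly."""
        buf = bytearray(self.platter[lba])
        buf[offset] ^= 0xFF
        self.marginal[lba] = bytes(buf)

    def attempt_read(self, lba):
        """One internal read attempt; may trigger a reallocation."""
        if lba in self.remapped:
            return self.remapped[lba]
        if lba not in self.marginal:
            return self.platter[lba]
        self.bad_reads[lba] = self.bad_reads.get(lba, 0) + 1
        data = self.marginal[lba]
        if self.bad_reads[lba] >= RETRY_THRESHOLD and self.spare_count > 0:
            # Copy whatever was read into a spare, damage and all.
            self.spare_count -= 1
            self.remapped[lba] = data
        return data


drive = ToyDrive(spare_count=8)
original = bytes([0x41]) * SECTOR_SIZE
drive.write(0, original)
drive.damage(0, offset=100)

for _ in range(RETRY_THRESHOLD):
    recovered = drive.attempt_read(0)

print(0 in drive.remapped)              # sector has been moved to a spare
print(recovered == original)            # but it is not the original data
print(recovered == bytes(SECTOR_SIZE))  # and it is not all zeroes either
```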

   Hugo.

-- 
=== Hugo Mills: h...@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
  --- You've read the project plan.  Forget that. We're going to Do ---  
                      Stuff and Have Fun doing it.                       
