Re: mismatch_cnt != 0, member content mismatch, but md says the mirror is good

Benjamin Scott Tue, 23 Feb 2010 14:44:23 -0800

On Tue, Feb 23, 2010 at 2:01 PM, Michael Bilow
<mik...@colossus.bilow.com> wrote:
> During the md check operation, the array is "clean" (not degraded)
> and you can see that explicitly with the "[UU]" status report ...


  Of course, mdstat still calls the array "clean" even after
mismatches are detected, which isn't what I'd usually call "clean"...
:-)

> It is not a "scrub" because it does not attempt to repair anything.

  Comments in the previously mentioned config file don't make it sound
like that.  "A check operation will scan the drives looking for bad
sectors and automatically repairing only bad sectors."  It doesn't
explain how it would repair bad sectors.  Perhaps it means the bad
sectors will be "repaired" by failing the entire member and having the
sysadmin insert a new disk.  Perhaps the comments are just wrong.

  Not arguing with you, just reporting what the file told me.  Would
the file lie?  ;-)

> Detecting and reporting "soft failure" incidents
> such as reallocations of spare sectors ...

  The relocation algorithm in modern disks generally works like this
(or so I'm told):

R1. OS requests read logical block from HDD.  HDD tries to read from
block on disk, and can't, even with retries and ECC.  HDD returns
failure to the OS, and marks that physical block as "bad" and as a
candidate for relocation.

R2. Repeated attempts by OS to read from the same block cause the HDD
to retry.  It won't throw away your data on its own.

R3. OS requests write to same logical block.  HDD relocates to a
different physical block, and throws away the bad block.  It can do
that now, since you've told it you don't want the data that was there,
by writing new data over it.
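
  To make R1-R3 concrete, here is a toy model in Python.  It is only a
sketch of the behavior as described above, not how any real firmware
works; the ToyDisk class and all of its fields are made up for
illustration.

```python
class ToyDisk:
    """Toy model of the R1-R3 relocation steps; not real firmware logic."""

    def __init__(self, n_blocks, n_spares):
        # Logical block -> physical block; spares sit after the normal blocks.
        self.remap = {lba: lba for lba in range(n_blocks)}
        self.spares = list(range(n_blocks, n_blocks + n_spares))
        self.data = {}            # physical block -> contents
        self.unreadable = set()   # physical blocks the media can no longer read
        self.pending = set()      # logical blocks marked as relocation candidates
        self.reallocated = 0

    def read(self, lba):
        phys = self.remap[lba]
        if phys in self.unreadable:
            # R1/R2: report failure and remember the block, but never remap
            # on a read, because that would discard the old data for good.
            self.pending.add(lba)
            raise IOError(f"unrecoverable read error at LBA {lba}")
        return self.data.get(phys)

    def write(self, lba, value):
        if lba in self.pending:
            # R3: the old contents are being overwritten anyway, so it is
            # now safe to move the logical block to a spare.
            if not self.spares:
                raise IOError("no spare blocks left")
            self.remap[lba] = self.spares.pop(0)
            self.pending.discard(lba)
            self.reallocated += 1
        self.data[self.remap[lba]] = value


disk = ToyDisk(n_blocks=8, n_spares=2)
disk.write(3, "old")
disk.unreadable.add(disk.remap[3])    # simulate the media going bad

for _ in range(2):                    # R1, R2: reads fail, nothing is relocated
    try:
        disk.read(3)
    except IOError:
        pass
reads_only_reallocated = disk.reallocated

disk.write(3, "new")                  # R3: the write triggers the relocation
```

  In this model, reading the bad block any number of times leaves the
reallocation count at zero; only the write moves the block to a spare.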

  It would be nice if hard disks were smart enough to detect a block
that was getting marginal and preemptively relocate it.  Last I looked
into this (admittedly, several years ago), they didn't do that.  Maybe
they've gotten smarter about that.  If they haven't gotten smarter, and
the "check" operation reads all the blocks on the disk but never
writes, that alone won't trigger relocation of a bad block.  The
"check" operation would have to read the good block from the other
disk, and attempt to rewrite it to the bad disk.  *That* might trigger
a useful relocation by the HDD with the bad block.
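
  For anyone who wants to poke at this, md exposes the relevant knobs
through sysfs.  The sketch below reads mismatch_cnt and sync_action and
can request a "check" or a "repair" (the latter is the one documented as
writing to the members to make them consistent).  The array name "md0"
is an assumption, as is the sysfs_root parameter, which exists only so
the functions can be pointed at a scratch directory instead of a live
system.  Writing to sync_action on a real array needs root.

```python
from pathlib import Path


def md_dir(array="md0", sysfs_root="/sys"):
    """Directory holding the md attributes for one array (md0 is assumed)."""
    return Path(sysfs_root) / "block" / array / "md"


def read_mismatch_cnt(array="md0", sysfs_root="/sys"):
    """Return the mismatch count recorded by the last check or repair."""
    text = (md_dir(array, sysfs_root) / "mismatch_cnt").read_text()
    return int(text.strip())


def read_sync_action(array="md0", sysfs_root="/sys"):
    """Return the current sync action, e.g. 'idle' or 'check'."""
    return (md_dir(array, sysfs_root) / "sync_action").read_text().strip()


def request_sync_action(action, array="md0", sysfs_root="/sys"):
    """Ask md to start 'check' or 'repair', or to go back to 'idle'."""
    if action not in ("check", "repair", "idle"):
        raise ValueError(f"unsupported sync action: {action!r}")
    (md_dir(array, sysfs_root) / "sync_action").write_text(action + "\n")


# Example use on a live system, as root (left commented out on purpose):
#   request_sync_action("check")
#   ... wait until read_sync_action() returns "idle" again ...
#   print(read_mismatch_cnt())
```

  Whether a plain "check" ever rewrites a block that failed to read is
exactly the open question above; this only shows where to look and how
to ask for a "repair" pass instead.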

> smartmontools, which can and should be configured to look past the
> md device and monitor the physical drives that are its components.

  While I run smartd in monitor mode, I've never had it give me a
useful pre-failure alert.  Likewise, I've never had the SMART health
check in PC BIOSes give me a useful pre-failure alert.  More than once
I've seen SMART report the overall health check as "PASS" when the
whole damn disk is unreadable.  It makes me wonder just what the
overall SMART health is supposed to indicate -- "Yes, the HDD is
physically present"?  :)

  I did once have the BIOS check start reporting a SMART health
warning, but all the OEM diagnostics, smartctl, "badblocks -w", etc.,
didn't actually report anything wrong.  The reseller replaced the
drive at my insistence.  Maybe the SMART health check knew something
that none of the other SMART parameters were reporting.

-- Ben
_______________________________________________
gnhlug-discuss mailing list
gnhlug-discuss@mail.gnhlug.org
http://mail.gnhlug.org/mailman/listinfo/gnhlug-discuss/
