Hi Bond,

On Mon, 2010-12-13 at 05:36 -0800, Bond Masuda wrote:
> I'm not 100% certain, but if your theory is correct that the controller
> is "remembering" the bad blocks, it might be storing that information in
> the same place where it stores the RAID information, which is replicated
> on every disk, and within the controller. If this happens to be the
> case, then maybe clearing the configuration data off all disks *AND* the
> controller might give you a fresh start?

Yeah, that's what I meant by RAID disk metadata - the whole RAID
configuration appears to be written to each disk, so storing a bad block
list in there too seems possible.

> 
> In the BIOS, hit "F2" when you highlight each physical disk and use the
> 'clear config' (or it might just be 'clear') option. Then highlight the
> controller (top level) and do the same.

I'll give that a go, thanks.

John.

> 
> this is just a thought...
> -Bond
> 
> On Mon, 2010-12-13 at 13:29 +0000, John Leach wrote:
> > Hi,
> > 
> > Our PERC H700 is remembering bad blocks on disks, even though they don't
> > exist.
> > 
> > We had a mirror pair in a RAID10 get kicked out suddenly.  A disk scan
> > showed the exact same block on both disks was bad. A suspicious start :)
> > 
> > Dell support told us this was a "punctured RAID stripe" and that
> > replacing the disks and recreating the RAID container with a full
> > initialisation would fix it. They said the bad block had been "copied".
> > 
> > Not content with wiping the entire container for the sake of one 256k
> > stripe, I've been investigating this for a few days now (the data on it
> > is not important - I'm interested in working this out for next time).
> > 
> > I've tried a lot of arrangements, even copying one of the disks (with no
> > media errors encountered, I might add) to a new disk, RAID metadata and
> > all, and then trying to rebuild to a new disk from it - the rebuild
> > fails at the same point, offlining both disks. Note: *neither* of the
> > original disks is in the array here.
> > 
> > It boots fine and the virtual disk is available normally until the
> > background initialisation (or a rebuild) hits the supposed bad block and
> > then the vdisk is offlined.
> > 
> > I've come to the conclusion that the RAID controller is emulating the
> > bad blocks. It's either remembering them in its NVRAM, or they're
> > stored in the disks' RAID metadata (though definitely not in the disks'
> > GLIST, as I've checked that).
> > 
> > It only seems to "discover" them when a background init or rebuild
> > runs, though. But it's definitely finding them on disks that do not
> > have them.
> > 
> > Any thoughts? It is extremely frustrating that the only way to get the
> > controller to forget about these bad blocks seems to be a full wipe of
> > all the disks.
> > 
> > Some possible hints:
> > 
> > megacli tells me that "Disable Puncturing" is set to no. Maybe setting
> > this to yes would help (just for the rebuild to complete). I can't see
> > how to set this.
> > 
> > The omconfig tool has a "clearvdbadblocks" action, which sounds
> > promising. It unfortunately returns "operation disabled" when executed.
> > I can't find any documentation about what this action does.
> > 
> > Thanks,
> > 
> > John.
> > --
> > Brightbox
> > http://beta.brightbox.com/beta
> > 
> > _______________________________________________
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge@dell.com
> > https://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > Please read the FAQ at http://lists.us.dell.com/faq
> 
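
For a sense of scale, here is a rough sketch of how much disk one 256k stripe (as mentioned in the original message) covers around a given bad block. It assumes 512-byte sectors and treats the stripe as a simple aligned chunk of a single disk's address space; the thread does not describe the H700's real on-disk layout or any metadata offset, and the example LBA is made up. The function name is hypothetical.

```python
# Rough sketch only: assumes 512-byte sectors and stripes aligned from LBA 0.
# The real PERC H700 layout (metadata offsets, span mapping) is not known here.
SECTOR_BYTES = 512          # assumption, not stated in the thread
STRIPE_BYTES = 256 * 1024   # the "256k stripe" from the original message


def stripe_extent(lba, stripe_bytes=STRIPE_BYTES, sector_bytes=SECTOR_BYTES):
    """Return (stripe index, first LBA, last LBA) of the stripe holding lba."""
    sectors_per_stripe = stripe_bytes // sector_bytes  # 512 sectors here
    index = lba // sectors_per_stripe
    first = index * sectors_per_stripe
    last = first + sectors_per_stripe - 1
    return index, first, last


if __name__ == "__main__":
    example_lba = 1_000_000  # made-up block number, for illustration only
    index, first, last = stripe_extent(example_lba)
    print(f"LBA {example_lba} is in stripe {index} (LBAs {first}-{last})")
```

Under those assumptions a single punctured stripe spans 512 sectors, which is why wiping the whole container over it feels disproportionate.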



