Hi Bond,

On Mon, 2010-12-13 at 05:36 -0800, Bond Masuda wrote:
> I'm not 100% certain, but if your theory is correct that the controller
> is "remembering" the bad blocks, it might be storing that information in
> the same place where it stores the RAID information, which is replicated
> on every disk, and within the controller. If this happens to be the
> case, then maybe clearing the configuration data off all disks *AND* the
> controller might give you a fresh start?

Yeah, that's what I meant by raid disk metadata - the whole raid
configuration appears to be written to each disk, so storing a bad block
list in there too seems possible.

>
> in BIOS, hit "F2" when you highlight each physical disk and use the
> 'clear config' (or it might just be 'clear') option. then highlight the
> controller (top level) and do the same.

I'll give that a go, thanks.

John.

>
> this is just a thought...
> -Bond
>
> On Mon, 2010-12-13 at 13:29 +0000, John Leach wrote:
> > Hi,
> >
> > Our PERC H700 is remembering bad blocks on disks, even though they don't
> > exist.
> >
> > We had a mirror pair in a RAID10 get kicked out suddenly. A disk scan
> > showed the exact same block on both disks was bad. A suspicious start :)
> >
> > Dell support told us this was a "punctured raid stripe" and replacing
> > the disks, and recreating the raid container with a full initialisation
> > will fix it. They said the bad block had been "copied".
> >
> > Not content with wiping the entire container for the sake of one 256k
> > stripe, I've been investigating this for a few days now (the data on it
> > is not important - I'm interested in working this out for next time).
> >
> > I've tried a lot of arrangements, even copying one of the disks (with no
> > media errors encountered I might add) to a new disk, raid metadata and
> > all, and then trying to rebuild to a new disk from it - the rebuild
> > fails at the same point, offlining both disks. Note: *neither* of the
> > original disks are in the array here.
> >
> > It boots fine and the virtual disk is available normally until the
> > background initialisation (or a rebuild) hits the supposed bad block and
> > then the vdisk is offlined.
> >
> > I've come to the conclusion that the RAID controller is emulating the
> > bad blocks. It's either remembering them in its nvram, or they're
> > stored in the disk's raid metadata (though definitely not in the disk's
> > GLIST, as I've checked that).
> >
> > It seems to only "discover" them when a background init or rebuild runs
> > though. But it's definitely finding them on disks that definitely do not
> > have them.
> >
> > Any thoughts? It is extremely frustrating that the only way to get the
> > controller to forget about these bad blocks seems to be a full wipe of
> > all the disks.
> >
> > Some possible hints:
> >
> > megacli tells me that "Disable Puncturing" is set to no. Maybe setting
> > this to yes would help (just for the rebuild to complete). Can't see how
> > to set this.
> >
> > the omconfig tool has a "clearvdbadblocks" action, which sounds
> > promising. It unfortunately returns "operation disabled" when executed.
> > I can't find any documentation about what this action does.
> >
> > Thanks,
> >
> > John.
> > --
> > Brightbox
> > http://beta.brightbox.com/beta
> >
> > _______________________________________________
> > Linux-PowerEdge mailing list
> > Linux-PowerEdge@dell.com
> > https://lists.us.dell.com/mailman/listinfo/linux-poweredge
> > Please read the FAQ at http://lists.us.dell.com/faq