Problem with RAID 5 Array
I've got a Dell PowerVault RAID 5 enclosure that had a hard drive conk out over the weekend. No biggie, I figured: there are multiple hotspares available. The system grabbed one and rebuilt the array, but fussed that there was a consistency problem. I ran a second, manual consistency check on Monday, though, and it came up clean. Peachy.

But Monday night, my backup of the PV failed; Symantec reported that four files were inaccessible. Today I tried to access those four files, and sure enough I can't do anything with them. Can't delete them. Can't copy them. Can't rename them. Nothing. I get Error 0x80070079: The semaphore timeout period has expired.

I ran chkdsk in read-only mode, and got this:

    The type of the file system is NTFS.
    Volume label is PowerVault.

    WARNING! F parameter not specified.
    Running CHKDSK in read-only mode.

    CHKDSK is verifying files (stage 1 of 3)...
    File record segment 575200 is corrupt.
    2953600 file records processed.
    File verification completed.
    832 large file records processed.
    Errors found. CHKDSK cannot continue in read-only mode.

So, what gives? The array reports everything is fine. But obviously, something is funky. I can restore the four corrupt files from a backup; that's no problem. But not if I can't first delete the bad versions.

John Hornbuckle
MIS Department
Taylor County School District
www.taylor.k12.fl.us

NOTICE: Florida has a broad public records law. Most written communications to or from this entity are public records that will be disclosed to the public and the media upon request. E-mail communications may be subject to public disclosure.

~ Finally, powerful endpoint security that ISN'T a resource hog! ~
~ http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/ ~
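For reference, the error code quoted above can be unpacked by hand. The sketch below assumes the standard 32-bit HRESULT layout (a severity bit, an 11-bit facility field, a 16-bit code); the function name `decode_hresult` is made up for illustration and is not part of any Windows tool.

```python
FACILITY_WIN32 = 7  # facility value used when a plain Win32 error is wrapped in an HRESULT


def decode_hresult(hr):
    """Split a 32-bit HRESULT into (severity, facility, code)."""
    hr &= 0xFFFFFFFF               # treat the value as unsigned 32-bit
    severity = (hr >> 31) & 0x1    # 1 = failure, 0 = success
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF             # low 16 bits: the underlying error code
    return severity, facility, code


severity, facility, code = decode_hresult(0x80070079)
print(severity, facility, code)
```

The low word, 0x79, is decimal 121, which is the Win32 error ERROR_SEM_TIMEOUT, the same "semaphore timeout period has expired" text shown in the message. That would be consistent with an I/O request to the volume timing out rather than with a permissions problem, though the code alone does not say which layer timed out.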
Re: Problem with RAID 5 Array
On Tue, May 4, 2010 at 10:26 AM, John Hornbuckle john.hornbuc...@taylor.k12.fl.us wrote:
> So, what gives? The array reports everything is fine. But obviously, something is funky. I can restore the four corrupt files from a backup; that's no problem. But not if I can't first delete the bad versions.

I'd call Dell tech support. It's free and sometimes even helpful.

Not knowing more, my guess would be that one of the other disks has some bad blocks.

Scenario: Most filesystems have a lot of files which are never or rarely read. Plus RAID 5 provides redundancy -- the controller may normally read the primary set of on-disk blocks and ignore the redundant blocks. End result, you've got blocks allocated on disk, but which are never read.

Then a disk fails. Now the controller has to read *every* block of *all* the other disks, in order to rebuild the failed member. Boom. That's when you discover that one of the other disks has had bad blocks for years. Unfortunately, the only way to recover from this scenario is to restore from good backups.

For this reason, good controllers have a patrol read feature (or background scrub, etc.), where they regularly read all blocks from all disks, to discover bad blocks as soon as they happen.

-- Ben
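A toy model may make the scenario above more concrete. This is only a sketch under stated assumptions: blocks are short byte strings, parity is a plain XOR kept in a fixed last position (real RAID 5 rotates it across members), and an unreadable block is represented by `None`. The names `make_stripe`, `rebuild_member` and `patrol_read` are invented for the example and do not correspond to any Dell controller interface.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]


def rebuild_member(stripe, failed):
    """Recompute the block of the failed member from every surviving member.

    A surviving block of None models a latent bad block: with one member
    already gone, there is no redundancy left to recover that stripe.
    """
    survivors = [block for i, block in enumerate(stripe) if i != failed]
    if any(block is None for block in survivors):
        raise OSError("stripe unrecoverable: a surviving member has an unreadable block")
    return xor_blocks(survivors)


def patrol_read(stripes):
    """Read every block of every stripe; return (stripe, member) of unreadable ones."""
    return [(s, m)
            for s, stripe in enumerate(stripes)
            for m, block in enumerate(stripe)
            if block is None]


healthy = make_stripe([b"AAAA", b"BBBB", b"CCCC"])
print(rebuild_member(healthy, 1))       # the lost block comes back from the other three

damaged = list(healthy)
damaged[2] = None                       # latent bad block on a member that did not fail
print(patrol_read([healthy, damaged]))  # a scrub would have flagged it before the failure
```

Calling `rebuild_member(damaged, 1)` raises `OSError`, which is the "Boom" case: the rebuild is the first time that block is read, and by then the parity needed to repair it is already being used to cover the failed disk.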
RE: Problem with RAID 5 Array
Yeah, I opened a case with Dell support before sending this message. Haven't been blown away. The technician is Googling the error--not exactly the kind of expertise I was expecting.

-----Original Message-----
From: Ben Scott [mailto:mailvor...@gmail.com]
Sent: Tuesday, May 04, 2010 11:16 AM
To: NT System Admin Issues
Subject: Re: Problem with RAID 5 Array

I'd call Dell tech support. It's free and sometimes even helpful.