Problem with RAID 5 Array

2010-05-04 Thread John Hornbuckle
I've got a Dell PowerVault RAID 5 enclosure that had a hard drive conk out over 
the weekend.

No biggie, I figured: there are multiple hot spares available. The system grabbed 
one and rebuilt the array, but fussed that there was a consistency problem. I 
ran a second, manual consistency check on Monday, though, and it came up clean. 
Peachy.

But Monday night, my backup of the PV failed; Symantec reported that four files 
were inaccessible. Today I tried to access those four files, and sure enough I 
can't do anything with them. Can't delete them. Can't copy them. Can't rename 
them. Nothing. I get Error 0x80070079: The semaphore timeout period has 
expired.
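
For readers puzzling over the code: 0x80070079 is an HRESULT that wraps a plain Win32 error. The 0x8007 prefix means "failure" (severity bit set) in FACILITY_WIN32 (facility 7), and the low 16 bits, 0x79 = 121, are ERROR_SEM_TIMEOUT, whose message text is exactly "The semaphore timeout period has expired." That is consistent with (though it does not prove) reads of those four files stalling below the filesystem rather than a permissions problem. A small Python sketch of the decoding; `decode_hresult` is a hypothetical helper name:

```python
# Split a 32-bit HRESULT into its severity, facility, and code fields.
# Layout: bit 31 = severity, bits 16-26 = facility, bits 0-15 = code.

FACILITY_WIN32 = 7  # facility value produced by HRESULT_FROM_WIN32


def decode_hresult(hr: int) -> tuple[int, int, int]:
    """Return (severity, facility, code) for a 32-bit HRESULT."""
    severity = (hr >> 31) & 0x1
    facility = (hr >> 16) & 0x7FF  # 11-bit facility field
    code = hr & 0xFFFF
    return severity, facility, code


severity, facility, code = decode_hresult(0x80070079)
print(severity, facility, code)  # 1 7 121
if facility == FACILITY_WIN32:
    # Win32 error 121 is ERROR_SEM_TIMEOUT
    print(f"Wraps Win32 error {code}")
```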

I ran chkdsk in read-only mode, and got this:

The type of the file system is NTFS.
Volume label is PowerVault.

WARNING! F parameter not specified.
Running CHKDSK in read-only mode.

CHKDSK is verifying files (stage 1 of 3)...
File record segment 575200 is corrupt.
2953600 file records processed.
File verification completed.
832 large file records processed.

Errors found. CHKDSK cannot continue in read-only mode.

So, what gives? The array reports everything is fine. But obviously, something 
is funky. I can restore the four corrupt files from a backup; that's no problem. 
But not if I can't first delete the bad versions.



John Hornbuckle
MIS Department
Taylor County School District
www.taylor.k12.fl.us





NOTICE: Florida has a broad public records law. Most written communications to 
or from this entity are public records that will be disclosed to the public and 
the media upon request. E-mail communications may be subject to public 
disclosure.

~ Finally, powerful endpoint security that ISN'T a resource hog! ~
~ http://www.sunbeltsoftware.com/Business/VIPRE-Enterprise/  ~

Re: Problem with RAID 5 Array

2010-05-04 Thread Ben Scott
On Tue, May 4, 2010 at 10:26 AM, John Hornbuckle
john.hornbuc...@taylor.k12.fl.us wrote:
 So, what gives? The array reports everything is fine. But obviously,
 something is funky. I can restore the four corrupt files from a
 backup; that's no problem. But not if I can't first delete the bad versions.

  I'd call Dell tech support.  It's free and sometimes even helpful.

  Not knowing more, my guess would be that one of the other disks has
some bad blocks.

  Scenario: Most filesystems have a lot of files which are never or
rarely read.  Plus RAID 5 provides redundancy -- the controller may
normally read the primary set of on-disk blocks and ignore the
redundant blocks.  End result, you've got blocks allocated on disk,
but which are never read.  Then a disk fails.  Now the controller has
to read *every* block of *all* the other disks, in order to rebuild
the failed member.  Boom.  That's when you discover that one of the
other disks has had bad blocks for years.
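
  A minimal Python sketch of that failure mode, assuming a simplified
RAID 5 model: each stripe holds one block per disk, the last block is
the XOR of the data blocks (parity), and an unreadable sector is
modeled as None.  Real controllers rotate parity across members and
handle media errors in firmware-specific ways; make_stripe and
rebuild_member are hypothetical names, not any controller's API.

```python
from functools import reduce
from typing import Optional

Block = Optional[bytes]  # None models an unreadable (latent bad) sector


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data: list[bytes]) -> list[Block]:
    """One data block per disk, plus an XOR parity block on the last disk."""
    return data + [xor_blocks(data)]


def rebuild_member(stripe: list[Block], failed: int) -> Block:
    """Reconstruct the failed disk's block from EVERY surviving block.

    With single parity there is no second copy to fall back on, so if
    any surviving block is unreadable the stripe cannot be rebuilt.
    """
    survivors = [b for i, b in enumerate(stripe) if i != failed]
    if any(b is None for b in survivors):
        return None
    return xor_blocks(survivors)


stripe = make_stripe([b"\x01\x02", b"\x10\x20", b"\xAA\xBB"])  # 3 data disks + parity
print(rebuild_member(stripe, failed=1).hex())  # 1020

degraded = list(stripe)
degraded[2] = None  # disk 2 has had a bad sector here for years
print(rebuild_member(degraded, failed=1))  # None
```

  The any() check is the whole story: before the failure that bad
sector sat in a block nobody read, so it cost nothing; during the
rebuild it is one of the blocks the controller must read, and the
affected stripe cannot be reconstructed.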

  Unfortunately, the only way to recover from this scenario is to
restore from good backups.

  For this reason, good controllers have a patrol read feature (or
background scrub, etc.), where they regularly read all blocks from
all disks, to discover bad blocks as soon as they happen.
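
  A sketch of what such a scrub checks, using the same simplified
model (one block per disk per stripe, XOR parity, None for an
unreadable sector).  It folds two jobs that controllers typically run
as separate tasks into one loop: a media scan (can every block be
read?) and a parity check (does parity still match the data?).  The
scrub function and its return format are hypothetical.  The payoff is
timing: a bad block found while the array is still fully redundant can
be regenerated from the other disks, which is exactly the option that
disappears once a disk has failed.

```python
from functools import reduce
from typing import Optional

Block = Optional[bytes]  # None models a read error


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def scrub(stripes: list[list[Block]]) -> list[tuple[int, str]]:
    """Read every block (data and parity) of every stripe; report problems.

    A consistent stripe XORs to all zeros, because parity is the XOR
    of the data blocks.
    """
    problems = []
    for n, stripe in enumerate(stripes):
        if any(b is None for b in stripe):
            problems.append((n, "unreadable"))
        elif any(xor_blocks(stripe)):  # any nonzero byte means a mismatch
            problems.append((n, "parity mismatch"))
    return problems


good = [b"\x01", b"\x02", b"\x03"]      # 0x01 ^ 0x02 == 0x03, consistent
bad_read = [b"\x01", None, b"\x03"]     # latent media error on disk 1
stale = [b"\x01", b"\x02", b"\x07"]     # parity does not match the data
print(scrub([good, bad_read, stale]))   # [(1, 'unreadable'), (2, 'parity mismatch')]
```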

-- Ben




RE: Problem with RAID 5 Array

2010-05-04 Thread John Hornbuckle
Yeah, I opened a case with Dell support before sending this message. Haven't 
been blown away. The technician is Googling the error--not exactly the kind of 
expertise I was expecting.



-Original Message-
From: Ben Scott [mailto:mailvor...@gmail.com] 
Sent: Tuesday, May 04, 2010 11:16 AM
To: NT System Admin Issues
Subject: Re: Problem with RAID 5 Array





