Jeremy wrote:
> Ok, so I know there's a bad drive in my array, but which one caused the
> errors so I can begin to replace it?  Also, how do I recover without
> losing information?
>
> Here is the info:
>
> /proc/mdstat:
> Personalities : [linear] [raid0] [raid1] [raid5] [translucent]
> read_ahead 1024 sectors
> md2 : active raid1 md1[1](F) md0[0](F) 26876288 blocks [2/1] [_U]
> md0 : active raid0 sdc1[2] sdb1[1] sda1[0] 26876352 blocks 16k chunks
> md1 : active raid0 sdf1[2] sde1[1] sdd1[0] 26876352 blocks 16k chunks
> unused devices: <none>

Reading your messages.txt, it turns out you don't have a faulty disk;
rather, something was causing reads to be requested beyond the end of the
disk.  Raid doesn't currently deal with this very well: it ends up kicking
out the drive that "should" be responsible for the offending block.  So md2
ends up kicking out both md0 and md1 for not being able to service these
requests.

If this is happening while the system is running, you're likely not running
2.2.12.  There were some file system corruption problems that behaved like
this in some recent kernels (2.2.9 - 2.2.11?  I can't remember exactly).  If
you are running 2.2.12 and these errors appear after a good amount of
uptime, you'll want to look at other possible sources of corruption in your
system: failing disks, bad memory, etc.

If this happens at e2fsck time, your filesystem was likely scrambled to the
point of having seriously bad data in the superblock.  Calling e2fsck with
an alternate superblock farther into the disk set may cure this.
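For example, assuming the filesystem on md2 is ext2 with the default 1k
block size (which puts backup superblocks every 8192 blocks), something
along these lines should pick up the first backup copy:

    # -b tells e2fsck to use an alternate superblock; 8193 is the usual
    # location for 1k-block ext2 filesystems (try 32768 for 4k blocks)
    e2fsck -b 8193 /dev/md2

Adjust the block number to match however the filesystem was created.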

As it's already "down", you'll need to fix md2 before you can continue your
work. To fix this, stop md2 but leave md0 and md1 running, and run
'mkraid -force /dev/md2'
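Roughly, and assuming you're using raidtools with an /etc/raidtab entry that
still describes md2 exactly as it was originally built (mkraid reads its
layout from there), the sequence would look something like:

    umount /dev/md2           # if it's still mounted
    raidstop /dev/md2         # older md tools call this mdstop
    mkraid -force /dev/md2    # re-initializes md2's raid superblocks on
                              # md0 and md1; the exact force flag spelling
                              # varies between raidtools versions, so check
                              # your raidtools documentation
    e2fsck /dev/md2           # check the filesystem before remounting

Double-check the raidtab entry first; if it doesn't match the original md2
layout, mkraid will happily build something else on top of your data.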

Looking at your setup, I'm confused as to why you aren't simply running one
raid5 set on all six disks.  It would certainly reduce complexity in
situations like this, and would leave you with more usable space.
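Purely as a sketch (the parity algorithm is just a common default, not
anything from your setup; the 16k chunk size matches what your current
stripe sets use), a single raidtools /etc/raidtab entry for a six-disk
raid5 set would look roughly like:

    raiddev /dev/md0
        raid-level              5
        nr-raid-disks           6
        persistent-superblock   1
        chunk-size              16
        parity-algorithm        left-symmetric
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
        device                  /dev/sdc1
        raid-disk               2
        device                  /dev/sdd1
        raid-disk               3
        device                  /dev/sde1
        raid-disk               4
        device                  /dev/sdf1
        raid-disk               5

That gives you one array and one filesystem, with five disks' worth of
usable space instead of the three you get from mirroring the two stripe
sets.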

Tom
