On Wed, 8 Sep 1999 [EMAIL PROTECTED] wrote:

> I have exactly the same problem.  My startup messages were as follows:
> 
>   Sep  7 12:28:41 yfs kernel: md: kicking non-fresh sdc1 from array!
>   Sep  7 12:28:41 yfs kernel: unbind<sdc1,7>
>   Sep  7 12:28:41 yfs kernel: export_rdev(sdc1)
>   Sep  7 12:28:41 yfs kernel: md0: removing former faulty sdc1!
> 
> Note that /dev/sdh1 is the failed disk in the above system.

yes, but according to the kernel messages sdc1 failed as well. We cannot
recover from two-disk failures, obviously. But I think the RAID code
should not update the superblock with new failed disks if the array is
already in degraded mode. Double-disk failures would then be at least
partially survivable, provided a whole disk is not lost.
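
roughly, the change I have in mind is along these lines (this is only a
standalone toy model with made-up names, not a patch against the actual
md.c error path):

/*
 * toy model of the proposed policy -- not actual md.c code.  Once one
 * failure is already recorded, further failures are handled in memory
 * but not written into the on-disk superblock.
 */
#include <stdio.h>

struct toy_array {
        int nr_disks;
        int failed_disks;       /* failures already recorded on disk */
};

static void disk_error(struct toy_array *a, const char *name)
{
        if (a->failed_disks >= 1) {
                /* already degraded: recording a second failure would make
                 * the array unstartable, so leave the superblock alone */
                printf("%s: error, array degraded -> superblock not updated\n",
                       name);
                return;
        }
        a->failed_disks++;
        printf("%s: marked faulty in superblock (%d failed)\n",
               name, a->failed_disks);
}

int main(void)
{
        struct toy_array md0 = { 8, 0 };

        disk_error(&md0, "sdh1");   /* first failure: recorded as usual    */
        disk_error(&md0, "sdc1");   /* second failure: kept out of the sb  */
        return 0;
}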

> During the kernel hacking, I realized that my /dev/sdc1 has a good(?)
> superblock, which in my case has the oldest event counter.  The stock
> kernel uses the freshest event counter.  I changed md.c to use event=7,
> and the array came back. :-)  Then I reverted to the standard
> 2.2.10-raid kernel.
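
right, and just to spell out what the stock code is doing there: at
startup, members whose event counter is older than the freshest one
found get kicked (the 'non-fresh' message above). Here is a standalone
toy model of that selection, with made-up names and counter values, not
the actual md.c code:

/* toy model of event-counter based member selection -- not md.c */
#include <stdio.h>

struct member {
        const char *name;
        unsigned long events;
};

int main(void)
{
        struct member sb[] = {
                { "sda1", 8 }, { "sdb1", 8 }, { "sdc1", 7 },  /* sdc1 stale */
                { "sdd1", 8 }, { "sde1", 8 }, { "sdf1", 8 },
                { "sdg1", 8 },                    /* sdh1: no usable sb */
        };
        int i, n = sizeof(sb) / sizeof(sb[0]);
        unsigned long freshest = 0;

        for (i = 0; i < n; i++)
                if (sb[i].events > freshest)
                        freshest = sb[i].events;

        /* stock behaviour: anything older than the freshest counter gets
         * kicked as "non-fresh"; pinning 'freshest' down to 7 (the md.c
         * hack above) would keep sdc1 in */
        for (i = 0; i < n; i++)
                printf("%s: events=%lu -> %s\n", sb[i].name, sb[i].events,
                       sb[i].events == freshest ? "keep" : "kick (non-fresh)");
        return 0;
}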

you'd have had an equally good array if you had created a raidtab similar
to this one:

raiddev /dev/md0
        raid-level              5
        nr-raid-disks           8
        persistent-superblock   0
        chunk-size              4
        device                  /dev/sda1
        raid-disk               0
        device                  /dev/sdb1
        raid-disk               1
        device                  /dev/sdc1
        raid-disk               2
        device                  /dev/sdd1
        raid-disk               3
        device                  /dev/sde1
        raid-disk               4
        device                  /dev/sdf1
        raid-disk               5
        device                  /dev/sdg1
        raid-disk               6
        device                  /dev/sdh1
        failed-disk             7

(note the 'failed-disk' entry) and then forcibly recreate the array.
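
i.e. something like this (the exact force option depends on your
raidtools version, and mkraid will warn before touching anything, so
double-check the raidtab first):

        mkraid --force /dev/md0    # newer raidtools may ask for an extra 'really force' flag
        cat /proc/mdstat           # should show the array running with 7 of 8 disks
        e2fsck -n /dev/md0         # read-only check before mounting read-write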

> I can recover about 99% of the data, at least. :-)

good. I suppose that 1% is filesystem data that got corrupted by the
double-disk failure.

> e2fsck 1.15, 18-Jul-1999 for EXT2 FS 0.5b, 95/08/09
> Pass 1: Checking inodes, blocks, and sizes
> Duplicate blocks found... invoking duplicate block passes.
> Pass 1B: Rescan for duplicate/bad blocks
> Duplicate/bad block(s) in inode 135841: 47 47 47
> Pass 1C: Scan directories for inodes with dup blocks.
> Pass 1D: Reconciling duplicate blocks

this indeed seems to be a (RAID-unrelated) e2fsck problem. Ted, Stephen,
any ideas?

-- mingo
