Hi!

I am using kernel 2.2.14 with the RedHat RAID patch (raid-2.2.14-B1) on a Debian
system, raidtools 0.90.990824-5. I have six SCSI disks on two host
adapters; /etc/raidtab is as follows:

|raiddev /dev/md0
|raid-level                5
|nr-raid-disks             6
|nr-spare-disks            0
|persistent-superblock     1
|chunk-size                32
|
|device                    /dev/sda7
|raid-disk                 0
|device                    /dev/sdb7
|raid-disk                 1
|device                    /dev/sdc7
|raid-disk                 2
|device                    /dev/sdd7
|raid-disk                 3
|device                    /dev/sde7
|raid-disk                 4
|device                    /dev/sdf7
|raid-disk                 5
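
(For reference, the array was created from this raidtab in the standard
raidtools way - roughly the following, from memory, so the mkfs details may
be off:)

  mkraid /dev/md0      # writes the persistent superblocks and starts the array
  cat /proc/mdstat     # wait for the initial resync to finish
  mke2fs /dev/md0      # then filesystem and mount as usual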
                                                                    
Performance of that array is quite impressive.
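
("Impressive" is only a rough impression - I just eyeballed sequential
throughput with something like the following; block size and count are
arbitrary, not a proper benchmark:)

  dd if=/dev/md0 of=/dev/null bs=1024k count=512   # crude sequential read test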

However, I wanted to test RAID behavior in the case of a disk failure. So I
disconnected sde while the array was running. The system tried to access the
failed disk for a few seconds and then continued in degraded mode.

I then rebooted and found the array (now running on sda7 through sde7, because
the old sdf had moved up to become sde) still in degraded mode. So far, so good.
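
(The degraded state is also easy to confirm from userspace; /proc/mdstat
showed the array running on five of its six members:)

  cat /proc/mdstat   # one member missing from md0's status line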

The next step was reconnecting the "failed" disk and rebooting again. This time,
the array didn't come up, and raidstart /dev/md0 gave the following output:

|(read) sda7's sb offset: 8666944 [events: 00000029]
|(read) sdb7's sb offset: 803136 [events: 00000029]
|(read) sdc7's sb offset: 803136 [events: 00000029]
|(read) sdd7's sb offset: 803136 [events: 00000029]
|(read) sde7's sb offset: 803136 [events: 00000023]
|autorun ...
|considering sde7 ...
|adding sde7 ...
|adding sdd7 ...
|adding sdc7 ...
|adding sdb7 ...
|adding sda7 ...
|created md0
|bind<sda7,1>
|bind<sdb7,2>
|bind<sdc7,3>
|bind<sdd7,4>
|bind<sde7,5>
|running: <sde7><sdd7><sdc7><sdb7><sda7>
|now!
|sde7's event counter: 00000023
|sdd7's event counter: 00000029
|sdc7's event counter: 00000029
|sdb7's event counter: 00000029
|sda7's event counter: 00000029
|md: superblock update time inconsistency -- using the most recent one
|freshest: sdd7
|md: kicking non-fresh sde7 from array!
|unbind<sde7,4>
|export_rdev(sde7)
|md0: former device sde7 is unavailable, removing from array!
|md0: max total readahead window set to 640k
|md0: 5 data-disks, max readahead per data-disk: 128k
|raid5: device sdd7 operational as raid disk 3
|raid5: device sdc7 operational as raid disk 2
|raid5: device sdb7 operational as raid disk 1
|raid5: device sda7 operational as raid disk 0
|raid5: not enough operational devices for md0 (2/6 failed)
|RAID5 conf printout:
|--- rd:6 wd:4 fd:2
|disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda7
|disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb7
|disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc7
|disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdd7
|disk 4, s:0, o:0, n:4 rd:4 us:1 dev:[dev 00:00]
|disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
|disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|raid5: failed to run raid set md0
|pers->run() failed ...
|do_md_run() returned -22
|unbind<sdd7,3>
|export_rdev(sdd7)
|unbind<sdc7,2>
|export_rdev(sdc7)
|unbind<sdb7,1>
|export_rdev(sdb7)
|unbind<sda7,0>
|export_rdev(sda7)
|md0 stopped.
|... autorun DONE.

It looks like the RAID driver didn't cope with the sixth disk moving back up to
sdf once the fifth disk reappeared: the stale sde7 (event counter 00000023) was
kicked out, and the member that now sits on sdf7 wasn't picked up at all, so the
driver ended up believing that two of the six disks were gone.

After turning the fifth disk off again and rebooting, the array came back with
all its data, but - of course - still in degraded mode:

|(read) sda7's sb offset: 8666944 [events: 0000002d]
|(read) sdb7's sb offset: 803136 [events: 0000002d]
|(read) sdc7's sb offset: 803136 [events: 0000002d]
|(read) sdd7's sb offset: 803136 [events: 0000002d]
|(read) sde7's sb offset: 803136 [events: 0000002d]
|autorun ...
|considering sde7 ...
|adding sde7 ...
|adding sdd7 ...
|adding sdc7 ...
|adding sdb7 ...
|adding sda7 ...
|created md0
|bind<sda7,1>
|bind<sdb7,2>
|bind<sdc7,3>
|bind<sdd7,4>
|bind<sde7,5>
|running: <sde7><sdd7><sdc7><sdb7><sda7>
|now!
|sde7's event counter: 0000002d
|sdd7's event counter: 0000002d
|sdc7's event counter: 0000002d
|sdb7's event counter: 0000002d
|sda7's event counter: 0000002d
|md0: max total readahead window set to 640k
|md0: 5 data-disks, max readahead per data-disk: 128k
|raid5: device sde7 operational as raid disk 5
|raid5: device sdd7 operational as raid disk 3
|raid5: device sdc7 operational as raid disk 2
|raid5: device sdb7 operational as raid disk 1
|raid5: device sda7 operational as raid disk 0
|raid5: md0, not all disks are operational -- trying to recover array
|raid5: allocated 6350kB for md0
|raid5: raid level 5 set md0 active with 5 out of 6 devices,
|algorithm 0
|RAID5 conf printout:
|--- rd:6 wd:5 fd:1
|disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda7
|disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb7
|disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc7
|disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdd7
|disk 4, s:0, o:0, n:4 rd:4 us:1 dev:[dev 00:00]
|disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sde7
|disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|RAID5 conf printout:
|--- rd:6 wd:5 fd:1
|disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda7
|disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb7
|disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc7
|disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdd7
|disk 4, s:0, o:0, n:4 rd:4 us:1 dev:[dev 00:00]
|disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sde7
|disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
|md: updating md0 RAID superblock on device
|sde7 [events: 0000002e](write) sde7's sb offset: 803136
|md: recovery thread got woken up ...
|md0: no spare disk to reconstruct array! -- continuing in
|degraded mode
|md: recovery thread finished ...
|sdd7 [events: 0000002e](write) sdd7's sb offset: 803136
|sdc7 [events: 0000002e](write) sdc7's sb offset: 803136
|sdb7 [events: 0000002e](write) sdb7's sb offset: 803136
|sda7 [events: 0000002e](write) sda7's sb offset: 8666944
|.
|... autorun DONE.

After rebooting once more with the fifth disk connected (and the RAID dead
again), I decided to erase the fifth disk's RAID partition with dd if=/dev/zero
of=/dev/sde7. Even after that, the RAID still wouldn't come up; the error
messages were the same as given above.

Not even erasing the whole disk (dd if=/dev/zero of=/dev/sde) could get the
RAID up again. Now, where do I go from here? Any ideas?
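
If I could get the array to start in degraded mode with the fifth disk
connected (so far it only comes up with that disk powered off), my next idea
would be to hot-add the blanked partition back so reconstruction can start -
along these lines, if I understand the raidtools correctly:

  raidstart /dev/md0              # start the array (currently fails with sde connected)
  raidhotadd /dev/md0 /dev/sde7   # re-add the blanked partition as a new member
  cat /proc/mdstat                # watch the reconstruction

Would that be the right approach, or does /etc/raidtab need to be fixed up
first?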

Greetings
Marc
