When this happened to me, I had to run "raidhotadd" to get the disk back
into the array.  What does your /proc/mdstat indicate?
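For background on why sde7 got kicked: md stamps every member's superblock
with an event counter and drops any disk whose counter lags the freshest one,
which is exactly what your "[events: 00000023]" vs "[events: 00000029]" log
shows. A rough sketch of that decision in Python (my own simplification for
illustration, not the actual kernel code):

```python
def kick_non_fresh(members):
    """members: dict of device name -> event counter (hex string, as
    printed by raidstart). Returns (kept, kicked) device lists."""
    events = {dev: int(cnt, 16) for dev, cnt in members.items()}
    freshest = max(events.values())
    # Disks at the freshest count stay; stale ones get kicked, the way
    # md logged "kicking non-fresh sde7 from array!"
    kept = [d for d, e in events.items() if e == freshest]
    kicked = [d for d, e in events.items() if e < freshest]
    return kept, kicked

# The counters from the failed raidstart in your log:
members = {
    "sda7": "00000029", "sdb7": "00000029", "sdc7": "00000029",
    "sdd7": "00000029", "sde7": "00000023",
}
kept, kicked = kick_non_fresh(members)
print("kicked:", kicked)  # sde7 is stale, so md drops it
```

Since sde7 was kicked *and* the sixth slot (the disk that had been sdf7)
was already missing, md counted two of six members gone, and RAID-5 can
only survive one.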

Try:
raidhotadd /dev/md0 /dev/sde7
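If /proc/mdstat shows md0 running degraded with one member missing, the
sequence would look something like this (raidtools 0.90 syntax; the device
names come from your raidtab and may differ once sdf has moved back):

```shell
# Check the array state first -- a degraded md0 shows 5 of 6 members active
cat /proc/mdstat

# Hot-add the revived disk back into the running array; md should then
# kick off reconstruction onto it automatically
raidhotadd /dev/md0 /dev/sde7

# Watch the resync progress
cat /proc/mdstat
```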

Doug Egan

Marc Haber wrote:
> 
> Hi!
> 
> I am using kernel 2.2.14 with the RedHat Patch (raid-2.2.14-B1) on a Debian
> System, raid tools 0.90.990824-5. I have six SCSI disks on two host
> adapters, raidtab as follows:
> 
> |raiddev /dev/md0
> |raid-level                5
> |nr-raid-disks             6
> |nr-spare-disks            0
> |persistent-superblock     1
> |chunk-size                32
> |
> |device                    /dev/sda7
> |raid-disk                 0
> |device                    /dev/sdb7
> |raid-disk                 1
> |device                    /dev/sdc7
> |raid-disk                 2
> |device                    /dev/sdd7
> |raid-disk                 3
> |device                    /dev/sde7
> |raid-disk                 4
> |device                    /dev/sdf7
> |raid-disk                 5
> 
> Performance of that array is quite impressive.
> 
> However, I wanted to test RAID behavior in the case of a disk failure. So I
> disconnected sde while the array was running. The system tried to access the
> failed disk for a few seconds and then continued in degraded mode.
> 
> I then rebooted and found the array (now running on sda7 through sde7,
> because sdf had moved up to sde) still in degraded mode. So far, so good.
> 
> Next step was reconnecting the "failed" disk and rebooting again. This time,
> the array didn't come up, and raidstart /dev/md0 gave the following output:
> 
> |(read) sda7's sb offset: 8666944 [events: 00000029]
> |(read) sdb7's sb offset: 803136 [events: 00000029]
> |(read) sdc7's sb offset: 803136 [events: 00000029]
> |(read) sdd7's sb offset: 803136 [events: 00000029]
> |(read) sde7's sb offset: 803136 [events: 00000023]
> |autorun ...
> |considering sde7 ...
> |adding sde7 ...
> |adding sdd7 ...
> |adding sdc7 ...
> |adding sdb7 ...
> |adding sda7 ...
> |created md0
> |bind<sda7,1>
> |bind<sdb7,2>
> |bind<sdc7,3>
> |bind<sdd7,4>
> |bind<sde7,5>
> |running: <sde7><sdd7><sdc7><sdb7><sda7>
> |now!
> |sde7's event counter: 00000023
> |sdd7's event counter: 00000029
> |sdc7's event counter: 00000029
> |sdb7's event counter: 00000029
> |sda7's event counter: 00000029
> |md: superblock update time inconsistency -- using the most recent one
> |freshest: sdd7
> |md: kicking non-fresh sde7 from array!
> |unbind<sde7,4>
> |export_rdev(sde7)
> |md0: former device sde7 is unavailable, removing from array!
> |md0: max total readahead window set to 640k
> |md0: 5 data-disks, max readahead per data-disk: 128k
> |raid5: device sdd7 operational as raid disk 3
> |raid5: device sdc7 operational as raid disk 2
> |raid5: device sdb7 operational as raid disk 1
> |raid5: device sda7 operational as raid disk 0
> |raid5: not enough operational devices for md0 (2/6 failed)
> |RAID5 conf printout:
> |--- rd:6 wd:4 fd:2
> |disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda7
> |disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb7
> |disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc7
> |disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdd7
> |disk 4, s:0, o:0, n:4 rd:4 us:1 dev:[dev 00:00]
> |disk 5, s:0, o:0, n:5 rd:5 us:1 dev:[dev 00:00]
> |disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |raid5: failed to run raid set md0
> |pers->run() failed ...
> |do_md_run() returned -22
> |unbind<sdd7,3>
> |export_rdev(sdd7)
> |unbind<sdc7,2>
> |export_rdev(sdc7)
> |unbind<sdb7,1>
> |export_rdev(sdb7)
> |unbind<sda7,0>
> |export_rdev(sda7)
> |md0 stopped.
> |... autorun DONE.
> 
> It looks like the RAID driver didn't like the sixth disk moving back up
> to sdf after the fifth disk was revived. This probably made the driver
> think that two of the six disks were gone.
> 
> After turning off the fifth disk again and rebooting, the array came back
> with all its data, but - of course - still in degraded mode:
> 
> |(read) sda7's sb offset: 8666944 [events: 0000002d]
> |(read) sdb7's sb offset: 803136 [events: 0000002d]
> |(read) sdc7's sb offset: 803136 [events: 0000002d]
> |(read) sdd7's sb offset: 803136 [events: 0000002d]
> |(read) sde7's sb offset: 803136 [events: 0000002d]
> |autorun ...
> |considering sde7 ...
> |adding sde7 ...
> |adding sdd7 ...
> |adding sdc7 ...
> |adding sdb7 ...
> |adding sda7 ...
> |created md0
> |bind<sda7,1>
> |bind<sdb7,2>
> |bind<sdc7,3>
> |bind<sdd7,4>
> |bind<sde7,5>
> |running: <sde7><sdd7><sdc7><sdb7><sda7>
> |now!
> |sde7's event counter: 0000002d
> |sdd7's event counter: 0000002d
> |sdc7's event counter: 0000002d
> |sdb7's event counter: 0000002d
> |sda7's event counter: 0000002d
> |md0: max total readahead window set to 640k
> |md0: 5 data-disks, max readahead per data-disk: 128k
> |raid5: device sde7 operational as raid disk 5
> |raid5: device sdd7 operational as raid disk 3
> |raid5: device sdc7 operational as raid disk 2
> |raid5: device sdb7 operational as raid disk 1
> |raid5: device sda7 operational as raid disk 0
> |raid5: md0, not all disks are operational -- trying to recover array
> |raid5: allocated 6350kB for md0
> |raid5: raid level 5 set md0 active with 5 out of 6 devices,
> |algorithm 0
> |RAID5 conf printout:
> |--- rd:6 wd:5 fd:1
> |disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda7
> |disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb7
> |disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc7
> |disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdd7
> |disk 4, s:0, o:0, n:4 rd:4 us:1 dev:[dev 00:00]
> |disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sde7
> |disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |RAID5 conf printout:
> |--- rd:6 wd:5 fd:1
> |disk 0, s:0, o:1, n:0 rd:0 us:1 dev:sda7
> |disk 1, s:0, o:1, n:1 rd:1 us:1 dev:sdb7
> |disk 2, s:0, o:1, n:2 rd:2 us:1 dev:sdc7
> |disk 3, s:0, o:1, n:3 rd:3 us:1 dev:sdd7
> |disk 4, s:0, o:0, n:4 rd:4 us:1 dev:[dev 00:00]
> |disk 5, s:0, o:1, n:5 rd:5 us:1 dev:sde7
> |disk 6, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 7, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 8, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 9, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 10, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |disk 11, s:0, o:0, n:0 rd:0 us:0 dev:[dev 00:00]
> |md: updating md0 RAID superblock on device
> |sde7 [events: 0000002e](write) sde7's sb offset: 803136
> |md: recovery thread got woken up ...
> |md0: no spare disk to reconstruct array! -- continuing in
> |degraded mode
> |md: recovery thread finished ...
> |sdd7 [events: 0000002e](write) sdd7's sb offset: 803136
> |sdc7 [events: 0000002e](write) sdc7's sb offset: 803136
> |sdb7 [events: 0000002e](write) sdb7's sb offset: 803136
> |sda7 [events: 0000002e](write) sda7's sb offset: 8666944
> |.
> |... autorun DONE.
> 
> After rebooting again with the fifth disk operating (and the RAID dead), I
> decided to erase the fifth disk's RAID partition with dd if=/dev/zero
> of=/dev/sde7. After that, the RAID still wouldn't come up, error messages as
> given above.
> 
> Not even erasing the whole disk (dd if=/dev/zero of=/dev/sde) could get the
> RAID up again. Now, where do I go from here? Any ideas?
> 
> Greetings
> Marc
