Software levels: Red Hat 6.0, kernel 2.2.5-15, raidtools-0.90
Last Monday (while I was away attending a conference, of course) it seems that
one of my raid-1 disk pairs hit some kind of software error that caused the
raid to break:
Nov 8 14:03:27 pop-3 kernel: attempt to access beyond end of device
Nov 8 14:03:27 pop-3 kernel: 08:51: rw=0, want=892027448, limit=8956206
Nov 8 14:03:27 pop-3 kernel: raid1: Disk failure on sdf1, disabling device.
Nov 8 14:03:27 pop-3 kernel: Operation continuing on 1 devices
Nov 8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 892027447
Nov 8 14:03:27 pop-3 kernel: attempt to access beyond end of device
Nov 8 14:03:27 pop-3 kernel: 08:41: rw=0, want=1295592304, limit=8956206
Nov 8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error.
Nov 8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 1295592303
Nov 8 14:03:27 pop-3 kernel: md: recovery thread got woken up ...
Nov 8 14:03:27 pop-3 kernel: md1: no spare disk to reconstruct array! -- continuing
in degraded mode
Nov 8 14:03:27 pop-3 kernel: md: recovery thread finished ...
Nov 8 14:03:27 pop-3 kernel: dirty sb detected, updating.
Nov 8 14:03:27 pop-3 kernel: md: updating md1 RAID superblock on device
Nov 8 14:03:27 pop-3 kernel: (skipping faulty sdf1 )
Nov 8 14:03:27 pop-3 kernel: (skipping faulty sde1 )
Nov 8 14:03:27 pop-3 kernel: .
Nov 8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block
1295592303
Nov 8 14:03:27 pop-3 kernel: raid1: md1: redirecting sector 892027447 to another
mirror
Nov 8 14:03:27 pop-3 kernel: attempt to access beyond end of device
Nov 8 14:03:27 pop-3 kernel: 08:41: rw=0, want=892027448, limit=8956206
Nov 8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error.
Nov 8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 892027447
Nov 8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block
892027447
Nov 8 14:03:27 pop-3 kernel: md: recovery thread got woken up ...
Nov 8 14:03:27 pop-3 kernel: md1: no spare disk to reconstruct array! -- continuing
in degraded mode
Nov 8 14:03:27 pop-3 kernel: md: recovery thread finished ...
Nov 8 14:03:27 pop-3 kernel: attempt to access beyond end of device
Nov 8 14:03:27 pop-3 kernel: 08:41: rw=0, want=892027448, limit=8956206
Nov 8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error.
Nov 8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 892027447
Nov 8 14:03:27 pop-3 kernel: attempt to access beyond end of device
Nov 8 14:03:27 pop-3 kernel: 08:41: rw=0, want=1295592304, limit=8956206
Nov 8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error.
Nov 8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 1295592303
Nov 8 14:03:27 pop-3 kernel: md: recovery thread got woken up ...
Nov 8 14:03:27 pop-3 kernel: md1: no spare disk to reconstruct array! -- continuing
in degraded mode
Nov 8 14:03:27 pop-3 kernel: md: recovery thread finished ...
Nov 8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for
block 1295592303
Nov 8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block
892027447
The disk drives involved are for our POP servers, so the software error was
most likely due to a corrupt map file.
My /proc/mdstat currently looks like:
Personalities : [raid1]
read_ahead 1024 sectors
md1 : active raid1 sdf1[1](F) sde1[0](F) 8956096 blocks [2/1] [U_]
unused devices: <none>
The first thing I tried to do was:
raidhotremove /dev/md1 /dev/sde1
but it said the device was busy.
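I assume the hot-remove fails because the array is still running and the
filesystem on md1 is still mounted, so before doing anything else I was
planning to take it offline with something like this (the mount point below
is just a guess -- substitute wherever md1 is actually mounted):

# see which processes still have the md1 filesystem open
fuser -vm /var/spool/pop
# once those are stopped, unmount it and stop the array
umount /var/spool/pop
raidstop /dev/md1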
From looking at /proc/scsi/aic7xxx/0, I have determined that reads and writes
to /dev/md1 are going to device 4, which is /dev/sde1.
Now the question is, what do I do to recover from this situation? My instinct
is to tar up /dev/sde1 as a backup before I mess with the raid at all, then
remove /dev/md1 from /etc/fstab and reboot into single-user mode so that the
raid does not attempt to start up, and then run fsck on /dev/sde1. If I
restart the raid at that point, what ensures that it resyncs in the right
direction, i.e. uses /dev/sde1 as the master and not /dev/sdf1? Does it use
the raid superblocks to determine which way to sync up?
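Concretely, the sequence I have in mind looks roughly like this (the mount
point and backup path are only examples, and I am assuming it is safe to
mount a raid-1 member directly since the raid superblock lives at the end of
the partition):

# 1. comment the /dev/md1 line out of /etc/fstab, then reboot to single user
# 2. back up the surviving half of the mirror before touching the raid
mount -r /dev/sde1 /mnt
tar cf /backup/md1-sde1.tar /mnt
umount /mnt
# 3. check the filesystem on the raw partition
e2fsck -f /dev/sde1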
Comments???
My /etc/raidtab looks like:
# raid-1 configuration
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 4
persistent-superblock 1
device /dev/sde1
raid-disk 0
device /dev/sdf1
raid-disk 1
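One idea for forcing the resync direction (based on my reading of the
Software-RAID HOWTO, so please tell me if I have this wrong) is to mark sdf1
as failed in raidtab and re-create the superblocks with mkraid --force, so
that md1 comes up degraded on sde1 alone and sdf1 can then be re-added and
rebuilt from it:

# raid-1 configuration with sdf1 marked failed, so sde1 is the only data source
raiddev /dev/md1
raid-level 1
nr-raid-disks 2
nr-spare-disks 0
chunk-size 4
persistent-superblock 1
device /dev/sde1
raid-disk 0
device /dev/sdf1
failed-disk 1

followed by mkraid --force /dev/md1 and then raidhotadd /dev/md1 /dev/sdf1
once md1 is running again on sde1 -- but I would rather hear from someone who
has done this before I try it.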
Thanks,
Kent Ziebell
__________________________________________________________________
Kent A. Ziebell [EMAIL PROTECTED]
249 Durham Center
Iowa State University Computation Center voice: (515) 294-9607
Ames, Iowa 50011 fax: (515) 294-1717