Software levels: Red Hat 6.0, kernel 2.2.5-15, raidtools-0.90

Last Monday (while I was away attending a conference, of course) it seems
that one of my RAID-1 disk pairs had some kind of software error that caused
the array to break:

Nov  8 14:03:27 pop-3 kernel: attempt to access beyond end of device 
Nov  8 14:03:27 pop-3 kernel: 08:51: rw=0, want=892027448, limit=8956206 
Nov  8 14:03:27 pop-3 kernel: raid1: Disk failure on sdf1, disabling device.
Nov  8 14:03:27 pop-3 kernel:        Operation continuing on 1 devices 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 892027447 
Nov  8 14:03:27 pop-3 kernel: attempt to access beyond end of device 
Nov  8 14:03:27 pop-3 kernel: 08:41: rw=0, want=1295592304, limit=8956206 
Nov  8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error. 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 1295592303 
Nov  8 14:03:27 pop-3 kernel: md: recovery thread got woken up ... 
Nov  8 14:03:27 pop-3 kernel: md1: no spare disk to reconstruct array! -- continuing in degraded mode
Nov  8 14:03:27 pop-3 kernel: md: recovery thread finished ... 
Nov  8 14:03:27 pop-3 kernel: dirty sb detected, updating. 
Nov  8 14:03:27 pop-3 kernel: md: updating md1 RAID superblock on device 
Nov  8 14:03:27 pop-3 kernel: (skipping faulty sdf1 ) 
Nov  8 14:03:27 pop-3 kernel: (skipping faulty sde1 ) 
Nov  8 14:03:27 pop-3 kernel: . 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block 1295592303
Nov  8 14:03:27 pop-3 kernel: raid1: md1: redirecting sector 892027447 to another mirror
Nov  8 14:03:27 pop-3 kernel: attempt to access beyond end of device 
Nov  8 14:03:27 pop-3 kernel: 08:41: rw=0, want=892027448, limit=8956206 
Nov  8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error. 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 892027447 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block 892027447
Nov  8 14:03:27 pop-3 kernel: md: recovery thread got woken up ... 
Nov  8 14:03:27 pop-3 kernel: md1: no spare disk to reconstruct array! -- continuing in degraded mode
Nov  8 14:03:27 pop-3 kernel: md: recovery thread finished ... 
Nov  8 14:03:27 pop-3 kernel: attempt to access beyond end of device 
Nov  8 14:03:27 pop-3 kernel: 08:41: rw=0, want=892027448, limit=8956206 
Nov  8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error. 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 892027447 
Nov  8 14:03:27 pop-3 kernel: attempt to access beyond end of device 
Nov  8 14:03:27 pop-3 kernel: 08:41: rw=0, want=1295592304, limit=8956206 
Nov  8 14:03:27 pop-3 kernel: raid1: only one disk left and IO error. 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: rescheduling block 1295592303 
Nov  8 14:03:27 pop-3 kernel: md: recovery thread got woken up ... 
Nov  8 14:03:27 pop-3 kernel: md1: no spare disk to reconstruct array! -- continuing in degraded mode
Nov  8 14:03:27 pop-3 kernel: md: recovery thread finished ... 
Nov  8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block 1295592303
Nov  8 14:03:27 pop-3 kernel: raid1: md1: unrecoverable I/O read error for block 892027447

The disk drives involved are for our POP servers, so most likely the
software error was due to a corrupt map file.

My /proc/mdstat currently looks like:

Personalities : [raid1] 
read_ahead 1024 sectors
md1 : active raid1 sdf1[1](F) sde1[0](F) 8956096 blocks [2/1] [U_]
unused devices: <none>

The first thing I tried to do was:
raidhotremove /dev/md1 /dev/sde1
but it said the device was busy.
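
My guess is that the device stays busy because the array is still actively
reading from sde1, so presumably md1 has to be taken offline before a member
can be removed.  Something like the following is what I have in mind (just a
rough sketch; /pop is a made-up mount point for illustration):

umount /pop              # free the filesystem so md1 is no longer busy
raidstop /dev/md1        # stop the array and release the member disks
# ... inspect the underlying partitions here ...
raidstart /dev/md1       # bring the array back up afterwards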

From looking at /proc/scsi/aic7xxx/0, I have determined that reads and
writes to /dev/md1 are going to device 4, which is /dev/sde1.

Now the question is, what do I do to recover from this situation?  My
instinct is to tar up /dev/sde1 as a backup before I mess with the RAID
setup, and then to reboot into single-user mode after removing /dev/md1 from
/etc/fstab so that the array does not attempt to start.  Then do an fsck on
/dev/sde1.  Now, if I restart the array at this point, what assures me that
it will resync in the right direction?  That is, that it uses /dev/sde1 as
the master and not /dev/sdf1.  Does it use the RAID superblocks to determine
which way to sync up?
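
Concretely, the sequence I have in mind is something like this (a sketch
only; the /backup path is just a placeholder for wherever I have space, and
a raw dd image may be a safer backup of the partition than tar):

umount /dev/md1                              # make sure nothing is using the array
raidstop /dev/md1                            # stop md1 so sde1 and sdf1 are free
dd if=/dev/sde1 of=/backup/sde1.img bs=64k   # raw image of the good half
fsck -n /dev/sde1                            # read-only check before changing anything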

Comments???

My /etc/raidtab looks like:

# raid-1 configuration
raiddev                 /dev/md1
raid-level              1
nr-raid-disks           2
nr-spare-disks          0
chunk-size              4
persistent-superblock   1

device                  /dev/sde1
raid-disk               0

device                  /dev/sdf1
raid-disk               1
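
One more thought: if I am reading the Software-RAID HOWTO correctly, raidtab
also supports a failed-disk directive that makes mkraid (re)build the array
from the good member only, leaving the failed one untouched.  If that
applies here, I imagine the device entries above would be changed to
something like this before re-running mkraid (untested; just my reading of
the HOWTO):

device                  /dev/sde1
raid-disk               0

device                  /dev/sdf1
failed-disk             1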


Thanks,
Kent Ziebell
__________________________________________________________________
Kent A. Ziebell                              [EMAIL PROTECTED]
249 Durham Center                            
Iowa State University Computation Center     voice: (515) 294-9607
Ames, Iowa 50011                             fax:   (515) 294-1717
