Re: Trouble with 2.2.13ac3

James Manning Sun, 9 Jan 2000 09:52:07 -0800
Heavy snippage/rearrangement for brevity :)

[ Sunday, January  9, 2000 ] Jochen Scharrlach wrote:
> I recently upgraded one machine from the original RedHat 5.2/kernel
> 2.0.36 RAID-stuff (using RAID1) to kernel 2.2.13ac3 with
> raidtools-19990824-0.90.tar.gz. The partition table on both disks
> looks like this:
> 
>    Device Boot    Start      End   Blocks   Id  System
> /dev/sda5            67      656  4739143+  fd  Unknown
> 
> raiddev                 /dev/md0
>         device                  /dev/sda5
>         raid-disk               0
>         device                  /dev/sdb5
>         raid-disk               1
> 
> # cat /proc/mdstat
> md0 : active raid1 sdb5[1](F) sda5[0](F) 4739072 blocks [2/1] [U_]
> 
> Jan  6 00:21:14 picard kernel: attempt to access beyond end of device
> Jan  6 00:21:14 picard kernel: 08:15: rw=0, want=1094795586, limit=4739143
> Jan  6 00:21:14 picard kernel: dev 09:00 blksize=1024 blocknr=1094795585 sector=-2105376126 size=1024 count=1
> Jan  6 00:21:14 picard kernel: raid1: Disk failure on sdb5, disabling device.
> Jan  6 00:21:14 picard kernel:        Operation continuing on 1 devices
> Jan  6 00:21:14 picard kernel: raid1: md0: rescheduling block 1094795585
> Jan  6 00:21:14 picard kernel: attempt to access beyond end of device
> Jan  6 00:21:14 picard kernel: 08:05: rw=0, want=1094795586, limit=4739143
> Jan  6 00:21:14 picard kernel: dev 09:00 blksize=1024 blocknr=1094795585 sector=-2105376126 size=1024 count=1
> Jan  6 00:21:14 picard kernel: raid1: only one disk left and IO error.
> Jan  6 00:21:14 picard kernel: raid1: md0: rescheduling block 1094795585

What worries me is what appears to be happening: the md layer passes a
wildly invalid sector request (however it got that far) down to the
devices making up your raid1. Since ll_rw_blk::make_request() fails,
the md layer marks that device as failed (without having checked the
request against the valid size itself) and moves on, failing on each
successive device for the same reason (esp. in raid1 :). Once it runs
out of devices it gives up and just reschedules the block all over
again, eventually failing altogether.
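
To see how far out of range the request is, the numbers in the quoted
log can be checked by hand. The sketch below is only arithmetic on the
logged values; it assumes 512-byte sectors and that the sector field
was printed as a signed 32-bit integer, neither of which the log states
outright. The helper name to_int32 is made up for this example.

```python
# Values copied from the quoted kernel log.
BLOCK_SIZE = 1024          # blksize=1024
SECTOR_SIZE = 512          # assumption: 512-byte sectors
blocknr = 1094795585       # blocknr=1094795585
limit_blocks = 4739143     # limit=4739143 (1K blocks in the partition)


def to_int32(value):
    """Wrap an integer the way a signed 32-bit field would hold it."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


sectors_per_block = BLOCK_SIZE // SECTOR_SIZE   # 2 sectors per 1K block
sector = blocknr * sectors_per_block            # 2189591170, too big for int32
sector_as_int32 = to_int32(sector)              # what a signed print would show

# End of the request, converted back to 1K blocks.
want = (sector + sectors_per_block) // 2

# The block number is also a suspicious bit pattern.
blocknr_hex = hex(blocknr)
blocknr_bytes = blocknr.to_bytes(4, "big")

print(sector_as_int32, want, want > limit_blocks, blocknr_hex, blocknr_bytes)
```

Under those assumptions the wrapped sector matches the logged
sector=-2105376126 and the request end matches want=1094795586, roughly
230 times the partition limit. The block number is 0x41414141, i.e. the
bytes "AAAA", which may hint that the request was built from text data
rather than a real block pointer, though the log alone cannot confirm
that.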

Is this a correct interpretation?  If so, it seems like either struct
mddev_s or struct mirror_info needs a size/sect_count/whatever parameter
added to check against the buffer_head being requested...  I don't see
a make_request path back that can handle this case on its own...
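
A rough user-space model of the check being suggested, just to make the
idea concrete. This is not kernel code: Mirror, Raid1Model, size_blocks
and the check_size switch are all hypothetical names invented for the
sketch, and the real raid1 retry logic is more involved. The sizes are
the ones from the quoted fdisk and /proc/mdstat output.

```python
from dataclasses import dataclass, field


@dataclass
class Mirror:
    name: str
    limit_blocks: int      # size of the underlying partition in 1K blocks
    failed: bool = False

    def read(self, blocknr):
        # Stand-in for the low-level driver: out-of-range requests fail.
        return 0 <= blocknr < self.limit_blocks


@dataclass
class Raid1Model:
    size_blocks: int       # hypothetical per-array size field
    mirrors: list = field(default_factory=list)

    def read(self, blocknr, check_size):
        # Proposed behaviour: reject a bad request before blaming any disk.
        if check_size and not (0 <= blocknr < self.size_blocks):
            return "rejected"
        # Behaviour described above: every failed read fails a mirror.
        for mirror in self.mirrors:
            if mirror.failed:
                continue
            if mirror.read(blocknr):
                return "ok"
            mirror.failed = True
        return "io_error"


def demo(check_size):
    array = Raid1Model(
        size_blocks=4739072,
        mirrors=[Mirror("sda5", 4739143), Mirror("sdb5", 4739143)],
    )
    result = array.read(1094795585, check_size)
    return result, [m.failed for m in array.mirrors]


print(demo(check_size=False))
print(demo(check_size=True))
```

Without the size check, the one bad request takes both mirrors down in
the model; with it, the request is refused and both mirrors stay in the
array.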

James