attempt to access beyond end of device
Folks,

Kernel 2.2.13ac1, patched with ide.2.2.13.1999.patch and raid0145-19990824-2.2.11. I know this is no longer "state of the art", but it was pretty solid in its day. Recently we've had two events which took out the entire raid5 array; both followed the same pattern. Here's the sequence.

The drive loses DMA for some reason:

Jul 14 09:20:11 osmin kernel: hdi: timeout waiting for DMA
Jul 14 09:20:11 osmin kernel: hdi: irq timeout: status=0xd0 { Busy }
Jul 14 09:20:11 osmin kernel: hdi: DMA disabled
Jul 14 09:20:12 osmin kernel: ide4: reset: success

Further attempts to access the disk lead to:

Jul 14 09:22:25 osmin kernel: hdi: write_intr error2: nr_sectors=1, stat=0x58
Jul 14 09:22:25 osmin kernel: hdi: write_intr: status=0x58 { DriveReady SeekComplete DataRequest }
Jul 14 09:22:25 osmin kernel: ide4: reset: success
Jul 14 09:27:32 osmin kernel: hdi: write_intr error2: nr_sectors=1, stat=0x58
Jul 14 09:27:32 osmin kernel: hdi: write_intr: status=0x58 { DriveReady SeekComplete DataRequest }
Jul 14 09:27:32 osmin kernel: ide4: reset: success

This goes on for hours and hours, and the drive is still marked active in mdstat. Finally, after many hours:

Jul 15 00:25:45 osmin kernel: hdi: write_intr error2: nr_sectors=1, stat=0x58
Jul 15 00:25:45 osmin kernel: hdi: write_intr: status=0x58 { DriveReady SeekComplete DataRequest }
Jul 15 00:25:45 osmin kernel: ide4: reset: success
Jul 15 00:25:47 osmin kernel: hdi: write_intr error2: nr_sectors=1, stat=0x58
Jul 15 00:25:47 osmin kernel: hdi: write_intr: status=0x58 { DriveReady SeekComplete DataRequest }
Jul 15 00:25:47 osmin kernel: ide4: reset: success
Jul 15 00:26:06 osmin kernel: attempt to access beyond end of device
Jul 15 00:26:06 osmin kernel: 39:01: rw=0, want=635481100, limit=33417184
Jul 15 00:26:06 osmin kernel: dev 09:01 blksize=4096 blocknr=635481099 sector=1270962198 size=1024 count=1
Jul 15 00:26:06 osmin kernel: raid5: Disk failure on hdk1, disabling device. Operation continuing on 3 devices
Jul 15 00:26:06 osmin kernel: raid5: restarting stripe 1270962198
Jul 15 00:26:06 osmin kernel: attempt to access beyond end of device
Jul 15 00:26:06 osmin kernel: 16:41: rw=0, want=635481100, limit=36630688
Jul 15 00:26:06 osmin kernel: dev 09:01 blksize=4096 blocknr=635481099 sector=1270962198 size=1024 count=1
Jul 15 00:26:06 osmin kernel: raid5: Disk failure on hdd1, disabling device. Operation continuing on 2 devices
Jul 15 00:26:06 osmin kernel: attempt to access beyond end of device
Jul 15 00:26:06 osmin kernel: 22:01: rw=0, want=635481100, limit=33417184
Jul 15 00:26:06 osmin kernel: dev 09:01 blksize=4096 blocknr=635481099 sector=1270962198 size=1024 count=1
Jul 15 00:26:06 osmin kernel: raid5: Disk failure on hdg1, disabling device. Operation continuing on 1 devices
Jul 15 00:26:06 osmin kernel: attempt to access beyond end of device
Jul 15 00:26:06 osmin kernel: 38:01: rw=0, want=635481100, limit=33417184
Jul 15 00:26:06 osmin kernel: dev 09:01 blksize=4096 blocknr=635481099 sector=1270962198 size=1024 count=1
Jul 15 00:26:06 osmin kernel: raid5: Disk failure on hdi1, disabling device. Operation continuing on 0 devices
Jul 15 00:26:06 osmin kernel: raid5: restarting stripe 1270962198

followed by

Jul 15 00:26:06 osmin kernel: raid5: md1: unrecoverable I/O error for block 4053926987
Jul 15 00:26:06 osmin kernel: raid5: md1: unrecoverable I/O error for block 4053730379

on and on forever, and the array is dead to the world. RAID has failed me here: I lost one disk, and I lost them all. The very failure I installed RAID to survive instead led to a larger catastrophe. Why? Yes, I can reboot and fsck the array, but files are missing (old files not recently accessed) and there's repairing to be done. Not an ideal solution.

My question is this: do the diagnostics above point to a misconfiguration on my part, or is this a shortcoming in RAID's ability to cope with a drive whose DMA has been disabled?

-Darren
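[Editorial note: the numbers in the "beyond end of device" lines tell the story. A quick back-of-the-envelope check of the arithmetic only (a sketch; the real bounds check lives in the kernel's block layer, not in shell):]

```shell
# Values copied from the log above.
sector=1270962198      # failing request, in 512-byte sectors
limit=33417184         # member device size, in 1 KB blocks

# With a 1 KB block size there are two 512-byte sectors per block,
# and a one-block request needs blocknr+1 blocks to exist on disk.
blocknr=$((sector / 2))
want=$((blocknr + 1))
echo "want=$want limit=$limit"

# The request lands roughly 19x past the end of the ~33 GB member
# partition, so the kernel refuses it and raid5 fails the disk.
if [ "$want" -gt "$limit" ]; then
    echo "attempt to access beyond end of device"
fi
```

This reproduces want=635481100 against limit=33417184, exactly as in the log: the failure is not a disk going bad at its edge, but a request for a block that could never have existed on any member device.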
Fatal: Only RAID1 devices are supported for boot images
How can I get lilo to work? The funny thing is that I have a working lilo boot sector, but I cannot create a new one, and I have no idea what I've changed...

Sven

# lilo -v -t
LILO version 21.4-4 (test mode), Copyright (C) 1992-1998 Werner Almesberger
'lba32' extensions Copyright (C) 1999,2000 John Coffman
boot = /dev/hde, map = /boot/map.2101
Reading boot sector from /dev/hde
Merging with /usr/local/src/lilo-21.4.4/boot.b
Fatal: Only RAID1 devices are supported for boot images

--- lilo.conf ---
# more /etc/lilo.conf
# LILO configuration file
# Start LILO global section
install = /usr/local/src/lilo-21.4.4/boot.b
#
# I have tried both (hda and md100), but they didn't work!!!
#
# boot=/dev/hda
boot=/dev/md100
# compact            # faster, but won't work on all systems
linear               # for RAID
vga = normal         # force sane state
read-only
prompt
# timeout=00
timeout=50
# End LILO global section
#
image = /boot/vmlinuz
    root = /dev/md100
    label = Linux

# cat /proc/mdstat
Personalities : [raid0] [raid1] [raid5]
read_ahead 1024 sectors
md100 : active raid1 hdg1[1] hde1[0] 153600 blocks [2/2] [UU]
md101 : active raid1 hdg2[1] hde2[0] 20480 blocks [2/2] [UU]
md150 : active raid5 hdg6[1] hde6[0] 2054912 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md151 : active raid5 hdg7[1] hde7[0] 1027840 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md159 : active raid5 hdg8[1] hde8[0] 2055936 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md155 : active raid5 hdg9[1] hde9[0] 1027840 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md170 : active raid5 hdg10[1] hde10[0] 3084288 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md190 : active raid5 hdg11[1] hde11[0] 2055936 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md191 : active raid5 hdg12[1] hde12[0] 16 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md192 : active raid5 hdg13[1] hde13[0] 320256 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md200 : active raid5 hdg15[1] hde15[0] 1504 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
md230 : active raid5 hdg16[1] hde16[0] 320256 blocks level 5, 128k chunk, algorithm 2 [3/2] [UU_]
unused devices: <none>
Re: Failure autodetecting raid0 partitions
Anders Qvist wrote:
> I have a 2.2.11+intl+raid0.90 successfully mounting its ext2 root file
> system off /dev/md0, which is autodetected by the kernel. A 2.4-test2
> kernel compiled with CONFIG_AUTODETECT_RAID fails to autodetect my
> partitions when I write it to a floppy and boot it. It just says
> "autodetecting RAID arrays ... autorun DONE". There is probably
> something I don't know. I'd be grateful if someone told me what it was.
> NB: I'm not on the list.

The autodetection was not done on partitions inside an extended partition. This patch fixes that.

--- linux-2.4.0-test4/fs/partitions/msdos.c	Sat Jul 15 13:22:29 2000
+++ linux/fs/partitions/msdos.c	Sat Jul 15 21:03:38 2000
@@ -136,6 +136,12 @@
 			add_gd_partition(hd, current_minor,
 					 this_sector+START_SECT(p)*sector_size,
 					 NR_SECTS(p)*sector_size);
+#if CONFIG_BLK_DEV_MD && CONFIG_AUTODETECT_RAID
+			if (SYS_IND(p) == LINUX_RAID_PARTITION) {
+				md_autodetect_dev(MKDEV(hd->major,current_minor));
+			}
+#endif
+
 			current_minor++;
 			loopct = 0;
 			if ((current_minor & mask) == 0)
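[Editorial note: the patch keys off the partition entry's system-indicator byte, SYS_IND(p), which must equal LINUX_RAID_PARTITION (0xfd) for autodetection to consider the partition at all. That byte lives at offset 0x1c2 + 16*n in the MBR. A self-contained sketch of where it sits, using a dummy one-sector image rather than a real disk:]

```shell
# Build a zeroed 512-byte "MBR" and set the first partition entry's
# type byte (offset 446 + 4 = 450 = 0x1c2) to 0xfd, the Linux RAID
# autodetect type, then read it back the way fdisk would report it.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
printf '\xfd' | dd of="$img" bs=1 seek=450 conv=notrunc 2>/dev/null
ptype=$(od -An -tx1 -j450 -N1 "$img" | tr -d ' ')
echo "partition type: $ptype"
rm -f "$img"
```

On a real system, `fdisk -l` must show type "fd" on every md member partition; otherwise the kernel skips it even with this patch applied.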
Re: Failure autodetecting raid0 partitions
Wow, an email CCed to Linus himself! *faint*
Re: Failure autodetecting raid0 partitions
Edward Schernau wrote:
> Wow, an email CCed to Linus himself! *faint*

Well, do you know of another way to get a patch into the kernel??
Re: Fatal: Only RAID1 devices are supported for boot images
> # lilo -v -t
> LILO version 21.4-4 (test mode), Copyright (C) 1992-1998 Werner Almesberger
> 'lba32' extensions Copyright (C) 1999,2000 John Coffman
> boot = /dev/hde, map = /boot/map.2101
> Reading boot sector from /dev/hde
> Merging with /usr/local/src/lilo-21.4.4/boot.b
> Fatal: Only RAID1 devices are supported for boot images
>
> --- lilo.conf ---
> install = /usr/local/src/lilo-21.4.4/boot.b

I solved the problem: boot.b was on a RAID5 device. I didn't know that's a problem, and I don't think it is one, because it worked before...

Sven
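[Editorial note: it is a problem by design. lilo has to map the physical sectors of boot.b and the map file at install time, and it can only do that on plain partitions or RAID1 members, where each mirror half is a complete filesystem on its own; blocks on a RAID5 array have no single fixed on-disk location. A sketch of the fix, assuming /boot lives on the RAID1 array md100 as in the mdstat above (paths are illustrative, not from the original post):]

```shell
# cp /usr/local/src/lilo-21.4.4/boot.b /boot/boot.b

# /etc/lilo.conf (relevant lines only):
#   install = /boot/boot.b     # now on RAID1, not the RAID5 filesystem
#   boot    = /dev/md100
#   linear
#   root    = /dev/md100

# lilo -v -t      # dry run first; rerun without -t once it passes
```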
Re: Failure autodetecting raid0 partitions
From [EMAIL PROTECTED] Sat Jul 15 19:29:44 2000

Edward Schernau wrote:
> > Wow, an email CCed to Linus himself! *faint*
>
> Well, do you know of another way to get a patch into the kernel??

So if Linus gets hit by a bus (or a fast-moving Hare Krishna), how are folks to get things into the kernel then?

C
--
Christopher Mauritz
[EMAIL PROTECTED]
Re: Failure autodetecting raid0 partitions
> So if Linus gets hit by a bus (or a fast-moving Hare Krishna), how are
> folks to get things into the kernel then?

Probably Alan.

-sv