Re: Hard disk woes
Michael Abbott wrote:

I still think the question "why does FreeBSD hang?" is interesting.

Indeed. No idea how Linux handles it; win32 would probably BSOD (I had W2K servers BSOD because someone accidentally powered down an external drive the server was writing to. Nasty).

Anyway, I had a weird problem too: ad4 (a SATA drive) got detached overnight. More details at: http://lists.freebsd.org/pipermail/freebsd-questions/2005-September/097607.html

When I got to the box in the morning, the console was completely frozen, though I could still access it just fine via ssh. Would anyone care to provide some explanation for this?

(After a couple of full scans with mhdd and no problems detected, I put the drive back into the server and it's been running OK since then. Bloody weird.)

Thanks in advance,
beto

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]
Hard disk woes
I'm having some very odd behaviour from one of my hard disks and I wonder what anybody makes of it.

In brief, the hard disk in question works just fine much of the time, but when high-volume data transfers are requested I get the following in /var/log/messages:

Sep 3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep 3 15:21:02 saturn /kernel: ata3: resetting devices .. done
Sep 3 15:21:12 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep 3 15:21:12 saturn /kernel: ata3: resetting devices .. done
Sep 3 15:21:23 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep 3 15:21:23 saturn /kernel: ata3: resetting devices .. done
Sep 3 15:21:33 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep 3 15:21:33 saturn /kernel: ad6: trying fallback to PIO mode
Sep 3 15:21:33 saturn /kernel: ata3: resetting devices .. done
Sep 3 15:21:43 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
Sep 3 15:21:43 saturn /kernel: ata3: resetting devices .. ata3-slave: ATA identify retries exceeded
Sep 3 15:21:43 saturn /kernel: done

After this point the hard disk in question is frozen until I reboot, and any process that tries to touch it is similarly frozen (it doesn't even respond to kill -9). `shutdown -r` is enough to restore operation, and the rest of the system seemed happy enough.

Another interesting effect: I placed a replacement hard disk on the same ATA bus (as a slave, device ad7) and tried copying files from ad6 to ad7. This time, when ad6 froze and the kernel decided to give up on ata3 (and so decided to disable ad7 at the same time, naturally enough), the entire system froze! No response from the console, stone cold dead, hard reset needed.

So some questions seem to me to arise from this.

1. Why does FreeBSD handle this so ungracefully? If restarting is sufficient to bring ata3 back, then can't the ata driver do a proper restart?

2. Goodness me, FreeBSD froze! I know it's a hardware failure, but still: it's on an auxiliary ATA controller with no system files attached. Is this problem of general interest? It's certainly a massive hint to me not to consider (parallel) ATA for RAID!

3. Any thoughts on what is wrong with the hard disk in question? I've changed ATA controllers, so it seems to be the disk, not the controller. The behaviour is very odd. If I copy files off one at a time, e.g. using:

    find . -type f -exec cp {} $TARGET/{} \; -exec echo -n '.' \;

the disk seems to hang in there, but if I just do

    cp -R . $TARGET

then it freezes! (This statement may not have been thoroughly tested: having to restart each time gets old quite quickly.)

OK, now for the boring bits.

$ uname -a
FreeBSD saturn.araneidae.co.uk 4.11-RELEASE-p11 FreeBSD 4.11-RELEASE-p11 #6: Sat Aug 27 16:33:58 GMT 2005 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/GENERIC i386
$ dmesg | grep ata
atapci0: HighPoint HPT370 ATA100 controller port 0xa000-0xa0ff,0x9c00-0x9c03,0x9800-0x9807,0x9400-0x9403,0x9000-0x9007 irq 12 at device 11.0 on pci0
ata2: at 0x9000 on atapci0
ata3: at 0x9800 on atapci0
atapci1: VIA 8233 ATA133 controller port 0xa800-0xa80f at device 17.1 on pci0
ata0: at 0x1f0 irq 14 on atapci1
ata1: at 0x170 irq 15 on atapci1
atapci2: HighPoint HPT372 ATA133 controller port 0xc400-0xc4ff,0xc000-0xc003,0xbc00-0xbc07,0xb800-0xb803,0xb400-0xb407 irq 10 at device 19.0 on pci0
ata4: at 0xb400 on atapci2
ata5: at 0xbc00 on atapci2
ad0: 39083MB Maxtor 4D040H2 [79408/16/63] at ata0-master UDMA100
ad1: 190782MB SAMSUNG SP2014N [387621/16/63] at ata0-slave UDMA133
ad4: 76319MB ST380021A [155061/16/63] at ata2-master UDMA100
ad6: 76319MB ST380021A [155061/16/63] at ata3-master UDMA100
acd0: DVD-ROM CREATIVEDVD-ROM DVD2240E 12/24/97 at ata1-master PIO4
$ sudo atacontrol cap ata3 0
ATA channel 3, Master, device ad6:

ATA/ATAPI revision    5
device model          ST380021A
serial number         3HV0MYL9
firmware revision     3.10
cylinders             16383
heads                 16
sectors/track         63
lba supported         156301488 sectors
lba48 not supported
dma supported
overlap not supported

Feature                        Support  Enable  Value       Vendor
write cache                    yes      yes
read ahead                     yes      yes
dma queued                     no       no      0/00
SMART                          yes      no
microcode download             yes      yes
security                       yes      no
power management               yes      yes
advanced power management      no       no      65278/FEFE
automatic acoustic management  yes      yes     128/80      128/80
$

That's everything I can think of.
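For anyone who wants to see how often this happens over a longer stretch of /var/log/messages, here is a minimal Python sketch that tallies the ata(4) messages quoted above: command timeouts per disk, resets per channel, and which disks fell back to PIO. It assumes the message wording matches the FreeBSD 4.11 lines shown here (other releases may word them differently), and the function name `summarize_ata_errors` is made up for illustration.

```python
import re
import sys
from collections import Counter

# Patterns for the FreeBSD 4.11 ata(4) messages quoted above.
# WRITE timeouts are an assumption: only READ timeouts appear in the log.
TIMEOUT_RE = re.compile(r"/kernel: (ad\d+): (?:READ|WRITE) command timeout")
RESET_RE = re.compile(r"/kernel: (ata\d+): resetting devices")
PIO_RE = re.compile(r"/kernel: (ad\d+): trying fallback to PIO mode")


def summarize_ata_errors(lines):
    """Return (timeouts per disk, resets per channel, disks that fell back to PIO)."""
    timeouts = Counter()
    resets = Counter()
    pio_fallbacks = set()
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            timeouts[m.group(1)] += 1
            continue
        m = RESET_RE.search(line)
        if m:
            resets[m.group(1)] += 1
            continue
        m = PIO_RE.search(line)
        if m:
            pio_fallbacks.add(m.group(1))
    return timeouts, resets, pio_fallbacks


# Only runs when a log path is given, e.g.:
#   python ata_summary.py /var/log/messages
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as log:
        timeouts, resets, pio = summarize_ata_errors(log)
    for dev, n in sorted(timeouts.items()):
        print(f"{dev}: {n} command timeout(s)")
    for chan, n in sorted(resets.items()):
        print(f"{chan}: {n} channel reset(s)")
    for dev in sorted(pio):
        print(f"{dev}: fell back to PIO mode")
```

Fed the twelve log lines above, it counts five command timeouts on ad6, five resets of ata3, and a PIO fallback on ad6. It only summarises symptoms; it says nothing about whether the disk, cable or power supply is at fault.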
Re: Hard disk woes
On Mon, Sep 05, 2005 at 03:16:13PM +, Michael Abbott wrote:

I'm having some very odd behaviour from one of my hard disks and I wonder what anybody makes of it. In brief, the hard disk in question works just fine much of the time, but when high-volume data transfers are requested I get the following in /var/log/messages:

Sep 3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting
[...]

Just a general comment: I had a very similar problem a while back. After replacing the drive in question, then replacing the motherboard, I discovered it was a power issue. The power supply was freaking out at medium to high loads, which was causing the device to continually reset.
Re: Hard disk woes
On Mon, 5 Sep 2005, Jason Morgan wrote:

On Mon, Sep 05, 2005 at 03:16:13PM +, Michael Abbott wrote:

I'm having some very odd behaviour from one of my hard disks and I wonder what anybody makes of it. In brief, the hard disk in question works just fine much of the time, but when high-volume data transfers are requested I get the following in /var/log/messages:

Sep 3 15:21:02 saturn /kernel: ad6: READ command timeout tag=0 serv=0 - resetting

I had a very similar problem a while back. After replacing the drive in question, then replacing the motherboard, I discovered it was a power issue. The power supply was freaking out at medium to high loads, which was causing the device to continually reset.

Well, I hope that's not it. I'm encouraged to think not:

- the problem seems to be tied to one particular hard disk, and I presently run with four hard disks
- the system has operated trouble-free for three years
- my memory is that it was a good-quality power supply.

I don't really see how I'd diagnose a power supply problem, but as I say, the hard disk in question is the only part with problems.
Re: Hard disk woes
On Sep 5, 2005, at 1:56 PM, Michael Abbott wrote:

I had a very similar problem a while back. After replacing the drive in question, then replacing the motherboard, I discovered it was a power issue. The power supply was freaking out at medium to high loads, which was causing the device to continually reset.

Well, I hope that's not it. I'm encouraged to think not:

- the problem seems to be tied to one particular hard disk, and I presently run with four hard disks
- the system has operated trouble-free for three years
- my memory is that it was a good-quality power supply.

I don't really see how I'd diagnose a power supply problem, but as I say, the hard disk in question is the only part with problems.

Yeah But... Power supplies wear out. Particularly the capacitors.

I have seen every single component replaced in denial that the problem could be related to the power supply. Then the PS was finally replaced because it was the only thing which had not. And the problem was the PS all along.

--
David Kelly N4HHE, [EMAIL PROTECTED]
Whom computers would destroy, they must first drive mad.
Re: Hard disk woes
On Mon, 5 Sep 2005, David Kelly wrote:

I had a very similar problem a while back. After replacing the drive in question, then replacing the motherboard, I discovered it was a power issue. The power supply was freaking out at medium to high loads, which was causing the device to continually reset.

On Sep 5, 2005, at 1:56 PM, Michael Abbott wrote:

Well, I hope that's not it. I'm encouraged to think not:

Yeah But... Power supplies wear out. Particularly the capacitors. I have seen every single component replaced in denial that the problem could be related to the power supply. Then the PS was finally replaced because it was the only thing which had not. And the problem was the PS all along.

Well, I do have another reason for thinking that it's nothing to do with the power supply: a bit of history I didn't mention (because it's long and not particularly interesting).

When I first installed this machine (a bit over three years ago) I used the offending disk together with another disk of the same model. I first used the motherboard's hardware RAID (striping, for speed, more fool me) and installed FreeBSD on it. It broke, really quite quickly (within a week or so). I blamed the RAID controller and tried again, this time using vinum. The system survived quite a bit longer (I can't remember how long, a month or so maybe), but then suddenly failed quite horribly: I lost all data. I retired the two disks and started again, and the resulting system has run sweetly for three years.

Recently I brought the two disks out of retirement, and one of them seems most unhappy (as described). I'm strongly persuaded (convinced, even) that that one disk is dodgy. I think I'm going to have to bin it, unless somebody can come up with a way to reliably mollycoddle it.

I still think the question "why does FreeBSD hang?" is interesting.
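On the "mollycoddle it" front: Michael's first message reported (with the caveat that it was not thoroughly tested) that copying one file at a time with the find/cp one-liner seemed to survive where `cp -R` froze. Below is a minimal Python sketch of that same file-at-a-time idea with a little more control. It walks the source tree, creates target directories as it goes (the find one-liner assumes they already exist), copies regular files one by one like `find -type f`, and can optionally sleep between files. The function name `gentle_copy` and the `pause` parameter are hypothetical, and nothing in this thread shows that pacing the copies actually helps a failing disk.

```python
import os
import shutil
import sys
import time


def gentle_copy(src, dst, pause=0.0):
    """Copy regular files under src into dst one at a time; return the count."""
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src):
        # Mirror the source directory layout under dst, creating it as needed.
        rel = os.path.relpath(dirpath, src)
        target_dir = os.path.normpath(os.path.join(dst, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src_path = os.path.join(dirpath, name)
            # Like `find -type f`: skip symlinks and anything not a regular file.
            if os.path.islink(src_path) or not os.path.isfile(src_path):
                continue
            shutil.copy2(src_path, os.path.join(target_dir, name))
            copied += 1
            print(".", end="", flush=True)  # progress dot, as in the find one-liner
            if pause > 0:
                time.sleep(pause)  # hypothetical pacing between files
    return copied


# Only runs with both paths given, e.g.:
#   python gentle_copy.py /mnt/ad6 /mnt/ad7 0.5
if __name__ == "__main__" and len(sys.argv) >= 3:
    delay = float(sys.argv[3]) if len(sys.argv) > 3 else 0.0
    n = gentle_copy(sys.argv[1], sys.argv[2], delay)
    print(f"\n{n} files copied")
```

The target should not live inside the source tree, or the walk will pick up the copies as it goes. Copying file by file also means a hang costs only the file in flight, which at least shows how far the copy got before the disk gave up.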