Re: ATA problems again ... general problem of ICH7 or ATA?
On Monday 21 August 2006 23:08, Dominic Marks wrote:
> You can use device.hints(5) to do this.
>
> I have the following in mine to force a RAID card and Sound card to
> share IRQ 17.
> You need to modify it to suit your environment.
>
> hw.pci3.13.INTA.irq="17"
>
> The `13' value is the device number, you can find this in dmesg, same
> for pciN.

Any chance this could be documented somewhere?
(Or tell me where if it is.. :) I checked pci(4) and device.hints(5))

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: ATA problems again ... general problem of ICH7 or ATA?
Matt Dawson wrote:
> On Monday 21 August 2006 13:00, [EMAIL PROTECTED] wrote:
>>> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and it takes a reboot to bring it back. atacontrol reinit has no effect. Tried the following to resolve the problems:
>>
>> I don't know what is supposed to be the canonical way to reattach a disconnected SATA drive, but while testing our new hardware and hot-pulling a drive while the system was running, atacontrol reinit didn't find the reinserted drive here, either.
>>
>> atacontrol detach ata3; atacontrol attach ata3 did.
>
> Yes, that is the method for a controlled remove and reattach, a la hotplug SATA. AIUI, though, if the drive goes AWOL on its own you need to reinit the channel before issuing an atacontrol attach foo. In theory... (man 8 atacontrol) In practice, the drive disappears, never to be probed again. A warm reboot without power down makes it appear again, so the drive itself isn't confused.

It is the same in my case.

> FWIW, the problem takes *far* longer to rear its head when the SATA controller has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices end up sharing it. Now transfer a large file over the network and watch the ensuing hilarity) and it happens at least every couple of days. Now, with the slot shared with the SATA controller empty, I have six days uptime since the last event, which means I'm probably due one any time now.

I thought so too, but it did not solve my problem. I had UHCI sharing the same IRQ with SATA (both on irq 19). Instead of playing with device.hints(5), I disabled all unused peripherals in the BIOS (USB ports, LPT port, FDD...). After a few days, the system reported another lost disk.

> At least gmirror rebuilds the array after a simple reboot, but I would expect the dd operation to throw a wobbly if it's a timing issue/fight for interrupt between the two drives/channels. It doesn't, which makes me wonder if I'm barking up the wrong tree, but I can't help noticing that SATA channels have one interrupt between them whereas PATA channels have one each and all of these reports are from SATA users...

Maybe you are right; I haven't seen any report from a single-disk machine. All the problems come from machines with two or more SATA disks.

> I wonder what pciconf -lv shows on Miroslav's system? Is the SATA controller sharing an INT/IRQ with something else? Does moving that device to another slot alleviate the problem at all?

SATA is no longer sharing IRQs, but the problem persists.

system dmesg after verbose boot
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_dmesg_verbose_2006-08-21.txt
pciconf -lv
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_pciconf_2006-08-21.txt

The problem appeared only under heavy disk load (e.g. a ports tree copy). I have a third system with minimal disk load that has been running for 10 days without problems (FreeBSD 6.0, in production for those 10 days; the machine is a "quick replacement" for a failed server, with the system mirrored from the old disks to the new ones by dump & restore).

> Please note that Miroslav and I are using totally different drives, chipsets and processors. He's using, IIRC, an Intel chip with an ICH7 southbridge and Samsung drives. I'm using an AMD Athlon 64 Newcastle (running the i386 port) on a ULi M1689 chipset with WD RE2 drives so, although I'd be more than happy to be the numpty that is wrong and to have ata(4) vindicated by someone else, I suspect it is ata(4) that is the problem. However, finger pointing isn't productive and is certainly not fair given that ata(4) has been progressing so well. Anything else I can try to nail this irksome beast? Any suggestions for where I've been an idiot (easy, tiger!) and missed something obvious?
>
> BTW, this is a production server (DLT backed up nightly, so the data is safe) so I can't just pull it to bits. I do have an identical (CPU/mobo) box in the workshop as a workstation, however, which I could buy/borrow another drive for and set up gmirror to try things out.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"
Re: ATA problems again ... general problem of ICH7 or ATA?
Patrick M. Hausen wrote:
> Hi, Dominic!
>
> On Mon, Aug 21, 2006 at 02:40:17PM +0100, Dominic Marks wrote:
>> hw.pci3.13.INTA.irq="17"
>>
>> The `13' value is the device number, you can find this in dmesg, same for pciN.
>
> So I tried this:
>
> em1: port 0x5000-0x501f mem 0xdc18-0xdc19,0xdc10-0xdc17 irq 17 at device 0.0 on pci5
> atapci1: port 0x30e8-0x30ef,0x30dc-0x30df,0x30e0-0x30e7,0x30d8-0x30db,0x30b0-0x30bf mem 0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0
>
> hw.pci0.31.2.INTA.irq="17"
>
> to force atapci1 to the same irq as em1. Didn't work. It's still using 19.
> Any hints?

I only learnt about this myself relatively recently, so I'm afraid not. Have you checked whether it works if you use hw.pci5? I'm just guessing here, but it might be worth a shot.

Cheers,
Dominic

> Thanks,
> Patrick M. Hausen
> Leiter Netzwerke und Sicherheit
Re: ATA problems again ... general problem of ICH7 or ATA?
Hi, Dominic!

On Mon, Aug 21, 2006 at 02:40:17PM +0100, Dominic Marks wrote:
> hw.pci3.13.INTA.irq="17"
>
> The `13' value is the device number, you can find this in dmesg, same
> for pciN.

So I tried this:

em1: port 0x5000-0x501f mem 0xdc18-0xdc19,0xdc10-0xdc17 irq 17 at device 0.0 on pci5
atapci1: port 0x30e8-0x30ef,0x30dc-0x30df,0x30e0-0x30e7,0x30d8-0x30db,0x30b0-0x30bf mem 0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0

hw.pci0.31.2.INTA.irq="17"

to force atapci1 to the same irq as em1. Didn't work. It's still using 19.
Any hints?

Thanks,
Patrick M. Hausen
Leiter Netzwerke und Sicherheit
--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
Re: ATA problems again ... general problem of ICH7 or ATA?
Patrick M. Hausen wrote:
> Hello!
>
> On Mon, Aug 21, 2006 at 02:14:16PM +0100, Matt Dawson wrote:
>> FWIW, the problem takes *far* longer to rear its head when the SATA controller has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices end up sharing it. Now transfer a large file over the network and watch the ensuing hilarity) and it happens at least every couple of days. Now, with the slot shared with the SATA controller empty, I have six days uptime since the last event, which means I'm probably due one any time now.
>
> FWIW - here's the setup of my systems that have not shown the problem so far:
>
> Device   IRQ
> -------  ---
> em0      16
> em1      17
> uhci0    23
> uhci1    19
> uhci2    18
> uhci3    16
> ehci0    23
> fxp0     16
> atapci1  19   This is the SATA300 controller
>
> Is there a method to force the controller to share its IRQ with, say, em0 for testing?

You can use device.hints(5) to do this.

I have the following in mine to force a RAID card and Sound card to share IRQ 17. You need to modify it to suit your environment.

hw.pci3.13.INTA.irq="17"

The `13' value is the device number, you can find this in dmesg, same for pciN.

HTH,
Dominic

> Regards,
> Patrick M. Hausen
> Leiter Netzwerke und Sicherheit
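The hint name Dominic describes can be derived mechanically from a dmesg attach line. The following is a minimal sketch, not a confirmed recipe: the helper name `irq_hint` is hypothetical, and it assumes that the name takes only the number after "on pci" and the device number before the dot, as in Dominic's two-number example. The interrupt pin letter defaults to A purely for illustration. The thread does not confirm that this form works on every controller.

```python
import re

# Matches the tail of a dmesg attach line, e.g. "irq 19 at device 31.2 on pci0".
_ATTACH_RE = re.compile(r"irq (\d+) at device (\d+)\.(\d+) on pci(\d+)")


def irq_hint(dmesg_line, irq, pin="A"):
    """Build a device.hints line in the hw.pciN.DEV.INTx.irq form quoted above.

    Assumption: N is the number after "on pci" and DEV is the device number
    before the dot; the function number after the dot is left out, following
    the two-number example in the thread.
    """
    m = _ATTACH_RE.search(dmesg_line)
    if m is None:
        raise ValueError("no 'irq N at device D.F on pciB' clause found")
    _current_irq, device, _function, bus = m.groups()
    return 'hw.pci{}.{}.INT{}.irq="{}"'.format(bus, device, pin, irq)


line = "atapci1: port 0x30e8-0x30ef irq 19 at device 31.2 on pci0"
print(irq_hint(line, 17))
```

The resulting line would go into /boot/device.hints and take effect on the next boot; which pin letter applies to a given device has to be checked separately.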
Re: ATA problems again ... general problem of ICH7 or ATA?
Hello!

On Mon, Aug 21, 2006 at 02:14:16PM +0100, Matt Dawson wrote:
> FWIW, the problem takes *far* longer to rear its head when the SATA
> controller has a PCI INT and IRQ to itself. Put a NIC onto a shared slot
> (a very Bad Thing [TM] as the BIOS simply maps the INT to a single IRQ and
> both devices end up sharing it. Now transfer a large file over the network
> and watch the ensuing hilarity) and it happens at least every couple of
> days. Now, with the slot shared with the SATA controller empty, I have six
> days uptime since the last event, which means I'm probably due one any
> time now.

FWIW - here's the setup of my systems that have not shown the problem so far:

Device   IRQ
-------  ---
em0      16
em1      17
uhci0    23
uhci1    19
uhci2    18
uhci3    16
ehci0    23
fxp0     16
atapci1  19   This is the SATA300 controller

Is there a method to force the controller to share its IRQ with, say, em0 for testing?

Regards,
Patrick M. Hausen
Leiter Netzwerke und Sicherheit
--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
Re: ATA problems again ... general problem of ICH7 or ATA?
On Monday 21 August 2006 22:44, Matt Dawson wrote:
> > atacontrol detach ata3; atacontrol attach ata3 did.
>
> Yes, that is the method for a controlled remove and reattach, a la hotplug
> SATA. AIUI, though, if the drive goes AWOL on its own you need to reinit
> the channel before issuing an atacontrol attach foo. In theory... (man 8
> atacontrol) In practice, the drive disappears, never to be probed again. A
> warm reboot without power down makes it appear again, so the drive itself
> isn't confused.

If you have a "proper" hot plug SATA controller you don't need to reinit anything.

When I was testing a Promise 2300 the act of plugging the drive in caused a new disk to show up (which was nice :)

This did not happen on the VIA 8237 controller (which, by the way, has a really really crappy RAID function, avoid at all costs).

--
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
-- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C
Re: ATA problems again ... general problem of ICH7 or ATA?
On Monday 21 August 2006 13:00, [EMAIL PROTECTED] wrote:
> > I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64
> > based system running 6.1-RELEASE-p3 (i386). ad6 just detaches without
> > warning and it takes a reboot to bring it back. atacontrol reinit has no
> > effect. Tried the following to resolve the problems:
>
> I don't know what is supposed to be the canonical way to
> reattach a disconnected SATA drive, but while testing our
> new hardware and hot-pulling a drive while the system
> was running, atacontrol reinit didn't find the reinserted drive
> here, either.
>
> atacontrol detach ata3; atacontrol attach ata3 did.

Yes, that is the method for a controlled remove and reattach, a la hotplug SATA. AIUI, though, if the drive goes AWOL on its own you need to reinit the channel before issuing an atacontrol attach foo. In theory... (man 8 atacontrol) In practice, the drive disappears, never to be probed again. A warm reboot without power down makes it appear again, so the drive itself isn't confused.

FWIW, the problem takes *far* longer to rear its head when the SATA controller has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices end up sharing it. Now transfer a large file over the network and watch the ensuing hilarity) and it happens at least every couple of days. Now, with the slot shared with the SATA controller empty, I have six days uptime since the last event, which means I'm probably due one any time now.

At least gmirror rebuilds the array after a simple reboot, but I would expect the dd operation to throw a wobbly if it's a timing issue/fight for interrupt between the two drives/channels. It doesn't, which makes me wonder if I'm barking up the wrong tree, but I can't help noticing that SATA channels have one interrupt between them whereas PATA channels have one each and all of these reports are from SATA users...

I wonder what pciconf -lv shows on Miroslav's system? Is the SATA controller sharing an INT/IRQ with something else? Does moving that device to another slot alleviate the problem at all?

Please note that Miroslav and I are using totally different drives, chipsets and processors. He's using, IIRC, an Intel chip with an ICH7 southbridge and Samsung drives. I'm using an AMD Athlon 64 Newcastle (running the i386 port) on a ULi M1689 chipset with WD RE2 drives so, although I'd be more than happy to be the numpty that is wrong and to have ata(4) vindicated by someone else, I suspect it is ata(4) that is the problem. However, finger pointing isn't productive and is certainly not fair given that ata(4) has been progressing so well.

Anything else I can try to nail this irksome beast? Any suggestions for where I've been an idiot (easy, tiger!) and missed something obvious?

BTW, this is a production server (DLT backed up nightly, so the data is safe) so I can't just pull it to bits. I do have an identical (CPU/mobo) box in the workshop as a workstation, however, which I could buy/borrow another drive for and set up gmirror to try things out.

--
Matt Dawson.
[EMAIL PROTECTED]
MTD15-RIPE OpenNIC M_D9 MD51-6BONE
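The two recovery sequences described in this message (detach then attach for a controlled swap, reinit then attach when a drive vanishes on its own) can be written down as a small helper. This is only a sketch: the function name `reattach_plan` is hypothetical, it merely builds the atacontrol command lines named in the thread, and, as reported above, the reinit path did not actually bring the drive back in practice.

```python
def reattach_plan(channel, controlled=True):
    """Return the atacontrol invocations, in order, as argv lists.

    controlled=True  -> planned hot-swap: detach the channel, then attach it.
    controlled=False -> drive dropped out by itself: reinit, then attach.
    """
    steps = []
    if controlled:
        steps.append(["atacontrol", "detach", channel])
    else:
        steps.append(["atacontrol", "reinit", channel])
    steps.append(["atacontrol", "attach", channel])
    return steps


# Print the sequence for the channel used in the thread.
for argv in reattach_plan("ata3"):
    print(" ".join(argv))
```

On a FreeBSD 6.x machine each argv list could be handed to `subprocess.run(argv, check=True)` as root; the sketch deliberately stops at building the commands so that nothing touches a live channel by accident.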
Re: ATA problems again ... general problem of ICH7 or ATA?
Hi!

On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based
> system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and
> it takes a reboot to bring it back. atacontrol reinit has no effect. Tried
> the following to resolve the problems:

I don't know what is supposed to be the canonical way to reattach a disconnected SATA drive, but while testing our new hardware and hot-pulling a drive while the system was running, atacontrol reinit didn't find the reinserted drive here, either.

atacontrol detach ata3; atacontrol attach ata3 did.

HTH,
Patrick
--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
Re: ATA problems again ... general problem of ICH7 or ATA?
On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
> On 20.08.2006 at 18:20, Greg Byshenk wrote:
> >What is different is that this was with a 3Ware RAID controller --
> >which made removing/reconfiguring/rebuilding much easier -- but I was
> >seeing the exact same errors.
> No, your errors are not related. In my experience (and the
> experience of others) the controller forgetting or losing drives is
> a "feature" of 3ware.
> We had similar problems with 3ware-7500-8 ATA controllers and I have had
> reports of the same errors with the 3ware-9000 series. Our in-house
> 3ware-9500S are not showing this kind of error.
> These errors are not driver or OS dependent, as they appear on
> FreeBSD as well as on different Linux distros.
> Since not all controllers suffer from these errors, it may
> depend on the firmware or board/chip revisions.

I hesitate to make too strong a statement on this matter, as I have not done any deep investigation, however...

The explanation above does not appear consistent with my experience. I am now using (and have used over the past several years) a number of different 3Ware controllers (7000, 8000, and 9000 series) and have not previously seen this problem. Of course I have had drives fail -- and in one case one port of one controller simply stopped working -- but never this particular problem.

Further, the very same controller that demonstrated problems (in the numerically identical server, performing the exact same jobs), had not demonstrated this problem (over a period of more than six months) until I installed the June 6.1 STABLE, after which the problem appeared consistently, until installing the July 6.1 STABLE, at which point the problem disappeared, and has not occurred since (despite my trying very hard to make it do so).

It may well be that there is some bug in the 3Ware controllers, but my experience suggests that there is/was something else going on. At the very least, it suggests that there was something about the June 6.1 STABLE (but not the earlier or later versions) that was triggering a 3Ware bug -- as my problems occurred only when running the June 6.1 STABLE, and that was the _only_ difference between the cases of having problems and those of not having problems.

--
greg byshenk - [EMAIL PROTECTED] - Leiden, NL
Re: New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)
Hi, all!

On Mon, Aug 21, 2006 at 04:22:06PM +0900, Pyun YongHyeon wrote:
> Several users reported em(4) watchdog errors but I couldn't reproduce
> it on my system. A blind patch posted to net ML and I'd like to hear
> success/failure report.
>
> See http://lists.freebsd.org/pipermail/freebsd-net/2006-August/011352.html
> Make sure to enable "debug.mpsafenet=1" during testing.

Testing ... stay tuned.

Regards,
Patrick
--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
Re: New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)
Hi!

On 21.08.2006 at 09:10, Patrick M. Hausen wrote:
> Hi, all!
>
> On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
>> This errors are not driver or OS dependent such as they appear on FreeBSD as well on different Linux distros.
>> Since not all controllers suffering of these errors it is maybe depending on the firmware or board/chip revisions.
>
> We have two brand new TYAN B5161G20SH4 systems that feature ICH7 controllers and SATA-hotplug-bays. One system is equipped with two Seagate ST3160811AS drives, the other one with WD1600YS-01SHB0 drives.
> Both are configured with gmirror for slice 1.
>
> No problems at all after several days of "make -j4 buildworld".
>
> OTOH I can confirm that I got random "watchdog timeouts" with the em driver. debug.mpsafenet=0 fixed the problem for now.

Sorry, my post was way too unspecific. My response was meant only for Greg Byshenk and his 3ware-related problem. Those controllers tend to lose drives or mark drives as broken which are not broken at all. So his problems with 3ware are not related to this thread about ATA/ICH bugs.

--
Best regards,
Konstantin Saurbier
--
Konstantin Saurbier Tel.: 0521 106 3861
Computerlabor Mathematik U5-138
Universitaet Bielefeld Universitaetsstr. 25
33501 Bielefeld email: [EMAIL PROTECTED]
Re: New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)
On Mon, Aug 21, 2006 at 09:10:53AM +0200, Patrick M. Hausen wrote:
> Hi, all!
>
> On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
>
> > This errors are not driver or OS dependent such as they appear on
> > FreeBSD as well on different Linux distros.
> > Since not all controllers suffering of these errors it is maybe
> > depending on the firmware or board/chip revisions.
>
> We have two brand new TYAN B5161G20SH4 systems that feature
> ICH7 controllers and SATA-hotplug-bays. One system is equipped
> with two Seagate ST3160811AS drives, the other one with
> WD1600YS-01SHB0 drives.
> Both are configured with gmirror for slice 1.
>
> No problems at all after several days of "make -j4 buildworld".
>
> OTOH I can confirm that I got random "watchdog timeouts"
> with the em driver. debug.mpsafenet=0 fixed the problem
> for now.

Several users reported em(4) watchdog errors but I couldn't reproduce them on my system. A blind patch was posted to the net ML and I'd like to hear success/failure reports.

See http://lists.freebsd.org/pipermail/freebsd-net/2006-August/011352.html
Make sure to enable "debug.mpsafenet=1" during testing.

--
Regards,
Pyun YongHyeon
New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)
Hi, all!

On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
> This errors are not driver or OS dependent such as they appear on
> FreeBSD as well on different Linux distros.
> Since not all controllers suffering of these errors it is maybe
> depending on the firmware or board/chip revisions.

We have two brand new TYAN B5161G20SH4 systems that feature ICH7 controllers and SATA-hotplug-bays. One system is equipped with two Seagate ST3160811AS drives, the other one with WD1600YS-01SHB0 drives.
Both are configured with gmirror for slice 1.

No problems at all after several days of "make -j4 buildworld".

OTOH I can confirm that I got random "watchdog timeouts" with the em driver. debug.mpsafenet=0 fixed the problem for now.

HTH,
Patrick
--
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25 Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe http://punkt.de
Re: ATA problems again ... general problem of ICH7 or ATA?
On 20.08.2006 at 18:20, Greg Byshenk wrote:
> On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
>> On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:
>>> Do you mean different type of cables, or just another piece? I can't change cables by myself, servers are dedicated from provider, but as I can saw, they picked whole new machine from their HW storage and put new Samsung disk drives in. So these two last machines are brand new with new cables. (Probably with a same type of cables - all machines are ASUS RS120)
>>
>> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and it takes a reboot to bring it back. atacontrol reinit has no effect. Tried the following to resolve the problems:
>>
>> - Changed cables (both ad4 and ad6)
>> - Changed SATA power to legacy
>> - Moved the NIC and anything else from the shared PCI INT (thought I'd cracked it at this point as it was stable for a month, then it lost ad6 on a nightly dump)
>> - Remade my gmirror array as an ar. Put it straight back to gmirror again when I found out what a pain it is to rebuild after ad6 disappears.
>
> I am not sure if it is related, but... I experienced a similar sort of problem, although the details in my case are quite different. What was similar was that I would "lose" two ATA drives from an array, inexplicably. Reconfiguring the same drives and rebuilding would cause them to work perfectly again -- for some number of days, after which the same failure would occur.
>
> What is different is that this was with a 3Ware RAID controller -- which made removing/reconfiguring/rebuilding much easier -- but I was seeing the exact same errors.

No, your errors are not related. In my experience (and the experience of others) the controller forgetting or losing drives is a "feature" of 3ware.
We had similar problems with 3ware-7500-8 ATA controllers and I have had reports of the same errors with the 3ware-9000 series. Our in-house 3ware-9500S are not showing this kind of error.
These errors are not driver or OS dependent, as they appear on FreeBSD as well as on different Linux distros.
Since not all controllers suffer from these errors, it may depend on the firmware or board/chip revisions.

--
Best regards,
Konstantin Saurbier
--
Konstantin Saurbier Tel.: 0521 106 3861
Computerlabor Mathematik U5-138
Universitaet Bielefeld Universitaetsstr. 25
33501 Bielefeld email: [EMAIL PROTECTED]
Re: ATA problems again ... general problem of ICH7 or ATA?
Miroslav Lachman wrote:
> I upgraded to RELENG_6, changed all HW (whole servers and changed Seagate HDDs to Samsung so every piece of HW is different from time of my first post), but after one week I got the same error and system reboot today:
>
> Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
> Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
> Aug 19 15:15:47 track kernel: subdisk6: detached
> Aug 19 15:15:47 track kernel: ad6: detached
> Aug 19 15:15:47 track kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
> Aug 19 15:15:47 track kernel: g_vfs_done():mirror/gm0s2d[READ(offset=1169260544, length=131072)]error = 6
> Aug 19 15:22:34 track syslogd: kernel boot file is /boot/kernel/kernel
>
> From my point of view - this is not related to 1 piece of HW, but general problem of ICH7 chipset or (s)ATA driver in FreeBSD 6.x. As other poster has different chipsets (ICH6 and nVidia), it seems more FreeBSD ATA driver related. (7 different machines was tried)

Just a "me too": I have the same problems with ICH7 and disks mysteriously disconnecting.

Aug 14 16:54:47 mx1 kernel: ad4: FAILURE - device detached
Aug 14 16:54:47 mx1 kernel: ad4: detaGEOM_MIRROR:ched Device gm: provider ad4 disconnected.

I think there definitely is a problem with the chipset/driver.
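Since every report in this thread shows the same "FAILURE - device detached" kernel message, a small filter over syslog lines makes it easy to collect the events across machines. This is an illustrative sketch only: the function name `detached_disks` is hypothetical, and the pattern assumes the syslog layout of the lines quoted above (timestamp, host, "kernel:", disk name).

```python
import re

# Timestamp, host, then the kernel message reported in the thread.
_DETACH_RE = re.compile(
    r"^(\w{3}\s+\d+ \d\d:\d\d:\d\d) (\S+) kernel: (ad\d+): FAILURE - device detached"
)


def detached_disks(lines):
    """Return (timestamp, host, disk) tuples for every detach event found."""
    events = []
    for line in lines:
        m = _DETACH_RE.match(line)
        if m:
            events.append(m.groups())
    return events


sample = [
    "Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001",
    "Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached",
    "Aug 19 15:15:47 track kernel: ad6: detached",
    "Aug 14 16:54:47 mx1 kernel: ad4: FAILURE - device detached",
]
for when, host, disk in detached_disks(sample):
    print(host, disk, when)
```

Feeding it the lines of /var/log/messages would give a per-host list of drops that can be lined up against uptime and load.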
Re: ATA problems again ... general problem of ICH7 or ATA?
Greg Byshenk wrote:
> On Sun, Aug 20, 2006 at 07:51:29PM +0200, Miroslav Lachman wrote:
>> Greg Byshenk wrote: [...]
>>> This happened four times (with the same errors that have been discussed here), running 6.1 STABLE as of June 22. Before attempting to RMA the drives, I tried an updated kernel, 6.1 STABLE as of July 19. Strangely enough, the problems disappeared.
>>> So, while I have not checked everything that has changed, it _might_ be worth trying 6.1 STABLE...
>>
>> I have problems with 6.1-RELEASE same as with 6.1-STABLE from August 2. I can try newer STABLE, but as I see on cvsweb, there are not much changes in ATA driver sources, only new chipsets added.
>
> It is only an idea, based on something that worked for me. And, as I said, my situation is not exactly the same as the others.
>
>> It is strange to me, that I can see significant changes of read/write speed. (I am running nonstop tests with writing disk full of files, delete them, and start again + generating graphs) Speed vary from 2.5MB/s to 11MB/s by jumps. Not continuous from the lowest to the highest. Writing is for example 3MB/s for 20 hours, then jump to 10MB/s and after some time (6 - 20 hours) jump down to about 3MB/s.
>> After some days of testing, disk disappear, system reboots itself, resynchronize gmirror and work for next few days till the next disk lose.
>> Also earlier synchronization was done after 1:30 hour (at about 30MB/s), now synchronization run at lower speeds - from 2.5MB/s to 15MB/s, so the whole synchronization is done after more then 5 hours (the longest was 20 hours to synchronize 250GB HDDs)
>> I don't know what more can I test, what more could be done to solve these problems. :(
>
> You are using gmirror, which I am not, so the situations are not analogous, since my situation was with h/w RAID. And I have no direct experience with gmirror (I use gvinum on a couple of secondary systems, but those are SCSI based).
>
> Does the output of 'systat -vm' tell you anything of interest? That is, are the disks running at or close to 100%, are the CPUs fully loaded, or anything else...?

There is nothing interesting in systat / gstat / top or anything else. The system is almost idle, just running the test script for disk writing.

The speed problem does not depend on gmirror. I deactivated gmirror on the second machine and ran the test on normally mounted filesystems, with the same low speeds ;(

This is systat from the gmirrored system running the test:

4 usersLoad 0.01 0.02 0.00 Aug 20 21:06 Mem:KBREALVIRTUAL VN PAGER SWAP PAGER Tot Share TotShareFree in out in out Act 1241449580 89761627168 43016 count All 1016728 75888364464876 210508 pages Interrupts Proc:r p d s wCsw Trp Sys Int Sof Fltcow4144 total 7 75 9794 155 288 27 161484 wire 1: atkb 191292 act 14: ata 0.4%Sys 0.0%Intr 0.0%User 0.0%Nice 99.6%Idl 624272 inact11 16: bge |||||||||| 41360 cache 133 19: ata 1656 free 2000 cpu0: time daefr 2000 cpu1: time Namei Name-cacheDir-cache prcfr Calls hits% hits% react 66 100 pdwake zfod 1354 pdpgs Disks ad4 ad6 ozfod intrn KB/t125 126 %slo-z 113888 buf tps 34331407 tfree17 dirtybuf MB/s 4.13 4.10 69977 desiredvnodes % busy 5448 20661 numvnodes 17286 freevnodes
Re: ATA problems again ... general problem of ICH7 or ATA?
Hello!

On Sat, 19 Aug 2006, Miroslav Lachman wrote:
> Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
> Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
> Aug 19 15:15:47 track kernel: subdisk6: detached
> Aug 19 15:15:47 track kernel: ad6: detached

I think it's a shame to have such "error recovery" in one of the basic drivers. The ATA driver gives absolutely no clue about the reason for the failure and just disconnects the device. I'm curious why the driver behaves in this way. If the SATA code is just raw, it definitely must be corrected to implement proper error recovery. If the SATA specification is written so poorly that proper error recovery is simply impossible (I really doubt that it is), then SATA hardware should simply be avoided in mission-critical applications.

Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE
Re: ATA problems again ... general problem of ICH7 or ATA?
On Sun, Aug 20, 2006 at 07:51:29PM +0200, Miroslav Lachman wrote:
> Greg Byshenk wrote: [...]
> >This happened four times (with the same errors that have been discussed
> >here), running 6.1 STABLE as of June 22. Before attempting to RMA the
> >drives, I tried an updated kernel, 6.1 STABLE as of July 19. Strangely
> >enough, the problems disappeared.
> >So, while I have not checked everything that has changed, it _might_ be
> >worth trying 6.1 STABLE...

> I have the same problems with 6.1-RELEASE as with 6.1-STABLE from August 2.
> I can try a newer STABLE, but as far as I can see on cvsweb, there are not
> many changes in the ATA driver sources, only new chipsets added.

It is only an idea, based on something that worked for me. And, as I said, my situation is not exactly the same as the others.

> It is strange to me that I see significant changes in read/write
> speed. (I am running nonstop tests that fill the disk with files,
> delete them, and start again, plus generating graphs.) Speed varies from
> 2.5MB/s to 11MB/s in jumps, not continuously from the lowest to the
> highest. Writing runs at, for example, 3MB/s for 20 hours, then jumps to
> 10MB/s, and after some time (6 - 20 hours) drops back to about 3MB/s.
> After some days of testing, a disk disappears, the system reboots itself,
> resynchronizes gmirror, and works for the next few days until the next
> disk loss.
> Also, synchronization used to finish in 1:30 hours (at about 30MB/s);
> now it runs at lower speeds, from 2.5MB/s to 15MB/s, so the whole
> synchronization takes more than 5 hours (the longest was 20 hours to
> synchronize 250GB HDDs).
> I don't know what more I can test, or what more could be done to solve
> these problems. :(

You are using gmirror, which I am not, so the situations are not analogous, since my situation was with h/w RAID. And I have no direct experience with gmirror (I use gvinum on a couple of secondary systems, but those are SCSI based).

Does the output of 'systat -vm' tell you anything of interest? That is, are the disks running at or close to 100%, are the CPUs fully loaded, or anything else...?

--
greg byshenk - [EMAIL PROTECTED] - Leiden, NL
Re: ATA problems again ... general problem of ICH7 or ATA?
Greg Byshenk wrote: [...]

> I am not sure if it is related, but... I experienced a similar sort of
> problem, although the details in my case are quite different. What was
> similar was that I would "lose" two ATA drives from an array,
> inexplicably. Reconfiguring the same drives and rebuilding would cause
> them to work perfectly again -- for some number of days, after which the
> same failure would occur. What is different is that this was with a
> 3Ware RAID controller -- which made removing/reconfiguring/rebuilding
> much easier -- but I was seeing the exact same errors.
>
> This happened four times (with the same errors that have been discussed
> here), running 6.1 STABLE as of June 22. Before attempting to RMA the
> drives, I tried an updated kernel, 6.1 STABLE as of July 19. Strangely
> enough, the problems disappeared.
>
> So, while I have not checked everything that has changed, it _might_ be
> worth trying 6.1 STABLE...

I have the same problems with 6.1-RELEASE as with 6.1-STABLE from August 2. I can try a newer STABLE, but as far as I can see on cvsweb, there are not many changes in the ATA driver sources, only new chipsets added.

It is strange to me that I see significant changes in read/write speed. (I am running nonstop tests that fill the disk with files, delete them, and start again, plus generating graphs.) Speed varies from 2.5MB/s to 11MB/s in jumps, not continuously from the lowest to the highest. Writing runs at, for example, 3MB/s for 20 hours, then jumps to 10MB/s, and after some time (6 - 20 hours) drops back to about 3MB/s.

After some days of testing, a disk disappears, the system reboots itself, resynchronizes gmirror, and works for the next few days until the next disk loss.

Also, synchronization used to finish in 1:30 hours (at about 30MB/s); now it runs at lower speeds, from 2.5MB/s to 15MB/s, so the whole synchronization takes more than 5 hours (the longest was 20 hours to synchronize 250GB HDDs).

I don't know what more I can test, or what more could be done to solve these problems. :(

Any help will be appreciated

Miroslav Lachman
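The synchronization times reported above can be sanity-checked with simple size/rate arithmetic. This is only a rough sketch: it assumes the whole 250GB is copied at a steady rate and uses decimal units (1 GB = 1000 MB), neither of which is confirmed in the thread. The function name resync_hours is made up for illustration.

```shell
#!/bin/sh
# resync_hours SIZE_GB RATE_MBPS (hypothetical helper)
# Estimate how many hours it takes to copy SIZE_GB at a steady RATE_MBPS.
# Assumption: decimal units, 1 GB = 1000 MB.
resync_hours() {
    awk -v gb="$1" -v mbps="$2" \
        'BEGIN { printf "%.1f\n", gb * 1000 / mbps / 3600 }'
}

# Rates taken from the message above: 30, 15 and 2.5 MB/s.
for rate in 30 15 2.5; do
    echo "250 GB at $rate MB/s: $(resync_hours 250 "$rate") h"
done
```

Under these assumptions the slowest reported rate, 2.5 MB/s, would need more than a day for a full pass, while 15 MB/s needs a little under five hours, which is in the same range as the "more than 5 hours" and "20 hours" figures above.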
Re: ATA problems again ... general problem of ICH7 or ATA?
On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
> On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:
> > Do you mean a different type of cable, or just another piece? I can't
> > change the cables myself, because the servers are dedicated machines
> > from a provider, but as far as I could see, they picked a whole new
> > machine from their HW storage and put new Samsung disk drives in. So
> > these last two machines are brand new, with new cables. (Probably the
> > same type of cable - all machines are ASUS RS120.)

> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based
> system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and
> it takes a reboot to bring it back. atacontrol reinit has no effect. Tried
> the following to resolve the problems:

> - Changed cables (both ad4 and ad6)
> - Changed SATA power to legacy
> - Moved the NIC and anything else from the shared PCI INT (thought I'd
>   cracked it at this point as it was stable for a month, then it lost ad6
>   on a nightly dump)
> - Remade my gmirror array as an ar. Put it straight back to gmirror again
>   when I found out what a pain it is to rebuild after ad6 disappears.

I am not sure if it is related, but... I experienced a similar sort of problem, although the details in my case are quite different. What was similar was that I would "lose" two ATA drives from an array, inexplicably. Reconfiguring the same drives and rebuilding would cause them to work perfectly again -- for some number of days, after which the same failure would occur. What is different is that this was with a 3Ware RAID controller -- which made removing/reconfiguring/rebuilding much easier -- but I was seeing the exact same errors.

This happened four times (with the same errors that have been discussed here), running 6.1 STABLE as of June 22. Before attempting to RMA the drives, I tried an updated kernel, 6.1 STABLE as of July 19. Strangely enough, the problems disappeared.

So, while I have not checked everything that has changed, it _might_ be worth trying 6.1 STABLE...

--
greg byshenk - [EMAIL PROTECTED] - Leiden, NL
Re: ATA problems again ... general problem of ICH7 or ATA?
On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:
> Do you mean a different type of cable, or just another piece? I can't
> change the cables myself, because the servers are dedicated machines from
> a provider, but as far as I could see, they picked a whole new machine
> from their HW storage and put new Samsung disk drives in. So these last
> two machines are brand new, with new cables. (Probably the same type of
> cable - all machines are ASUS RS120.)

I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and it takes a reboot to bring it back. atacontrol reinit has no effect. Tried the following to resolve the problems:

- Changed cables (both ad4 and ad6)
- Changed SATA power to legacy
- Moved the NIC and anything else from the shared PCI INT (thought I'd cracked it at this point as it was stable for a month, then it lost ad6 on a nightly dump)
- Remade my gmirror array as an ar. Put it straight back to gmirror again when I found out what a pain it is to rebuild after ad6 disappears.

Until I read this thread, I was convinced there was something flaky in my hardware/BIOS or WD's TLER. Now I'm not so sure.
Hardware:

$ pciconf -lv
[EMAIL PROTECTED]:0:0: class=0x06 card=0x50001458 chip=0x168910b9 rev=0x00 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:1:0: class=0x060400 card=0x chip=0x524610b9 rev=0x00 hdr=0x01
    vendor = 'Acer Labs Incorporated (ALi)'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:2:0: class=0x060401 card=0x chip=0x524910b9 rev=0x00 hdr=0x01
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'M5249 HyperTransport to PCI Bridge'
    class = bridge
    subclass = PCI-PCI
[EMAIL PROTECTED]:3:0: class=0x060100 card=0x50011458 chip=0x156310b9 rev=0x70 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'ALI M1563 South Bridge with Hypertransport Support'
    class = bridge
    subclass = PCI-ISA
[EMAIL PROTECTED]:3:1: class=0x068000 card=0x50031458 chip=0x710110b9 rev=0x00 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'ALI M7101 Power Management Controller'
    class = bridge
[EMAIL PROTECTED]:14:0: class=0x0101fa card=0x50021458 chip=0x522910b9 rev=0xc7 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'M1543 Southbridge EIDE Controller'
    class = mass storage
    subclass = ATA
[EMAIL PROTECTED]:14:1: class=0x01018f card=0xb0031458 chip=0x528910b9 rev=0x10 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    class = mass storage
    subclass = ATA
[EMAIL PROTECTED]:15:0: class=0x0c0310 card=0x50041458 chip=0x523710b9 rev=0x03 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'M5237 OpenHCI 1.1 USB Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:15:1: class=0x0c0310 card=0x50041458 chip=0x523710b9 rev=0x03 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'M5237 OpenHCI 1.1 USB Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:15:2: class=0x0c0310 card=0x50041458 chip=0x523710b9 rev=0x03 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'M5237 OpenHCI 1.1 USB Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:15:3: class=0x0c0320 card=0x50041458 chip=0x523910b9 rev=0x01 hdr=0x00
    vendor = 'Acer Labs Incorporated (ALi)'
    device = 'USB 2.0 Enhanced Host Controller'
    class = serial bus
    subclass = USB
[EMAIL PROTECTED]:24:0: class=0x06 card=0x chip=0x11001022 rev=0x00 hdr=0x00
    vendor = 'Advanced Micro Devices (AMD)'
    device = 'Athlon 64 / Opteron HyperTransport Technology Configuration'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:24:1: class=0x06 card=0x chip=0x11011022 rev=0x00 hdr=0x00
    vendor = 'Advanced Micro Devices (AMD)'
    device = 'Athlon 64 / Opteron Address Map'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:24:2: class=0x06 card=0x chip=0x11021022 rev=0x00 hdr=0x00
    vendor = 'Advanced Micro Devices (AMD)'
    device = 'Athlon 64 / Opteron DRAM Controller'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:24:3: class=0x06 card=0x chip=0x11031022 rev=0x00 hdr=0x00
    vendor = 'Advanced Micro Devices (AMD)'
    device = 'Athlon 64 / Opteron Miscellaneous Control'
    class = bridge
    subclass = HOST-PCI
[EMAIL PROTECTED]:0:0: class=0x03 card=0x02071787 chip=0x51571002 rev=0x00 hdr=0x00
    vendor = 'ATI Technologies Inc'
    device = 'Radeon 7500 Series (RV200)'
    class = display
    subclass = VGA
[EMAIL PROTECTED]:5:0: class=0x01 card=0x chip=0x81789004 rev=0x00 hdr=0x00
    vendor = 'Adaptec Inc'
    device = 'AH
Re: ATA problems again ... general problem of ICH7 or ATA?
Igor Robul wrote:

> On Sat, Aug 19, 2006 at 04:39:55PM +0200, Miroslav Lachman wrote:
> > I upgraded to RELENG_6, changed all HW (whole servers and changed
> > Seagate HDDs to Samsung so every piece of HW is different from the time
> > of my first post), but after one week I got the same error and system
>
> Just a try - have you changed cables too?

Do you mean a different type of cable, or just another piece? I can't change the cables myself, because the servers are dedicated machines from a provider, but as far as I could see, they picked a whole new machine from their HW storage and put new Samsung disk drives in. So these last two machines are brand new, with new cables. (Probably the same type of cable - all machines are ASUS RS120.)

Miroslav Lachman
Re: ATA problems again ... general problem of ICH7 or ATA?
On Sat, Aug 19, 2006 at 04:39:55PM +0200, Miroslav Lachman wrote:
> I upgraded to RELENG_6, changed all HW (whole servers and changed
> Seagate HDDs to Samsung so every piece of HW is different from the time
> of my first post), but after one week I got the same error and system

Just a try - have you changed cables too?
Re: ATA problems again ... general problem of ICH7 or ATA?
Johan Ström wrote: [...]

> Usually when the box has been rebooted before the failed component has
> been rebuilt automatically.. Solved with:
>
> $ gmirror forget
> $ gmirror insert gm0s1 ad4s1
>
> And now it's rebuilding ad4 again...
>
> Any new hints? Should I try RELENG_6 instead?

I upgraded to RELENG_6 and changed all HW (whole servers, and Seagate HDDs replaced with Samsung, so every piece of HW is different from the time of my first post), but after one week I got the same error and a system reboot today:

Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
Aug 19 15:15:47 track kernel: subdisk6: detached
Aug 19 15:15:47 track kernel: ad6: detached
Aug 19 15:15:47 track kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
Aug 19 15:15:47 track kernel: g_vfs_done():mirror/gm0s2d[READ(offset=1169260544, length=131072)]error = 6
Aug 19 15:22:34 track syslogd: kernel boot file is /boot/kernel/kernel

From my point of view, this is not related to one piece of HW, but is a general problem of the ICH7 chipset or the (s)ATA driver in FreeBSD 6.x. As other posters have different chipsets (ICH6 and nVidia), it seems more related to the FreeBSD ATA driver. (7 different machines were tried.)

Now, after the reboot, writing to and reading from ad6 is really slow (no other processes utilizing the disks, no fsck running, etc.):

[EMAIL PROTECTED] ~/# dd if=/dev/zero of=/dev/ad6 bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 43.673244 secs (2400957 bytes/sec)
[EMAIL PROTECTED] ~/# dd if=/dev/ad6 of=/dev/null bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 10.979482 secs (9550323 bytes/sec)

Is there anyone who can help with finding the source of the problem? It is really annoying that one cannot use SATA / ICH7 under high load in FreeBSD 6.1 (tested on RELEASE and STABLE). (I am not experienced enough with HW / FreeBSD to locate the problem by myself.)

Miroslav Lachman
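The bytes/sec figures that dd prints above are easier to compare with the MB/s numbers elsewhere in this thread once converted. A minimal sketch of that conversion, using only the byte counts and durations from the two dd runs quoted above; the function name mib_per_sec is made up for illustration, and the result is in MiB/s (1 MiB = 1048576 bytes).

```shell
#!/bin/sh
# mib_per_sec BYTES SECONDS (hypothetical helper)
# Convert a dd transfer (byte count and elapsed seconds) to MiB/s.
mib_per_sec() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.2f\n", b / s / 1048576 }'
}

# Values copied from the two dd runs in the message above.
echo "write: $(mib_per_sec 104857600 43.673244) MiB/s"
echo "read:  $(mib_per_sec 104857600 10.979482) MiB/s"
```

This works out to about 2.29 MiB/s for the write and 9.11 MiB/s for the read, which matches the low end of the speeds reported earlier in the thread.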