Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Daniel O'Connor
On Monday 21 August 2006 23:08, Dominic Marks wrote:
> You can use device.hints(5) to do this.
>
> I have the following in mine to force a RAID card and Sound card to
> share IRQ 17.
> You need to modify it to suit your environment.
>
> hw.pci3.13.INTA.irq="17"
>
> The `13' value is the device number, you can find this in dmesg, same
> for pciN.

Any chance this could be documented somewhere?

(Or tell me where if it is.. :) I checked pci(4) and device.hints(5))

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Miroslav Lachman

Matt Dawson wrote:

> On Monday 21 August 2006 13:00, [EMAIL PROTECTED] wrote:
>
> > > I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64
> > > based system running 6.1-RELEASE-p3 (i386). ad6 just detaches without
> > > warning and it takes a reboot to bring it back. atacontrol reinit has no
> > > effect. Tried the following to resolve the problems:
> >
> > I don't know what is supposed to be the canonical way to
> > reattach a disconnected SATA drive, but while testing our
> > new hardware and hot-pulling a drive while the system
> > was running, atacontrol reinit didn't find the reinserted drive
> > here, either.
> >
> > atacontrol detach ata3; atacontrol attach ata3 did.
>
> Yes, that is the method for a controlled remove and reattach, a la hotplug
> SATA. AIUI, though, if the drive goes AWOL on its own you need to reinit the
> channel before issuing an atacontrol attach foo. In theory... (man 8
> atacontrol) In practice, the drive disappears, never to be probed again. A
> warm reboot without power down makes it appear again, so the drive itself
> isn't confused.

It is the same in my case.

> FWIW, the problem takes *far* longer to rear its head when the SATA controller
> has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad
> Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices
> end up sharing it. Now transfer a large file over the network and watch the
> ensuing hilarity) and it happens at least every couple of days. Now, with the
> slot shared with the SATA controller empty, I have six days uptime since the
> last event, which means I'm probably due one any time now.

I thought so too, but it did not solve my problem. I had UHCI sharing the
same IRQ with SATA (both on irq 19). Instead of playing with device.hints(5),
I disabled all unused peripherals in the BIOS (USB ports, LPT port, FDD...).
After a few days, the system reported the next disk loss.

> At least gmirror rebuilds the array after a simple reboot, but I would expect
> the dd operation to throw a wobbly if it's a timing issue/fight for interrupt
> between the two drives/channels. It doesn't, which makes me wonder if I'm
> barking up the wrong tree, but I can't help noticing that SATA channels have
> one interrupt between them whereas PATA channels have one each and all of
> these reports are from SATA users...

Maybe you are right; I have not seen any report from a one-disk machine. All
the problems come from machines with 2 or more SATA disks.

> I wonder what pciconf -lv shows on Miroslav's system? Is the SATA controller
> sharing an INT/IRQ with something else? Does moving that device to another
> slot alleviate the problem at all?

SATA is no longer sharing IRQs, but the problem persists.

system dmesg after verbose boot
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_dmesg_verbose_2006-08-21.txt
pciconf -lv
http://www.quip.cz/1/freebsd/asus_rs120-e3/track_pciconf_2006-08-21.txt

The problem appeared only under heavy disk load (e.g. a ports tree
copy). I have a 3rd system with minimal disk load that has been running for
10 days without problems (FreeBSD 6.0, in production for those 10 days; the
machine is a "quick replacement" for a failed server, with the system mirrored
from the old disks to the new ones by dump & restore).

> Please note that Miroslav and I are using totally different drives, chipsets
> and processors. He's using, IIRC, an Intel chip with an ICH7 southbridge and
> Samsung drives. I'm using an AMD Athlon 64 Newcastle (running the i386 port)
> on a ULi M1689 chipset with WD RE2 drives so, although I'd be more than happy
> to be the numpty that is wrong and to have ata(4) vindicated by someone else,
> I suspect it is ata(4) that is the problem. However, finger pointing isn't
> productive and is certainly not fair given that ata(4) has been progressing
> so well. Anything else I can try to nail this irksome beast? Any suggestions
> for where I've been an idiot (easy, tiger!) and missed something obvious?
>
> BTW, this is a production server (DLT backed up nightly, so the data is safe)
> so I can't just pull it to bits. I do have an identical (CPU/mobo) box in the
> workshop as a workstation, however, which I could buy/borrow another drive
> for and set up gmirror to try things out.

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Dominic Marks

Patrick M. Hausen wrote:

> Hi, Dominic!
>
> On Mon, Aug 21, 2006 at 02:40:17PM +0100, Dominic Marks wrote:
>
> > hw.pci3.13.INTA.irq="17"
> >
> > The `13' value is the device number, you can find this in dmesg, same
> > for pciN.
>
> So I tried this:
>
> em1:  port 0x5000-0x501f
> mem 0xdc18-0xdc19,0xdc10-0xdc17 irq 17 at device 0.0 on pci5
> atapci1:  port
> 0x30e8-0x30ef,0x30dc-0x30df,0x30e0-0x30e7,0x30d8-0x30db,0x30b0-0x30bf mem
> 0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0
>
> hw.pci0.31.2.INTA.irq="17"
>
> to force atapci1 to the same irq as em1. Didn't work. It's
> still using 19. Any hints?

I only learnt about this relatively recently myself, so I'm afraid not.
Have you checked whether it works if you use hw.pci5? I'm just
guessing here, but it might be worth a shot.

Cheers,
Dominic

> Thanks,
>
> Patrick M. Hausen
> Leiter Netzwerke und Sicherheit




Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Patrick M. Hausen
Hi, Dominic!

On Mon, Aug 21, 2006 at 02:40:17PM +0100, Dominic Marks wrote:

> hw.pci3.13.INTA.irq="17"
> 
> The `13' value is the device number, you can find this in dmesg, same 
> for pciN.

So I tried this:

em1:  port 0x5000-0x501f 
mem 0xdc18-0xdc19,0xdc10-0xdc17 irq 17 at device 0.0 on pci5
atapci1:  port 
0x30e8-0x30ef,0x30dc-0x30df,0x30e0-0x30e7,0x30d8-0x30db,0x30b0-0x30bf mem 
0xdc500400-0xdc5007ff irq 19 at device 31.2 on pci0

hw.pci0.31.2.INTA.irq="17"

to force atapci1 to the same irq as em1. Didn't work. It's
still using 19. Any hints?

Thanks,

Patrick M. Hausen
Leiter Netzwerke und Sicherheit
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25    Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Dominic Marks

Patrick M. Hausen wrote:

> Hello!
>
> On Mon, Aug 21, 2006 at 02:14:16PM +0100, Matt Dawson wrote:
>
> > FWIW, the problem takes *far* longer to rear its head when the SATA controller
> > has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad
> > Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices
> > end up sharing it. Now transfer a large file over the network and watch the
> > ensuing hilarity) and it happens at least every couple of days. Now, with the
> > slot shared with the SATA controller empty, I have six days uptime since the
> > last event, which means I'm probably due one any time now.
>
> FWIW - here's the setup of my systems that have not shown the
> problem so far:
>
> Device   IRQ
> -------  ---
> em0      16
> em1      17
> uhci0    23
> uhci1    19
> uhci2    18
> uhci3    16
> ehci0    23
> fxp0     16
> atapci1  19   This is the SATA300 controller
>
> Is there a method to force the controller to share its IRQ with,
> say, em0 for testing?


You can use device.hints(5) to do this.

I have the following in mine to force a RAID card and Sound card to 
share IRQ 17.

You need to modify it to suit your environment.

hw.pci3.13.INTA.irq="17"

The `13' value is the device number, you can find this in dmesg, same 
for pciN.
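
A minimal sketch of how such a hint string could be put together from a dmesg probe line, assuming the hw.pciN.DEV.INTA.irq form shown above. The helper name irq_hint, the sample probe line, and the default pin letter A are assumptions made for illustration. The thread does not settle whether the function number (the digit after the dot in "device 13.0") belongs in the name, so this sketch leaves it out to match the example.

```python
import re

# Matches the tail of a dmesg probe line such as
# "... irq 16 at device 13.0 on pci3".
PROBE_RE = re.compile(r"irq (\d+) at device (\d+)\.(\d+) on pci(\d+)")


def irq_hint(dmesg_line, new_irq, pin="A"):
    """Build a hint in the hw.pciN.DEV.INTx.irq form quoted in the thread.

    The pin letter is an assumption (the thread only shows INTA), and the
    function number reported by dmesg is deliberately not used.
    """
    m = PROBE_RE.search(dmesg_line)
    if m is None:
        raise ValueError("no 'irq N at device D.F on pciB' in line")
    _old_irq, device, _function, bus = m.groups()
    return 'hw.pci%s.%s.INT%s.irq="%d"' % (bus, device, pin, new_irq)


# Hypothetical probe line, made up to match the hint quoted above.
sample = "pcm0: <hypothetical card> irq 16 at device 13.0 on pci3"
print(irq_hint(sample, 17))  # prints hw.pci3.13.INTA.irq="17"
```

Patrick's attempt elsewhere in this thread included the function number (hw.pci0.31.2.INTA.irq) and did not take effect, so the shorter form may be worth trying, though nothing in the thread confirms it.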


HTH,
Dominic


> Regards,
>
> Patrick M. Hausen
> Leiter Netzwerke und Sicherheit




Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Patrick M. Hausen
Hello!

On Mon, Aug 21, 2006 at 02:14:16PM +0100, Matt Dawson wrote:

> FWIW, the problem takes *far* longer to rear its head when the SATA controller
> has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad
> Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices
> end up sharing it. Now transfer a large file over the network and watch the
> ensuing hilarity) and it happens at least every couple of days. Now, with the
> slot shared with the SATA controller empty, I have six days uptime since the
> last event, which means I'm probably due one any time now.

FWIW - here's the setup of my systems that have not shown the
problem so far:

Device   IRQ
-------  ---
em0      16
em1      17
uhci0    23
uhci1    19
uhci2    18
uhci3    16
ehci0    23
fxp0     16
atapci1  19   This is the SATA300 controller

Is there a method to force the controller to share its IRQ with,
say, em0 for testing?
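
Before forcing anything, it can help to see which devices in the table above already share an interrupt line. A small sketch, with the assignments typed in by hand from that table; the function name shared_irqs is only illustrative.

```python
from collections import defaultdict

# Device-to-IRQ assignments copied from the table above.
ASSIGNMENTS = {
    "em0": 16, "em1": 17, "uhci0": 23, "uhci1": 19, "uhci2": 18,
    "uhci3": 16, "ehci0": 23, "fxp0": 16, "atapci1": 19,
}


def shared_irqs(assignments):
    """Return {irq: [devices]} for every IRQ used by more than one device."""
    by_irq = defaultdict(list)
    for device, irq in assignments.items():
        by_irq[irq].append(device)
    return {irq: devs for irq, devs in sorted(by_irq.items()) if len(devs) > 1}


for irq, devices in shared_irqs(ASSIGNMENTS).items():
    print(irq, ", ".join(devices))
```

On these numbers atapci1 is already sharing IRQ 19 with uhci1, while em0 sits on IRQ 16 together with uhci3 and fxp0.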

Regards,

Patrick M. Hausen
Leiter Netzwerke und Sicherheit
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25    Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Daniel O'Connor
On Monday 21 August 2006 22:44, Matt Dawson wrote:
> > atacontrol detach ata3; atacontrol attach ata3 did.
>
> Yes, that is the method for a controlled remove and reattach, a la hotplug
> SATA. AIUI, though, if the drive goes AWOL on its own you need to reinit
> the channel before issuing an atacontrol attach foo. In theory... (man 8
> atacontrol) In practice, the drive disappears, never to be probed again. A
> warm reboot without power down makes it appear again, so the drive itself
> isn't confused.

If you have a "proper" hot plug SATA controller you don't need to reinit 
anything.

When I was testing a Promise 2300 the act of plugging the drive in caused a 
new disk to show up (which was nice :)

This did not happen on the VIA 8237 controller (which, by the way, has a 
really really crappy RAID function, avoid at all costs).

-- 
Daniel O'Connor software and network engineer
for Genesis Software - http://www.gsoft.com.au
"The nice thing about standards is that there
are so many of them to choose from."
  -- Andrew Tanenbaum
GPG Fingerprint - 5596 B766 97C0 0E94 4347 295E E593 DC20 7B3F CE8C




Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Matt Dawson
On Monday 21 August 2006 13:00, [EMAIL PROTECTED] wrote:
> > I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64
> > based system running 6.1-RELEASE-p3 (i386). ad6 just detaches without
> > warning and it takes a reboot to bring it back. atacontrol reinit has no
> > effect. Tried the following to resolve the problems:
>
> I don't know what is supposed to be the canonical way to
> reattach a disconnected SATA drive, but while testing our
> new hardware and hot-pulling a drive while the system
> was running, atacontrol reinit didn't find the reinserted drive
> here, either.
>
> atacontrol detach ata3; atacontrol attach ata3 did.

Yes, that is the method for a controlled remove and reattach, a la hotplug 
SATA. AIUI, though, if the drive goes AWOL on its own you need to reinit the 
channel before issuing an atacontrol attach foo. In theory... (man 8 
atacontrol) In practice, the drive disappears, never to be probed again. A 
warm reboot without power down makes it appear again, so the drive itself 
isn't confused.

FWIW, the problem takes *far* longer to rear its head when the SATA controller 
has a PCI INT and IRQ to itself. Put a NIC onto a shared slot (a very Bad 
Thing [TM] as the BIOS simply maps the INT to a single IRQ and both devices 
end up sharing it. Now transfer a large file over the network and watch the 
ensuing hilarity) and it happens at least every couple of days. Now, with the 
slot shared with the SATA controller empty, I have six days uptime since the 
last event, which means I'm probably due one any time now. 

At least gmirror rebuilds the array after a simple reboot, but I would expect 
the dd operation to throw a wobbly if it's a timing issue/fight for interrupt 
between the two drives/channels. It doesn't, which makes me wonder if I'm 
barking up the wrong tree, but I can't help noticing that SATA channels have 
one interrupt between them whereas PATA channels have one each and all of 
these reports are from SATA users...

I wonder what pciconf -lv shows on Miroslav's system? Is the SATA controller 
sharing an INT/IRQ with something else? Does moving that device to another 
slot alleviate the problem at all?

Please note that Miroslav and I are using totally different drives, chipsets 
and processors. He's using, IIRC, an Intel chip with an ICH7 southbridge and 
Samsung drives. I'm using an AMD Athlon 64 Newcastle (running the i386 port) 
on a ULi M1689 chipset with WD RE2 drives so, although I'd be more than happy 
to be the numpty that is wrong and to have ata(4) vindicated by someone else, 
I suspect it is ata(4) that is the problem. However, finger pointing isn't 
productive and is certainly not fair given that ata(4) has been progressing 
so well. Anything else I can try to nail this irksome beast? Any suggestions 
for where I've been an idiot (easy, tiger!) and missed something obvious?

BTW, this is a production server (DLT backed up nightly, so the data is safe) 
so I can't just pull it to bits. I do have an identical (CPU/mobo) box in the 
workshop as a workstation, however, which I could buy/borrow another drive 
for and set up gmirror to try things out.
-- 
Matt Dawson.

[EMAIL PROTECTED]
MTD15-RIPE OpenNIC M_D9
MD51-6BONE


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Patrick M. Hausen
Hi!

On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:

> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based 
> system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and 
> it takes a reboot to bring it back. atacontrol reinit has no effect. Tried 
> the following to resolve the problems:

I don't know what is supposed to be the canonical way to
reattach a disconnected SATA drive, but while testing our
new hardware and hot-pulling a drive while the system
was running, atacontrol reinit didn't find the reinserted drive
here, either.

atacontrol detach ata3; atacontrol attach ata3 did.
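
The same detach/attach sequence as a small script, in case it needs repeating during testing. This is only a sketch: the function names are made up, "ata3" is the channel from this test box, and the script defaults to a dry run that prints the commands without executing them.

```python
import subprocess


def reattach_commands(channel):
    """Return the detach/attach command pair for one ATA channel, e.g. 'ata3'."""
    return [["atacontrol", "detach", channel],
            ["atacontrol", "attach", channel]]


def reattach(channel, dry_run=True):
    """Print the commands, and run them only when dry_run is False."""
    for cmd in reattach_commands(channel):
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sequence if a command exits non-zero.
            subprocess.run(cmd, check=True)


reattach("ata3")
```

Calling reattach("ata3", dry_run=False) as root would actually issue the two commands.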

HTH,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25    Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-21 Thread Greg Byshenk
On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
> On 20.08.2006 at 18:20, Greg Byshenk wrote:

> >What is different is that this was with a 3Ware RAID controller --
> >which made removing/reconfiguring/rebuilding much easier -- but I was
> >seeing the exact same errors.
 
> No, your errors are not related. In my experience (and the
> experience of others), the controller forgetting or losing drives is
> a "feature" of 3ware.
> We had similar problems with 3ware-7500-8 ATA controllers, and the
> same errors were reported to me with the 3ware-9000 series. Our in-house
> 3ware-9500S controllers are not showing this kind of error.
 
> These errors are not driver or OS dependent, as they appear on
> FreeBSD as well as on different Linux distros.
> Since not all controllers suffer from these errors, it may
> depend on the firmware or board/chip revisions.

I hesitate to make too strong a statement on this matter, as I have
not done any deep investigation, however...

The explanation above does not appear consistent with my experience.
I am now using (and have used over the past several years) a number
of different 3Ware controllers (7000, 8000, and 9000 series) and have
not previously seen this problem.  Of course I have had drives fail
-- and in one case one port of one controller simply stopped working
-- but never this particular problem.

Further, the very same controller that demonstrated problems (in the
numerically identical server, performing the exact same jobs), had
not demonstrated this problem (over a period of more than six months)
until I installed the June 6.1 STABLE, after which the problem appeared
consistently, until installing the July 6.1 STABLE, at which point the
problem disappeared, and has not occurred since (despite my trying very
hard to make it do so).

It may well be that there is some bug in the 3Ware controllers, but 
my experience suggests that there is/was something else going on.  At
the very least, it suggests that there was something about the June
6.1 STABLE (but not the earlier or later versions) that was triggering
a 3Ware bug -- as my problems occurred only when running the June
6.1 STABLE, and that was the _only_ difference between the cases of
having problems and those of not having problems.


-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL


Re: New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)

2006-08-21 Thread Patrick M. Hausen
Hi, all!

On Mon, Aug 21, 2006 at 04:22:06PM +0900, Pyun YongHyeon wrote:

> Several users reported em(4) watchdog errors but I couldn't reproduce
> them on my system. A blind patch was posted to the net ML and I'd like
> to hear success/failure reports.
> 
> See http://lists.freebsd.org/pipermail/freebsd-net/2006-August/011352.html
> Make sure to enable "debug.mpsafenet=1" during testing.

Testing ... stay tuned.

Regards,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25    Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)

2006-08-21 Thread Konstantin Saurbier

Hi!

On 21.08.2006 at 09:10, Patrick M. Hausen wrote:

> Hi, all!
>
> On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
>
> > These errors are not driver or OS dependent, as they appear on
> > FreeBSD as well as on different Linux distros.
> > Since not all controllers suffer from these errors, it may
> > depend on the firmware or board/chip revisions.
>
> We have two brand new TYAN B5161G20SH4 systems that feature
> ICH7 controllers and SATA-hotplug-bays. One system is equipped
> with two Seagate ST3160811AS drives, the other one with
> WD1600YS-01SHB0 drives.
> Both are configured with gmirror for slice 1.
>
> No problems at all after several days of "make -j4 buildworld".
>
> OTOH I can confirm that I got random "watchdog timeouts"
> with the em driver. debug.mpsafenet=0 fixed the problem
> for now.

Sorry, my post was far too unspecific.
My response was only for Greg Byshenk and his 3ware-related problem.
3ware controllers tend to lose drives or mark drives as broken which are
not broken at all.

So his problems with 3ware are not related to this thread about ATA/ICH
bugs.


--

Best regards,

Konstantin Saurbier

--
Konstantin Saurbier        Tel.: 0521 106 3861
Computerlabor Mathematik   U5-138
Universitaet Bielefeld     Universitaetsstr.25
33501 Bielefeld
email:  [EMAIL PROTECTED]
--







Re: New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)

2006-08-21 Thread Pyun YongHyeon
On Mon, Aug 21, 2006 at 09:10:53AM +0200, Patrick M. Hausen wrote:
 > Hi, all!
 > 
 > On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:
 > 
 > > These errors are not driver or OS dependent, as they appear on
 > > FreeBSD as well as on different Linux distros.
 > > Since not all controllers suffer from these errors, it may
 > > depend on the firmware or board/chip revisions.
 > 
 > We have two brand new TYAN B5161G20SH4 systems that feature
 > ICH7 controllers and SATA-hotplug-bays. One system is equipped
 > with two Seagate ST3160811AS drives, the other one with
 > WD1600YS-01SHB0 drives.
 > Both are configured with gmirror for slice 1.
 > 
 > No problems at all after several days of "make -j4 buildworld".
 > 
 > OTOH I can confirm that I got random "watchdog timeouts"
 > with the em driver. debug.mpsafenet=0 fixed the problem
 > for now.
 > 

Several users reported em(4) watchdog errors but I couldn't reproduce
them on my system. A blind patch was posted to the net ML and I'd like
to hear success/failure reports.

See http://lists.freebsd.org/pipermail/freebsd-net/2006-August/011352.html
Make sure to enable "debug.mpsafenet=1" during testing.

-- 
Regards,
Pyun YongHyeon


New Intel boards (was: Re: ATA problems again ... general problem of ICH7 or ATA?)

2006-08-21 Thread Patrick M. Hausen
Hi, all!

On Mon, Aug 21, 2006 at 04:03:47AM +0200, Konstantin Saurbier wrote:

> These errors are not driver or OS dependent, as they appear on
> FreeBSD as well as on different Linux distros.
> Since not all controllers suffer from these errors, it may
> depend on the firmware or board/chip revisions.

We have two brand new TYAN B5161G20SH4 systems that feature
ICH7 controllers and SATA-hotplug-bays. One system is equipped
with two Seagate ST3160811AS drives, the other one with
WD1600YS-01SHB0 drives.
Both are configured with gmirror for slice 1.

No problems at all after several days of "make -j4 buildworld".

OTOH I can confirm that I got random "watchdog timeouts"
with the em driver. debug.mpsafenet=0 fixed the problem
for now.

HTH,
Patrick
-- 
punkt.de GmbH Internet - Dienstleistungen - Beratung
Vorholzstr. 25    Tel. 0721 9109 -0 Fax: -100
76137 Karlsruhe   http://punkt.de


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Konstantin Saurbier


On 20.08.2006 at 18:20, Greg Byshenk wrote:

> On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
> > On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:
> > > Do you mean a different type of cable, or just another piece? I can't
> > > change the cables myself, as the servers are dedicated ones from a
> > > provider, but as far as I could see, they picked a whole new machine
> > > from their HW storage and put new Samsung disk drives in. So these last
> > > two machines are brand new with new cables. (Probably with the same
> > > type of cables - all machines are ASUS RS120)
> >
> > I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based
> > system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and
> > it takes a reboot to bring it back. atacontrol reinit has no effect. Tried
> > the following to resolve the problems:
> >
> > - Changed cables (both ad4 and ad6)
> > - Changed SATA power to legacy
> > - Moved the NIC and anything else from the shared PCI INT (thought I'd cracked
> >   it at this point as it was stable for a month, then it lost ad6 on a nightly
> >   dump)
> > - Remade my gmirror array as an ar. Put it straight back to gmirror again when
> >   I found out what a pain it is to rebuild after ad6 disappears.
>
> I am not sure if it is related, but...  I experienced a similar sort of
> problem, although the details in my case are quite different.
>
> What was similar was that I would "lose" two ATA drives from an array,
> inexplicably.  Reconfiguring the same drives and rebuilding would cause
> them to work perfectly again -- for some number of days, after which
> the same failure would occur.
>
> What is different is that this was with a 3Ware RAID controller --
> which made removing/reconfiguring/rebuilding much easier -- but I was
> seeing the exact same errors.

No, your errors are not related. In my experience (and the
experience of others), the controller forgetting or losing drives is
a "feature" of 3ware.
We had similar problems with 3ware-7500-8 ATA controllers, and the
same errors were reported to me with the 3ware-9000 series. Our in-house
3ware-9500S controllers are not showing this kind of error.

These errors are not driver or OS dependent, as they appear on
FreeBSD as well as on different Linux distros.
Since not all controllers suffer from these errors, it may
depend on the firmware or board/chip revisions.


--

Best regards,

Konstantin Saurbier

--
Konstantin Saurbier        Tel.: 0521 106 3861
Computerlabor Mathematik   U5-138
Universitaet Bielefeld     Universitaetsstr.25
33501 Bielefeld
email:  [EMAIL PROTECTED]
--







Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Mike Jakubik

Miroslav Lachman wrote:
I upgraded to RELENG_6 and changed all the HW (whole servers, and Seagate 
HDDs swapped for Samsung, so every piece of HW is different from the time 
of my first post), but after one week I got the same error and a system 
reboot today:

Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
Aug 19 15:15:47 track kernel: subdisk6: detached
Aug 19 15:15:47 track kernel: ad6: detached
Aug 19 15:15:47 track kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
Aug 19 15:15:47 track kernel: g_vfs_done():mirror/gm0s2d[READ(offset=1169260544, length=131072)]error = 6
Aug 19 15:22:34 track syslogd: kernel boot file is /boot/kernel/kernel
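
To compare how often this happens across machines, the detach lines can be pulled out of the system log mechanically. A rough sketch, assuming the syslog line layout shown in the excerpt above; the function name detach_events is made up, and the sample lines are copied from that excerpt.

```python
import re

# Matches kernel lines like
# "Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached".
DETACH_RE = re.compile(
    r"^(\w{3} +\d+ [\d:]{8}) (\S+) kernel: (ad\d+): FAILURE - device detached")


def detach_events(lines):
    """Return (timestamp, host, disk) for each detach line found."""
    events = []
    for line in lines:
        m = DETACH_RE.match(line)
        if m:
            events.append(m.groups())
    return events


sample = [
    "Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001",
    "Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached",
    "Aug 19 15:15:47 track kernel: ad6: detached",
]
print(detach_events(sample))  # prints [('Aug 19 15:15:47', 'track', 'ad6')]
```

Feeding it the lines of the real log file instead of the sample list would give one tuple per lost disk.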

From my point of view, this is not related to one piece of HW, but is a 
general problem of the ICH7 chipset or the (S)ATA driver in FreeBSD 6.x. As 
other posters have different chipsets (ICH6 and nVidia), it seems more 
likely to be related to the FreeBSD ATA driver. (7 different machines were tried.)




Just a "me too": I have the same problems with ICH7 and disks 
mysteriously disconnecting.


Aug 14 16:54:47 mx1 kernel: ad4: FAILURE - device detached
Aug 14 16:54:47 mx1 kernel: ad4: detaGEOM_MIRROR:ched Device gm: 
provider ad4 disconnected.


I think there definitely is a problem with the chipset/driver.




Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Miroslav Lachman

Greg Byshenk wrote:

> On Sun, Aug 20, 2006 at 07:51:29PM +0200, Miroslav Lachman wrote:
>
> > Greg Byshenk wrote:
>
> [...]
>
> > > This happened four times (with the same errors that have been discussed
> > > here), running 6.1 STABLE as of June 22.  Before attempting to RMA the
> > > drives, I tried an updated kernel, 6.1 STABLE as of July 19.  Strangely
> > > enough, the problems disappeared.
>
> > > So, while I have not checked everything that has changed, it _might_ be
> > > worth trying 6.1 STABLE...
>
> > I have the same problems with 6.1-RELEASE as with 6.1-STABLE from August 2.
> > I can try a newer STABLE, but as I see on cvsweb, there are not many
> > changes in the ATA driver sources, only new chipsets added.
>
> It is only an idea, based on something that worked for me.  And, as I
> said, my situation is not exactly the same as the others.
>
> > It is strange to me that I see significant changes in read/write
> > speed. (I am running nonstop tests that fill the disk with files,
> > delete them, and start again, plus generating graphs.) Speed varies from
> > 2.5MB/s to 11MB/s in jumps, not continuously from the lowest to the
> > highest. Writing runs at, for example, 3MB/s for 20 hours, then jumps to
> > 10MB/s and after some time (6 - 20 hours) jumps down to about 3MB/s.
> > After some days of testing, a disk disappears, the system reboots itself,
> > resynchronizes gmirror and works for the next few days until the next
> > disk loss.
> > Also, earlier the synchronization was done after 1:30 hours (at about
> > 30MB/s); now synchronization runs at lower speeds, from 2.5MB/s to 15MB/s,
> > so the whole synchronization takes more than 5 hours (the longest was
> > 20 hours to synchronize 250GB HDDs).
>
> > I don't know what more I can test, or what more could be done to solve
> > these problems. :(
>
> You are using gmirror, which I am not, so the situations are not
> analogous, since my situation was with h/w RAID.  And I have no direct
> experience with gmirror (I use gvinum on a couple of secondary systems,
> but those are SCSI based).
>
> Does the output of 'systat -vm' tell you anything of interest?  That is,
> are the disks running at or close to 100%, are the CPUs fully loaded, or
> anything else...?

There is nothing interesting in systat / gstat / top or anything else.
The system is almost idle, just running the test script for disk writing.
The speed problems do not depend on gmirror. I deactivated gmirror on the
second machine and ran the test on normally mounted filesystems, with the
same low speeds ;(


This is systat from the gmirrored system running the test:

4 usersLoad  0.01  0.02  0.00  Aug 20 21:06

Mem:KBREALVIRTUAL VN PAGER  SWAP PAGER
Tot   Share  TotShareFree in  out in  out
Act  1241449580   89761627168   43016 count
All 1016728   75888364464876   210508 pages
 Interrupts
Proc:r  p  d  s  wCsw  Trp  Sys  Int  Sof  Fltcow4144 total
   7 75   9794  155  288   27  161484 wire 
1: atkb
   191292 act 
14: ata
 0.4%Sys   0.0%Intr  0.0%User  0.0%Nice 99.6%Idl   624272 inact11 
16: bge
||||||||||  41360 cache   133 
19: ata
 1656 free   2000 
cpu0: time
  daefr  2000 
cpu1: time

Namei Name-cacheDir-cache prcfr
Calls hits% hits% react
66  100   pdwake
  zfod   1354 pdpgs
Disks   ad4   ad6           ozfod          intrn
KB/t    125   126           %slo-z  113888 buf
tps      34    33      1407 tfree       17 dirtybuf
MB/s   4.13  4.10                    69977 desiredvnodes
% busy   54    48                    20661 numvnodes
                                     17286 freevnodes

___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Dmitry Pryanishnikov


Hello!

On Sat, 19 Aug 2006, Miroslav Lachman wrote:

Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
Aug 19 15:15:47 track kernel: subdisk6: detached
Aug 19 15:15:47 track kernel: ad6: detached


  I think it's a shame to have such "error recovery" in one of the basic
drivers. The ATA driver gives absolutely no clue about the reason for the failure
and just disconnects the device. I'm curious why the driver behaves this way.
If the SATA code is just raw, it definitely must be corrected to implement proper
error recovery. If the SATA specification is written so poorly that proper error
recovery is just impossible (I really doubt that), then SATA hardware
should simply be avoided in mission-critical applications.


Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail:  [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Greg Byshenk
On Sun, Aug 20, 2006 at 07:51:29PM +0200, Miroslav Lachman wrote:
> Greg Byshenk wrote:

[...]

> >This happened four times (with the same errors that have been discussed
> >here), running 6.1 STABLE as of June 22.  Before attempting to RMA the
> >drives, I tried an updated kernel, 6.1 STABLE as of July 19.  Strangely
> >enough, the problems disappeared.

> >So, while I have not checked everything that has changed, it _might_ be
> >worth trying 6.1 STABLE...
 
> I have the same problems with 6.1-RELEASE as with 6.1-STABLE from August 2.
> I can try a newer STABLE, but as far as I can see on cvsweb, there are not many
> changes in the ATA driver sources, only new chipsets added.

It is only an idea, based on something that worked for me.  And, as I
said, my situation is not exactly the same as the others.
 
> It is strange to me that I see significant changes in read/write
> speed. (I am running nonstop tests that fill the disk with files,
> delete them, and start again, plus generating graphs.) Speed varies from
> 2.5MB/s to 11MB/s in jumps, not continuously from the lowest to the
> highest. Writing runs at, for example, 3MB/s for 20 hours, then jumps to
> 10MB/s and after some time (6 - 20 hours) drops back to about 3MB/s.
> After some days of testing, a disk disappears, the system reboots itself,
> resynchronizes gmirror and works for the next few days until the next disk loss.
> Also, synchronization used to finish in 1:30 hours (at about 30MB/s);
> now it runs at lower speeds, from 2.5MB/s to 15MB/s, so the
> whole synchronization takes more than 5 hours (the longest was
> 20 hours to synchronize 250GB HDDs).

> I don't know what more I can test or what more could be done to solve
> these problems. :(

You are using gmirror, which I am not, so the situations are not
analogous, since my situation was with h/w RAID.  And I have no direct
experience with gmirror (I use gvinum on a couple of secondary systems,
but those are SCSI based).

Does the output of 'systat -vm' tell you anything of interest?  That is,
are the disks running at or close to 100%, are the CPUs fully loaded, or
anything else...?
 

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Miroslav Lachman

Greg Byshenk wrote:
[...]

I am not sure if it is related, but...  I experienced a similar sort of
problem, although the details in my case are quite different.

What was similar was that I would "lose" two ATA drives from an array,
inexplicably.  Reconfiguring the same drives and rebuilding would cause
them to work perfectly again -- for some number of days, after which 
the same failure would occur.


What is different is that this was with a 3Ware RAID controller --
which made removing/reconfiguring/rebuilding much easier -- but I was
seeing the exact same errors.

This happened four times (with the same errors that have been discussed
here), running 6.1 STABLE as of June 22.  Before attempting to RMA the
drives, I tried an updated kernel, 6.1 STABLE as of July 19.  Strangely
enough, the problems disappeared.

So, while I have not checked everything that has changed, it _might_ be
worth trying 6.1 STABLE...


I have the same problems with 6.1-RELEASE as with 6.1-STABLE from August 2.
I can try a newer STABLE, but as far as I can see on cvsweb, there are not many
changes in the ATA driver sources, only new chipsets added.


It is strange to me that I see significant changes in read/write
speed. (I am running nonstop tests that fill the disk with files,
delete them, and start again, plus generating graphs.) Speed varies from
2.5MB/s to 11MB/s in jumps, not continuously from the lowest to the
highest. Writing runs at, for example, 3MB/s for 20 hours, then jumps to
10MB/s and after some time (6 - 20 hours) drops back to about 3MB/s.
After some days of testing, a disk disappears, the system reboots itself,
resynchronizes gmirror and works for the next few days until the next disk loss.
Also, synchronization used to finish in 1:30 hours (at about 30MB/s);
now it runs at lower speeds, from 2.5MB/s to 15MB/s, so the
whole synchronization takes more than 5 hours (the longest was
20 hours to synchronize 250GB HDDs).
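
For anyone who wants to reproduce this kind of fill/delete test, one pass
could be sketched roughly as below. This is not the script used in this
thread (that was never posted); the function name write_pass, the file
names and the sizes are all made up for illustration.

```shell
#!/bin/sh
# Sketch only: one pass of a fill/delete write test.
# write_pass DIR COUNT SIZE_MB writes COUNT files of SIZE_MB megabytes
# into DIR, deletes them again, and prints one log line:
#   <epoch seconds at end> <MB written> <elapsed seconds>
# write_pass and the fill.N file names are hypothetical.
write_pass() {
    dir=$1
    count=$2
    size_mb=$3
    start=$(date +%s)
    i=0
    while [ "$i" -lt "$count" ]; do
        # bs is given in bytes so the same line works with BSD and GNU dd
        dd if=/dev/zero of="$dir/fill.$i" bs=1048576 count="$size_mb" \
            2>/dev/null || break
        i=$((i + 1))
    done
    sync                      # push cached data out before stopping the clock
    end=$(date +%s)
    rm -f "$dir"/fill.*
    echo "$end $((i * size_mb)) $((end - start))"
}
```

Run in an endless loop with the output appended to a file (for example
"while :; do write_pass /mnt/test 100 64 >> /var/tmp/write.log; done",
where both paths are placeholders), it produces a log in which MB written
divided by elapsed seconds gives the per-pass rate, so jumps like the
3MB/s to 10MB/s ones described above should be visible when graphed.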


I don't know what more I can test or what more could be done to solve
these problems. :(


Any help will be appreciated

Miroslav Lachman


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Greg Byshenk
On Sun, Aug 20, 2006 at 01:38:55PM +0100, Matt Dawson wrote:
> On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:

> > Do you mean a different type of cable, or just another piece? I can't
> > change the cables myself, since the servers are dedicated ones from a
> > provider, but as far as I could see, they picked a whole new machine from
> > their HW storage and put new Samsung disk drives in. So these last two
> > machines are brand new, with new cables. (Probably the same type of
> > cables - all machines are ASUS RS120.)
 
> I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based 
> system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and 
> it takes a reboot to bring it back. atacontrol reinit has no effect. Tried 
> the following to resolve the problems:
 
> - Changed cables (both ad4 and ad6)
> - Changed SATA power to legacy
> - Moved the NIC and anything else from the shared PCI INT (thought I'd cracked
> it at this point as it was stable for a month, then it lost ad6 on a nightly
> dump)
> - Remade my gmirror array as an ar. Put it straight back to gmirror again when
> I found out what a pain it is to rebuild after ad6 disappears.

I am not sure if it is related, but...  I experienced a similar sort of
problem, although the details in my case are quite different.

What was similar was that I would "lose" two ATA drives from an array,
inexplicably.  Reconfiguring the same drives and rebuilding would cause
them to work perfectly again -- for some number of days, after which 
the same failure would occur.

What is different is that this was with a 3Ware RAID controller --
which made removing/reconfiguring/rebuilding much easier -- but I was
seeing the exact same errors.

This happened four times (with the same errors that have been discussed
here), running 6.1 STABLE as of June 22.  Before attempting to RMA the
drives, I tried an updated kernel, 6.1 STABLE as of July 19.  Strangely
enough, the problems disappeared.

So, while I have not checked everything that has changed, it _might_ be
worth trying 6.1 STABLE...
 

-- 
greg byshenk  -  [EMAIL PROTECTED]  -  Leiden, NL


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Matt Dawson
On Sunday 20 August 2006 13:00, [EMAIL PROTECTED] wrote:
> Do you mean a different type of cable, or just another piece? I can't
> change the cables myself, since the servers are dedicated ones from a
> provider, but as far as I could see, they picked a whole new machine from
> their HW storage and put new Samsung disk drives in. So these last two
> machines are brand new, with new cables. (Probably the same type of
> cables - all machines are ASUS RS120.)

I can confirm the same behaviour with a ULi M1689/Newcastle Athlon64 based 
system running 6.1-RELEASE-p3 (i386). ad6 just detaches without warning and 
it takes a reboot to bring it back. atacontrol reinit has no effect. Tried 
the following to resolve the problems:

- Changed cables (both ad4 and ad6)
- Changed SATA power to legacy
- Moved the NIC and anything else from the shared PCI INT (thought I'd cracked 
it at this point as it was stable for a month, then it lost ad6 on a nightly 
dump)
- Remade my gmirror array as an ar. Put it straight back to gmirror again when 
I found out what a pain it is to rebuild after ad6 disappears.

Until I read this thread, I was convinced there was something flaky in my 
hardware/BIOS or WD's TLER. Now I'm not so sure.

Hardware:
$ pciconf -lv
[EMAIL PROTECTED]:0:0:  class=0x06 card=0x50001458 chip=0x168910b9 rev=0x00 
hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:1:0: class=0x060400 card=0x chip=0x524610b9 rev=0x00 
hdr=0x01
vendor   = 'Acer Labs Incorporated (ALi)'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:2:0: class=0x060401 card=0x chip=0x524910b9 rev=0x00 
hdr=0x01
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'M5249 HyperTransport to PCI Bridge'
class= bridge
subclass = PCI-PCI
[EMAIL PROTECTED]:3:0: class=0x060100 card=0x50011458 chip=0x156310b9 rev=0x70 
hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'ALI M1563 South Bridge with Hypertransport Support'
class= bridge
subclass = PCI-ISA
[EMAIL PROTECTED]:3:1: class=0x068000 card=0x50031458 chip=0x710110b9 rev=0x00 
hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'ALI M7101 Power Management Controller'
class= bridge
[EMAIL PROTECTED]:14:0:  class=0x0101fa card=0x50021458 chip=0x522910b9 
rev=0xc7 hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'M1543 Southbridge EIDE Controller'
class= mass storage
subclass = ATA
[EMAIL PROTECTED]:14:1:  class=0x01018f card=0xb0031458 chip=0x528910b9 
rev=0x10 hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
class= mass storage
subclass = ATA
[EMAIL PROTECTED]:15:0:class=0x0c0310 card=0x50041458 chip=0x523710b9 
rev=0x03 hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'M5237 OpenHCI 1.1 USB Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:15:1:class=0x0c0310 card=0x50041458 chip=0x523710b9 
rev=0x03 hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'M5237 OpenHCI 1.1 USB Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:15:2:class=0x0c0310 card=0x50041458 chip=0x523710b9 
rev=0x03 hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'M5237 OpenHCI 1.1 USB Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:15:3:class=0x0c0320 card=0x50041458 chip=0x523910b9 
rev=0x01 hdr=0x00
vendor   = 'Acer Labs Incorporated (ALi)'
device   = 'USB 2.0 Enhanced Host Controller'
class= serial bus
subclass = USB
[EMAIL PROTECTED]:24:0:   class=0x06 card=0x chip=0x11001022 
rev=0x00 hdr=0x00
vendor   = 'Advanced Micro Devices (AMD)'
device   = 'Athlon 64 / Opteron HyperTransport Technology Configuration'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:24:1:   class=0x06 card=0x chip=0x11011022 
rev=0x00 hdr=0x00
vendor   = 'Advanced Micro Devices (AMD)'
device   = 'Athlon 64 / Opteron Address Map'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:24:2:   class=0x06 card=0x chip=0x11021022 
rev=0x00 hdr=0x00
vendor   = 'Advanced Micro Devices (AMD)'
device   = 'Athlon 64 / Opteron DRAM Controller'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:24:3:   class=0x06 card=0x chip=0x11031022 
rev=0x00 hdr=0x00
vendor   = 'Advanced Micro Devices (AMD)'
device   = 'Athlon 64 / Opteron Miscellaneous Control'
class= bridge
subclass = HOST-PCI
[EMAIL PROTECTED]:0:0: class=0x03 card=0x02071787 chip=0x51571002 rev=0x00 
hdr=0x00
vendor   = 'ATI Technologies Inc'
device   = 'Radeon 7500 Series (RV200)'
class= display
subclass = VGA
[EMAIL PROTECTED]:5:0:  class=0x01 card=0x chip=0x81789004 rev=0x00 
hdr=0x00
vendor   = 'Adaptec Inc'
device   = 'AH

Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-20 Thread Miroslav Lachman

Igor Robul wrote:

On Sat, Aug 19, 2006 at 04:39:55PM +0200, Miroslav Lachman wrote:

I upgraded to RELENG_6, changed all HW (whole servers and changed
Seagate HDDs to Samsung so every piece of HW is different from the time of
my first post), but after one week I got the same error and system


Just a try - have you changed cables too?


Do you mean a different type of cable, or just another piece? I can't
change the cables myself, since the servers are dedicated ones from a
provider, but as far as I could see, they picked a whole new machine from
their HW storage and put new Samsung disk drives in. So these last two
machines are brand new, with new cables. (Probably the same type of
cables - all machines are ASUS RS120.)


Miroslav Lachman


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-19 Thread Igor Robul
On Sat, Aug 19, 2006 at 04:39:55PM +0200, Miroslav Lachman wrote:
> I upgraded to RELENG_6, changed all HW (whole servers and changed
> Seagate HDDs to Samsung so every piece of HW is different from the time of
> my first post), but after one week I got the same error and system
Just a try - have you changed cables too?


Re: ATA problems again ... general problem of ICH7 or ATA?

2006-08-19 Thread Miroslav Lachman

Johan Ström wrote:
[...]
Usually when the box has been rebooted before the failed component has
been rebuilt automatically. Solved with:


$ gmirror forget
$ gmirror insert gm0s1 ad4s1

And now it's rebuilding ad4 again...

Any new hints? Should I try RELENG_6 instead?


I upgraded to RELENG_6 and changed all HW (whole servers, and Seagate HDDs
swapped for Samsung, so every piece of HW is different from the time of
my first post), but after one week I got the same error and a system
reboot today:

Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
Aug 19 15:15:47 track kernel: subdisk6: detached
Aug 19 15:15:47 track kernel: ad6: detached
Aug 19 15:15:47 track kernel: GEOM_MIRROR: Device gm0: provider ad6 disconnected.
Aug 19 15:15:47 track kernel: g_vfs_done():mirror/gm0s2d[READ(offset=1169260544, length=131072)]error = 6
Aug 19 15:22:34 track syslogd: kernel boot file is /boot/kernel/kernel
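
To compare how often this happens across machines, the detach messages can
be pulled out of the log with something like the sketch below. The helper
name detach_events is made up, and the pattern is only based on the message
format visible in the log lines above; other driver versions may word it
differently.

```shell
#!/bin/sh
# Sketch only: list ATA detach events found in syslog-style input.
# detach_events is a hypothetical helper name, not an existing tool.
detach_events() {
    awk '/kernel: ad[0-9]+: FAILURE - device detached/ {
        dev = $6
        sub(/:$/, "", dev)          # "ad6:" -> "ad6"
        print $1, $2, $3, dev       # month, day, time, device
    }'
}

# Demo on lines copied from the log above.
detach_events <<'EOF'
Aug 19 15:11:20 track ntpd[456]: kernel time sync enabled 2001
Aug 19 15:15:47 track kernel: ad6: FAILURE - device detached
Aug 19 15:15:47 track kernel: ad6: detached
EOF
```

On a live system the input would presumably be the system log, e.g.
"detach_events < /var/log/messages".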

From my point of view, this is not related to one piece of HW, but is a
general problem of the ICH7 chipset or the (S)ATA driver in FreeBSD 6.x. As
other posters have different chipsets (ICH6 and nVidia), it seems more
related to the FreeBSD ATA driver. (7 different machines were tried.)


Now, after the reboot, writing to and reading from ad6 is really slow (no other
processes utilizing the disks, no fsck running, etc.)


[EMAIL PROTECTED] ~/# dd if=/dev/zero of=/dev/ad6 bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 43.673244 secs (2400957 bytes/sec)

[EMAIL PROTECTED] ~/# dd if=/dev/ad6 of=/dev/null bs=1m count=100
100+0 records in
100+0 records out
104857600 bytes transferred in 10.979482 secs (9550323 bytes/sec)
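
For comparison with the MB/s figures quoted elsewhere in this thread, the
dd summary lines can be converted with a small filter like the sketch
below. The name dd_rate is made up, and taking 1 MB as 1048576 bytes is an
assumption about the unit.

```shell
#!/bin/sh
# Sketch only: convert dd summary lines (as printed above) to MB/s.
# dd_rate is a hypothetical helper; 1 MB = 1048576 bytes is an assumption.
dd_rate() {
    awk '/bytes transferred in/ {
        # field 1 = byte count, field 5 = elapsed seconds
        printf "%.2f MB/s\n", $1 / $5 / 1048576
    }'
}

# Demo on the two summary lines from the dd runs above (write, then read).
printf '%s\n' \
    '104857600 bytes transferred in 43.673244 secs (2400957 bytes/sec)' \
    '104857600 bytes transferred in 10.979482 secs (9550323 bytes/sec)' |
    dd_rate
```

That puts the write test at a bit over 2MB/s and the read test at about
9MB/s, in the same range as the low speeds seen in the long-running test.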

Is there anyone who can help with finding the source of the problem? It is
really annoying that one cannot use SATA / ICH7 under high load in
FreeBSD 6.1 (tested on RELEASE and STABLE). (I am not experienced enough
with HW / FreeBSD to locate the problem by myself.)


Miroslav Lachman