Re: ata "fallback to PIO mode" on dual processor AMD systems

2004-09-24 Thread Andrew MacIntyre
On Thu, 2 Jan 2003, Bruce Campbell wrote:

> I've manually set:
>
>   atacontrol mode 0 UDMA33 UDMA33
>
> and the problem has not recurred.

That suggests a cabling issue, as UDMA33 is the highest mode you can run
on a 40-wire IDE cable.  Going beyond that requires an 80-wire cable (and
one no longer than about 457mm/18", as I recall).
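
The cable rule can be written down as a small lookup. This is a minimal
sketch in Python, not anything taken from the ata driver: the function name
is hypothetical, and only the mode names that appear in this thread are
listed. It assumes the usual ATA convention that UDMA modes 0 through 2 (up
to UDMA33) run on a 40-conductor cable, while higher modes need the
80-conductor type.

```python
# Mode names used in this thread, mapped to their UDMA mode numbers.
NAME_TO_MODE = {"UDMA33": 2, "UDMA66": 4, "UDMA100": 5}

# Highest UDMA mode number that a 40-conductor cable is specified for.
MAX_MODE_ON_40_CONDUCTOR = 2


def needs_80_conductor(mode_name: str) -> bool:
    """Return True if the named mode is above what a 40-conductor cable supports."""
    return NAME_TO_MODE[mode_name.upper()] > MAX_MODE_ON_40_CONDUCTOR


for name in ("UDMA33", "UDMA66", "UDMA100"):
    print(name, "needs 80-conductor cable:", needs_80_conductor(name))
```

If a system is only stable when forced down to UDMA33, as reported here, the
cable is a reasonable first thing to check, though it does not rule out a
controller or driver problem.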

-
Andrew I MacIntyre "These thoughts are mine alone..."
E-mail: [EMAIL PROTECTED] (pref)  | Snail: PO Box 370
        [EMAIL PROTECTED] (alt)   |        Belconnen ACT 2616
Web:    http://www.andymac.org/   |        Australia
___
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to "[EMAIL PROTECTED]"


Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-10 Thread Francesco Casadei
On Sun, Jan 05, 2003 at 03:02:46PM +0100, Francesco Casadei wrote:
> 
[snip]
> Yesterday I checked the drive ad6 with the Drive Fitness Test program from IBM.
> Both quick and advanced test returned that the drive is ok. I then ran the test
> against ad0 (the backup drive): the quick test showed that the drive was
> defective because of "Excessive Shock". Re-executing the test gave the same
> result.
> I rebooted the system and disabled the S.M.A.R.T. option for the drive attached
> to the motherboard's controller (i.e. the backup drive). Re-executing the quick
> test showed that the drive is ok!
> 
> After 16 hours of uptime and one level-0 file system dump all drives are still
> using UDMA100.
> 
> If for some reason the system falls back to PIO4 mode again, I will try
> removing the following two options from the kernel:
> 
> # ISA optimization
> options AUTO_EOI_1
> options AUTO_EOI_2
> 
> 
> If that still does not solve the problem, I will try the following, in order:
> - disable tagged queuing
> - buy different hardware!
> 
>   Francesco Casadei
> -- 
> You can download my public key from http://digilander.libero.it/fcasadei/
> or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)
> 
> Key fingerprint is: 1671 9A23 ACB4 520A E7EE  00B0 7EC3 375F 164E B17B
> 
> end of the original message

Disabling S.M.A.R.T. capability on ad0 did not solve the problem :(
After ~5 days of uptime:

Jan  9 05:39:34 zeus /kernel: ad6: SERVICE timeout tag=24 s=c0 e=04
Jan  9 05:39:44 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:39:44 zeus /kernel: ad6: timeout sending command=00 s=c0 e=04
Jan  9 05:39:44 zeus /kernel: ad6: flush queue failed
Jan  9 05:39:44 zeus /kernel: ad6: timeout sending command=c7 s=c0 e=04
Jan  9 05:39:44 zeus /kernel: ad6: error executing commandad6: invalidating queued requests
Jan  9 05:39:44 zeus /kernel: ad6: timeout sending command=00 s=c0 e=04
Jan  9 05:39:44 zeus /kernel: ad6: flush queue failed
Jan  9 05:39:44 zeus /kernel: - resetting
Jan  9 05:39:44 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Jan  9 05:39:44 zeus /kernel: done
Jan  9 05:39:44 zeus /kernel: ad6: no request for tag=1
Jan  9 05:39:44 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:39:34 zeus apcsmart[159]: Serial port read timed out
Jan  9 05:39:44 zeus upsd[162]: Data for UPS [Back-UPS_PRO_650] is stale - check support module (shm_ctime too old)
Jan  9 05:39:44 zeus upsmon[166]: Poll UPS [Back-UPS_Pro_650@localhost] failed - Data stale
Jan  9 05:39:44 zeus /kernel: Jan  9 05:39:44 zeus upsmon[166]: Poll UPS [Back-UPS_Pro_650@localhost] failed - Data stale
Jan  9 05:39:44 zeus upsd[162]: Host 127.0.0.1 disconnected
Jan  9 05:39:44 zeus upsmon[166]: Communications with UPS Back-UPS_Pro_650@localhost lost
Jan  9 05:39:44 zeus apcsmart[159]: Serial port read ok again
Jan  9 05:39:46 zeus upsd[162]: Data for UPS [Back-UPS_PRO_650] is now OK
Jan  9 05:39:46 zeus upsd[162]: Data source for UPS [Back-UPS_PRO_650]: SHM (65536)
Jan  9 05:39:49 zeus upsd[162]: Connection from 127.0.0.1
Jan  9 05:39:49 zeus upsmon[166]: Communications with UPS Back-UPS_Pro_650@localhost established
Jan  9 05:39:49 zeus upsd[162]: Client 127.0.0.1 logged into UPS [Back-UPS_Pro_650]
Jan  9 05:39:54 zeus /kernel: ad6: READ command timeout tag=1 serv=0 - resetting
Jan  9 05:40:15 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: done
Jan  9 05:40:15 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Jan  9 05:40:15 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: done
Jan  9 05:40:15 zeus /kernel: ad6: timeout waiting for READY
Jan  9 05:40:15 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: ad6: timeout sending command=00 s=d0 e=04
Jan  9 05:40:15 zeus /kernel: ad6: flush queue failed
Jan  9 05:40:15 zeus /kernel: - resetting
Jan  9 05:40:15 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: done
Jan  9 05:40:15 zeus /kernel: ad6: READ command timeout tag=1 serv=0 - resetting
Jan  9 05:40:15 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: done
Jan  9 05:40:15 zeus /kernel: ad6: READ command timeout tag=0 serv=1 - resetting
Jan  9 05:40:15 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: ata3: resetting devices .. ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: done
Jan  9 05:40:15 zeus /kernel: ad6: no request for tag=0
Jan  9 05:40:15 zeus /kernel: ad6: invalidating queued requests
Jan  9 05:40:15 zeus /kernel: ad6: WRITE command timeout tag=0 serv=0 - resetting
Jan  9 

Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-06 Thread Guy Dawson
This article from The Register may be of interest:

http://www.theregister.co.uk/content/3/18267.html

It talks about a bug in the VIA 686B Southbridge chipset that can cause
data corruption when processing large amounts of data.

Guy
-- 
Guy Dawson  I.T. Manager  Crossflight Ltd
[EMAIL PROTECTED]  07973 797819  01753 776104







Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-05 Thread Adam Maas
This would be legacy behaviour from the days of buggy ATA33/UDMA
implementations, where falling back to PIO mode would allow a device with a
buggy UDMA implementation (Unfortunately rather common at the time) to
function.
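
The behaviour described here, together with the kernel messages quoted in
this thread (several command timeouts, then "trying fallback to PIO mode"),
follows a retry-then-degrade pattern. The following is a minimal sketch of
that pattern in Python. It is not the FreeBSD ata driver's code: the
function names, the retry count and the fake device are all hypothetical.

```python
def transfer_with_fallback(do_transfer, max_dma_retries=3):
    """Try a transfer in UDMA mode; after repeated timeouts, retry in PIO mode.

    do_transfer(mode) is a hypothetical callable that returns True on
    success and False on timeout. Returns the mode that finally worked.
    """
    for _ in range(max_dma_retries):
        if do_transfer("UDMA"):
            return "UDMA"
    # DMA kept timing out: degrade to the slower, CPU-driven PIO path.
    if do_transfer("PIO"):
        return "PIO"
    raise IOError("transfer failed in both UDMA and PIO modes")


def buggy_dma_device(mode):
    """Fake device whose DMA path always times out but whose PIO path works."""
    return mode == "PIO"


print(transfer_with_fallback(buggy_dma_device))
```

With a device like this, PIO is not "the same thing, just more slowly": it
avoids the DMA engine entirely, which is why the fallback can succeed where
DMA retries do not.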

--Adam
- Original Message -
From: "Bruce Campbell" <[EMAIL PROTECTED]>
To: <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Sunday, January 05, 2003 10:01 PM
Subject: Re: ata "fallback to PIO mode" on dual processor AMD systems


> Quoting Bruce Campbell <[EMAIL PROTECTED]>:
>
> > Quoting Matthew Emmerton <[EMAIL PROTECTED]>:
> >
> > > [ cc'ing Soren since he's the ATA guru ]
> > >
> > > > Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> > > > Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> > > >
> > > > The test continues to run with the ata controller in PIO mode, with
> > > > slower performance, and higher load average.
> > > >
> > > > Once the master drops to PIO, attempts to access the slave then cause
> > > > it to drop to PIO.
> > >
> > > Are you using 80-conductor cables on all your drives?  These are required
> > > to get consistent high throughput, and running without them may cause the
> > > problems you're seeing.
> >
> > Thanks for the information about the design of IDE etc, and the suggestion
> > about the cables.  I was about to shuffle things to get the disks
> > onto separate channels, but I now see that would be a mistake as my
> > CD drive would share a cable with a disk.
>
> ps.  As an aside, I have since determined that putting a PIO device and
>  a UDMA device on the same channel does not affect the performance
>  of the UDMA device, unless the PIO device is in use.  So, sharing
>  a low use CD rom drive with a disk wouldn't be so bad.
>
>  I am puzzled about the fallback to PIO concept.  If a disk
>  gives some sort of timeout error or whatever, why would trying
>  PIO correct the problem ?  That seems equivalent to asking the
>  disk to do the same thing, just more slowly.
>
>  In my case, some sort of timeout error occurs on ad0, so
>  it falls back to PIO, and works.  A later access to ad1
>  also yields a timeout error, and then it drops to PIO,
>  and works too.  I'm fairly confident both disks did not
>  experience media errors at the same time, which suggests
>  a problem with the onboard IDE controller, or a driver bug.
>
>  Tests continue...
>



Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-05 Thread Bruce Campbell
Quoting Bruce Campbell <[EMAIL PROTECTED]>:

> Quoting Matthew Emmerton <[EMAIL PROTECTED]>:
> 
> > [ cc'ing Soren since he's the ATA guru ]
> > 
> > > Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> > > Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> > >
> > > The test continues to run with the ata controller in PIO mode, with
> > > slower performance, and higher load average.
> > >
> > > Once the master drops to PIO, attempts to access the slave then cause
> > > it to drop to PIO.
> >
> > Are you using 80-conductor cables on all your drives?  These are required
> > to get consistent high throughput, and running without them may cause the
> > problems you're seeing.
> 
> Thanks for the information about the design of IDE etc, and the suggestion
> about the cables.  I was about to shuffle things to get the disks
> onto separate channels, but I now see that would be a mistake as my
> CD drive would share a cable with a disk.

ps.  As an aside, I have since determined that putting a PIO device and
 a UDMA device on the same channel does not affect the performance
 of the UDMA device, unless the PIO device is in use.  So, sharing
 a low use CD rom drive with a disk wouldn't be so bad.

 I am puzzled about the fallback to PIO concept.  If a disk
 gives some sort of timeout error or whatever, why would trying
 PIO correct the problem ?  That seems equivalent to asking the
 disk to do the same thing, just more slowly.

 In my case, some sort of timeout error occurs on ad0, so
 it falls back to PIO, and works.  A later access to ad1
 also yields a timeout error, and then it drops to PIO,
 and works too.  I'm fairly confident both disks did not 
 experience media errors at the same time, which suggests 
 a problem with the onboard IDE controller, or a driver bug.

 Tests continue...

 









Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-05 Thread Francesco Casadei
On Thu, Jan 02, 2003 at 01:42:03PM -0500, Bruce Campbell wrote:
[snip]
> 
> I don't have it enabled:
> 
>   hw.ata.tags: 0
> 
> I've manually set:
> 
>   atacontrol mode 0 UDMA33 UDMA33
> 
> and the problem has not recurred.
> 
> -- 
> Bruce Campbell
> Engineering Computing
> CPH-2374B
> University of Waterloo
> (519)888-4567 ext 5889
> 
> 
> 
> end of the original message

Yesterday I checked the drive ad6 with the Drive Fitness Test program from IBM.
Both quick and advanced test returned that the drive is ok. I then ran the test
against ad0 (the backup drive): the quick test showed that the drive was
defective because of "Excessive Shock". Re-executing the test gave the same
result.
I rebooted the system and disabled the S.M.A.R.T. option for the drive attached
to the motherboard's controller (i.e. the backup drive). Re-executing the quick
test showed that the drive is ok!

After 16 hours of uptime and one level-0 file system dump all drives are still
using UDMA100.

If for some reason the system falls back to PIO4 mode again, I will try
removing the following two options from the kernel:

# ISA optimization
options AUTO_EOI_1
options AUTO_EOI_2


If that still does not solve the problem, I will try the following, in order:
- disable tagged queuing
- buy different hardware!

Francesco Casadei
-- 
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)

Key fingerprint is: 1671 9A23 ACB4 520A E7EE  00B0 7EC3 375F 164E B17B






Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-02 Thread Francesco Casadei
On Thu, Jan 02, 2003 at 01:42:03PM -0500, Bruce Campbell wrote:
> 
[snip]
> I don't have it enabled:
> 
>   hw.ata.tags: 0
> 
> I've manually set:
> 
>   atacontrol mode 0 UDMA33 UDMA33
> 
> and the problem has not recurred.
> 
> -- 
> Bruce Campbell
> Engineering Computing
> CPH-2374B
> University of Waterloo
> (519)888-4567 ext 5889
> 
> 
> 
> end of the original message

# atacontrol mode 3
Master = PIO4 
Slave  = ???

# atacontrol mode 3 udma33 xxx
Master = UDMA33 
Slave  = ???

# atacontrol mode 3
Master = UDMA33 
Slave  = ???

# find / -name nonexistent -print

# atacontrol mode 3
Master = PIO4 
Slave  = ???


After a little disk activity, such as searching the entire filesystem for a
file, the second disk of the RAID array falls back to PIO4 mode.
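
A check like the one done by hand above could be automated by comparing two
captures of the "atacontrol mode" output. This is a minimal sketch in Python
that assumes the two-line "Master = ... / Slave = ..." format shown in this
message; the function names are hypothetical and it only parses text, it
does not run atacontrol itself.

```python
import re


def parse_mode_output(text):
    """Parse 'Master = X' / 'Slave = Y' lines into a dict keyed by position."""
    modes = {}
    for line in text.splitlines():
        m = re.match(r"\s*(Master|Slave)\s*=\s*(\S+)", line)
        if m:
            modes[m.group(1).lower()] = m.group(2)
    return modes


def fell_back(before, after):
    """Return positions that were in a UDMA mode before and a PIO mode after."""
    return [pos for pos in before
            if before[pos].startswith("UDMA")
            and after.get(pos, "").startswith("PIO")]


before = parse_mode_output("Master = UDMA33 \nSlave  = ???\n")
after = parse_mode_output("Master = PIO4 \nSlave  = ???\n")
print(fell_back(before, after))
```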

I booted the system from the live system CD (2nd disc of the FreeBSD
distribution set), then ran dd to read from and write to ad6: no errors were
found.

Francesco Casadei
-- 
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)

Key fingerprint is: 1671 9A23 ACB4 520A E7EE  00B0 7EC3 375F 164E B17B






Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-02 Thread Bruce Campbell
Quoting Francesco Casadei <[EMAIL PROTECTED]>:
> On Tue, Dec 31, 2002 at 03:57:16PM -0500, Bruce Campbell wrote:
> > 
> > I am seeing a problem with ata disks on 4 new systems, which
> > I believe is either a bug in the ata driver, or a problem with
> > the onboard IDE controller, or something else.  Systems are as follows:
> > ...
> > Motherboard: ASUS A7M266-D
> > CPUs   : 2 x 2000+ AMD MP
> > Memory : 2 x 512MB Crucial part: CT6472Y265
> > Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> > Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> > Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> > Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> > Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> > Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> > Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> > Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> > Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
>
> Same problem here, but slightly different configuration:
> 
> # atacontrol list
> ATA channel 0:
> Master:  ad0  ATA/ATAPI rev 5
> Slave:   no device present
> ATA channel 1:
> Master: acd0  ATA/ATAPI rev 0
> Slave:   no device present
> ATA channel 2:
> Master:  ad4  ATA/ATAPI rev 5
> Slave:   no device present
> ATA channel 3:
> Master:  ad6  ATA/ATAPI rev 5
> Slave:   no device present
> 
> ad4 and ad6 are attached to a Promise FastTrak 100 TX2 ATA RAID controller.
> 
> # atacontrol mode 0
> Master = UDMA100 
> Slave  = ???
> 
> # atacontrol mode 1
> Master = PIO4 
> Slave  = ???
> 
> # atacontrol mode 2
> Master = UDMA100 
> Slave  = ???
> 
> # atacontrol mode 3
> Master = PIO4 
> Slave  = ???
> 
> ad6 falls back to PIO mode on heavy I/O activity, i.e. when the system does
> a level 0 file system dump from the RAID 1 array (ad4,ad6) to the backup disk
> ad0.
> Rebooting and rebuilding the array with the Promise BIOS utility temporarily
> solves the problem. The system may be up and running for 1-4 weeks doing a
> level 0 dump every morning at 5:30am and then one day the drive ad6 falls
> back to PIO mode again (shortly before the completion of the fs dump).
> 
> Do the hard drives you are using support ATA tagged queuing? And if so,
> do you have TQ enabled?

I don't have it enabled:

  hw.ata.tags: 0

I've manually set:

  atacontrol mode 0 UDMA33 UDMA33

and the problem has not recurred.

-- 
Bruce Campbell
Engineering Computing
CPH-2374B
University of Waterloo
(519)888-4567 ext 5889





Re: ata "fallback to PIO mode" on dual processor AMD systems

2003-01-02 Thread Francesco Casadei
On Tue, Dec 31, 2002 at 03:57:16PM -0500, Bruce Campbell wrote:
> 
> I am seeing a problem with ata disks on 4 new systems, which
> I believe is either a bug in the ata driver, or a problem with
> the onboard IDE controller, or something else.  Systems are as follows:
> 
> Motherboard: ASUS A7M266-D
> CPUs   : 2 x 2000+ AMD MP
> Memory : 2 x 512MB Crucial part: CT6472Y265
> 
> Disks (all UDMA100):
> 
>            Master        Slave
> System 1:  WDC WD400BB   WDC WD1000BB
> System 2:  WDC WD400BB   WDC WD1000BB
> System 3:  WDC WD400BB   WDC WD800BB
> System 4:  WDC WD400BB   Maxtor 98196H8
> 
> Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):
> 
> commented out:
> 
>  cpu   I386_CPU
>  cpu   I486_CPU
> 
> enabled 
> 
>  options   SMP # Symmetric MultiProcessor Kernel
>  options   APIC_IO # Symmetric (APIC) I/O
> 
> 
> I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
> with a script which runs:
> 
>   dbench 1
>   sleep for 5 minutes
>   dbench 2
>   sleep for 5 minutes
>   dbench 3
>   ...
> 
> to simulate 1,2,3... clients.
> 
> The following has happened on systems 2,3 and 4, after about 15 hours
> of running the test:
> 
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> 
> The test continues to run with the ata controller in PIO mode, with
> slower performance, and higher load average.
> 
> Once the master drops to PIO, attempts to access the slave then cause
> it to drop to PIO.
> 
> If I run:
> 
>   atacontrol mode 0 UDMA100 UDMA100
> 
> attempts to access either drive result in a delay until the controller
> drops to PIO, and then operations resume.  A soft reboot and things
> work in UDMA mode again.  Also tried UDMA33 and UDMA66 with no change.
> I also tried "atacontrol reinit 0" with no help.
> 
> Theories when I search the web for "fallback to PIO mode" include:
> 
>  - bad disks
>  - something to do with thermal recalibration
> 
> I don't believe the problems are bad disks, as the slave drops to PIO
> after the master does, and I can't get it back to UDMA, other than by
> soft reboot.  Plus I see the problem on 6 of 8 disks.
> 
> The problem is very repeatable.
> 
> Can anyone offer any ideas, or suggest investigative steps ?  I have a system
> in PIO mode right now.
> 
> Thanks,
> 
> -- 
> Bruce Campbell
> Engineering Computing
> CPH-2374B
> University of Waterloo
> (519)888-4567 ext 5889
> 
> 
> 
> end of the original message

Same problem here, but slightly different configuration:

# atacontrol list
ATA channel 0:
Master:  ad0  ATA/ATAPI rev 5
Slave:   no device present
ATA channel 1:
Master: acd0  ATA/ATAPI rev 0
Slave:   no device present
ATA channel 2:
Master:  ad4  ATA/ATAPI rev 5
Slave:   no device present
ATA channel 3:
Master:  ad6  ATA/ATAPI rev 5
Slave:   no device present

ad4 and ad6 are attached to a Promise FastTrak 100 TX2 ATA RAID controller.
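
To keep track of which device sits on which channel, the listing above can
be turned into a lookup table. This is a minimal sketch in Python that
assumes the line layout of the "atacontrol list" output quoted in this
message (the device model strings appear to be missing from the archived
copy, so only the device name is extracted). The function name is
hypothetical.

```python
import re


def parse_atacontrol_list(text):
    """Map channel number -> {'master': device or None, 'slave': device or None}."""
    channels = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"\s*ATA channel (\d+):", line)
        if m:
            current = int(m.group(1))
            channels[current] = {}
            continue
        m = re.match(r"\s*(Master|Slave):\s+(.*)", line)
        if m and current is not None:
            value = m.group(2).strip()
            # "no device present" means the position is empty.
            dev = None if value.startswith("no device") else value.split()[0]
            channels[current][m.group(1).lower()] = dev
    return channels


sample = """ATA channel 0:
Master:  ad0  ATA/ATAPI rev 5
Slave:   no device present
ATA channel 3:
Master:  ad6  ATA/ATAPI rev 5
Slave:   no device present
"""
channels = parse_atacontrol_list(sample)
print(channels)
```

With this table, the channel number passed to "atacontrol mode" (3 for ad6
in this setup) can be looked up rather than remembered.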

# atacontrol mode 0
Master = UDMA100 
Slave  = ???

# atacontrol mode 1
Master = PIO4 
Slave  = ???

# atacontrol mode 2
Master = UDMA100 
Slave  = ???

# atacontrol mode 3
Master = PIO4 
Slave  = ???

ad6 falls back to PIO mode on heavy I/O activity, i.e. when the system does a
level 0 file system dump from the RAID 1 array (ad4,ad6) to the backup disk
ad0.
Rebooting and rebuilding the array with the Promise BIOS utility temporarily
solves the problem. The system may be up and running for 1-4 weeks doing a
level 0 dump every morning at 5:30am and then one day the drive ad6 falls back
to PIO mode again (shortly before the completion of the fs dump).

Do the hard drives you are using support ATA tagged queuing? And if so, do
you have TQ enabled?

Francesco Casadei

-- 
You can download my public key from http://digilander.libero.it/fcasadei/
or retrieve it from a keyserver (pgpkeys.mit.edu, wwwkeys.pgp.net, ...)

Key fingerprint is: 1671 9A23 ACB4 520A E7EE  00B0 7EC3 375F 164E B17B





Re: ata "fallback to PIO mode" on dual processor AMD systems

2002-12-31 Thread Bruce Campbell
Quoting Matthew Emmerton <[EMAIL PROTECTED]>:

> [ cc'ing Soren since he's the ATA guru ]
> 
> > Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> > Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> >
> > The test continues to run with the ata controller in PIO mode, with
> > slower performance, and higher load average.
> >
> > Once the master drops to PIO, attempts to access the slave then cause
> > it to drop to PIO.
>
> Are you using 80-conductor cables on all your drives?  These are required to
> get consistent high throughput, and running without them may cause the
> problems you're seeing.

Thanks for the information about the design of IDE etc, and the suggestion
about the cables.  I was about to shuffle things to get the disks
onto separate channels, but I now see that would be a mistake as my
CD drive would share a cable with a disk.

Anyway, they all have the 80 conductor cable.  I forgot to add some 
environmental and other information.

 The 4 AMD systems are in Aopen hx08 towers, with 400 watt power supplies,
 and 5 auxiliary fans (in addition to the power supply fan, and fan on
 each cpu).  They are in an air conditioned machine room.  The CPU and
 motherboard temperatures are within spec.  I mention this as I note
 many reported AMD system problems traced to overheating.

 All drives are installed in removable drive bays.  I don't have the make/model
 on hand right now.  They were $19 CAD.  ($13USD).  The low cost makes
 me suspicious now, but...

 I'm running the same tests on 4 single processor 2.4GHz Intel systems.
 They have not failed in this manner so far.

 Initially, I had 1GB memory modules in the AMD systems (I can't remember
 the make) and the systems froze and rebooted randomly.  I moved to
 Crucial 512MB modules to cure that problem.







Re: ata "fallback to PIO mode" on dual processor AMD systems

2002-12-31 Thread Matthew Emmerton
[ cc'ing Soren since he's the ATA guru ]

> I am seeing a problem with ata disks on 4 new systems, which
> I believe is either a bug in the ata driver, or a problem with
> the onboard IDE controller, or something else.  Systems are as follows:
>
> Motherboard: ASUS A7M266-D
> CPUs   : 2 x 2000+ AMD MP
> Memory : 2 x 512MB Crucial part: CT6472Y265
>
> Disks (all UDMA100):
>
>            Master        Slave
> System 1:  WDC WD400BB   WDC WD1000BB
> System 2:  WDC WD400BB   WDC WD1000BB
> System 3:  WDC WD400BB   WDC WD800BB
> System 4:  WDC WD400BB   Maxtor 98196H8
>
> Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):
>
> commented out:
>
>  cpu   I386_CPU
>  cpu   I486_CPU
>
> enabled
>
>  options   SMP # Symmetric MultiProcessor Kernel
>  options   APIC_IO # Symmetric (APIC) I/O
>
>
> I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
> with a script which runs:
>
>   dbench 1
>   sleep for 5 minutes
>   dbench 2
>   sleep for 5 minutes
>   dbench 3
>   ...
>
> to simulate 1,2,3... clients.
>
> The following has happened on systems 2,3 and 4, after about 15 hours
> of running the test:
>
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
> Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
> Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
> Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
> Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
>
> The test continues to run with the ata controller in PIO mode, with
> slower performance, and higher load average.
>
> Once the master drops to PIO, attempts to access the slave then cause
> it to drop to PIO.
>
> If I run:
>
>   atacontrol mode 0 UDMA100 UDMA100
>
> attempts to access either drive result in a delay until the controller
> drops to PIO, and then operations resume.  A soft reboot and things
> work in UDMA mode again.  Also tried UDMA33 and UDMA66 with no change.
> I also tried "atacontrol reinit 0" with no help.
>
> Theories when I search the web for "fallback to PIO mode" include:
>
>  - bad disks
>  - something to do with thermal recalibration
>
> I don't believe the problems are bad disks, as the slave drops to PIO
> after the master does, and I can't get it back to UDMA, other than by
> soft reboot.  Plus I see the problem on 6 of 8 disks.
>
> The problem is very repeatable.
>
> Can anyone offer any ideas, or suggest investigative steps ?  I have a system
> in PIO mode right now.

The reason the slave drops to PIO after the master does is by design - the
master and slave have to use the same signalling mode since they're on the
same cable.  (People often report lackluster performance of fast UDMA hard
drives with non-UDMA CD-ROMs on the same channel.)

Are you using 80-conductor cables on all your drives?  These are required to
get consistent high throughput, and running without them may cause the
problems you're seeing.

--
Matt Emmerton





ata "fallback to PIO mode" on dual processor AMD systems

2002-12-31 Thread Bruce Campbell

I am seeing a problem with ata disks on 4 new systems, which
I believe is either a bug in the ata driver, or a problem with
the onboard IDE controller, or something else.  Systems are as follows:

Motherboard: ASUS A7M266-D
CPUs   : 2 x 2000+ AMD MP
Memory : 2 x 512MB Crucial part: CT6472Y265

Disks (all UDMA100):

           Master        Slave
System 1:  WDC WD400BB   WDC WD1000BB
System 2:  WDC WD400BB   WDC WD1000BB
System 3:  WDC WD400BB   WDC WD800BB
System 4:  WDC WD400BB   Maxtor 98196H8

Kernel : 4.7-RELEASE, custom kernel (compared to GENERIC):

commented out:

 cpu   I386_CPU
 cpu   I486_CPU

enabled 

 options   SMP # Symmetric MultiProcessor Kernel
 options   APIC_IO # Symmetric (APIC) I/O


I am running a test with "dbench" (/usr/ports/benchmarks/dbench)
with a script which runs:

  dbench 1
  sleep for 5 minutes
  dbench 2
  sleep for 5 minutes
  dbench 3
  ...

to simulate 1,2,3... clients.
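
The schedule above can be sketched as a short Python script. This is only an
illustration of the loop, not the script actually used: the function names
and the upper client count are hypothetical (the original list ends in
"..."), and the only dbench invocation assumed is the "dbench N" form shown
above.

```python
import subprocess
import time


def build_plan(max_clients, pause_seconds=300):
    """Return the test steps: run dbench with N clients, then pause."""
    steps = []
    for clients in range(1, max_clients + 1):
        steps.append(("run", ["dbench", str(clients)]))
        steps.append(("sleep", pause_seconds))
    return steps


def execute(steps):
    """Carry out a plan. Needs dbench on PATH; deliberately not called here."""
    for kind, arg in steps:
        if kind == "run":
            subprocess.run(arg, check=False)
        else:
            time.sleep(arg)


# Show the first three rounds without running anything.
for step in build_plan(3):
    print(step)
```

Calling execute(build_plan(n)) for some chosen n would run the real test;
separating the plan from its execution makes the schedule easy to inspect
first.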

The following has happened on systems 2,3 and 4, after about 15 hours
of running the test:

Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done
Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done
Dec 30 23:27:00 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting
Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00
Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
Dec 30 23:27:00 ecserv13 /kernel: ata0: resetting devices .. done

The test continues to run with the ata controller in PIO mode, with
slower performance, and higher load average.

Once the master drops to PIO, attempts to access the slave then cause
it to drop to PIO.
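
When the same failure shows up on several machines, it can help to count the
events per device rather than read the logs by eye. This is a minimal sketch
in Python that assumes syslog lines in the format quoted in this message;
the function name and the choice of which messages to count are
illustrative, not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches kernel lines for ATA disks, for example:
#   Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode
LINE_RE = re.compile(r"/kernel: (?P<dev>ad\d+): (?P<msg>.*)$")


def summarize(lines):
    """Count command timeouts and PIO fallbacks per device."""
    timeouts = Counter()
    fallbacks = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a per-disk kernel message (e.g. ata0 resets)
        if "command timeout" in m["msg"]:
            timeouts[m["dev"]] += 1
        elif "fallback to PIO" in m["msg"]:
            fallbacks[m["dev"]] += 1
    return timeouts, fallbacks


sample = [
    "Dec 30 23:26:59 ecserv13 /kernel: ad0: WRITE command timeout tag=0 serv=0 - resetting",
    "Dec 30 23:26:59 ecserv13 /kernel: ata0: resetting devices .. done",
    "Dec 30 23:27:00 ecserv13 /kernel: ad0: timeout waiting for cmd=ef s=d0 e=00",
    "Dec 30 23:27:00 ecserv13 /kernel: ad0: trying fallback to PIO mode",
]
timeouts, fallbacks = summarize(sample)
print(dict(timeouts), dict(fallbacks))
```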

If I run:

  atacontrol mode 0 UDMA100 UDMA100

attempts to access either drive result in a delay until the controller
drops to PIO, and then operations resume.  A soft reboot and things
work in UDMA mode again.  Also tried UDMA33 and UDMA66 with no change.
I also tried "atacontrol reinit 0" with no help.

Theories when I search the web for "fallback to PIO mode" include:

 - bad disks
 - something to do with thermal recalibration

I don't believe the problems are bad disks, as the slave drops to PIO
after the master does, and I can't get it back to UDMA, other than by
soft reboot.  Plus I see the problem on 6 of 8 disks.

The problem is very repeatable.

Can anyone offer any ideas, or suggest investigative steps ?  I have a system
in PIO mode right now.

Thanks,

-- 
Bruce Campbell
Engineering Computing
CPH-2374B
University of Waterloo
(519)888-4567 ext 5889

