Re: ATA errors on recent -current

2002-04-19 Thread msch

  So: I changed line 186 in sys/dev/ata/ata-disk.c from
  
  adp->num_tags = atadev->param->queuelen;
  
  to
  
  adp->num_tags = 0x10;
  
  which is roughly half of the reported queue length (which is 0x1F).
  
  And, Terry, I'm afraid I have to disappoint you... there's absolutely *no*
  change in the behaviour of the new kernel :-(
 
 Uh... the 16 you changed to 10 was decimal, so changing it
 to 0x10 changes it to ... 16.
 
 Rather than point out the hex/decimal confusion earlier, that's
 why I said /2.

Ahm, Terry, perhaps I misunderstand you, but: the reported queue length is
31(dec), which is 0x1F(hex), as stated above. Half of that would be 15.5(dec),
which I rounded up to 16(dec), which is 0x10(hex). What's your point?
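
The arithmetic in this exchange is easy to check. The following is a minimal sketch, assuming only the queue length of 31 reported in the thread; the names `half_queue_len` and `show_queue_values` are made up for illustration and are not part of the ATA driver.

```c
#include <stdio.h>

/* Hypothetical helper, not driver code: integer division truncates. */
unsigned int
half_queue_len(unsigned int queuelen)
{
    return queuelen / 2;
}

/* Print the values discussed in the thread in decimal and hex. */
void
show_queue_values(void)
{
    unsigned int reported = 0x1F;   /* queue length reported by atacontrol */
    unsigned int halved = half_queue_len(reported);

    printf("reported:     %u (0x%X)\n", reported, reported);
    printf("reported / 2: %u (0x%X)\n", halved, halved);
    printf("0x10:         %u\n", 0x10u);
}
```

With integer division, `queuelen / 2` gives 15 (0xF) for a reported length of 31, while the hand-rounded literal 0x10 is 16; either value is about half of the reported depth.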

 Soren's commit is for a -current specific merge.  The problems
 you are seeing supposedly are in RELENG_4, and will probably not
 be affected... though the commit will provide much better
 diagnostics than I've suggested.  8-).

Everything I have posted here was done under -current, even if my signature
states something different. This last test was done under a -current of
Apr 18, 2002, 18:00 UTC. I run my everyday system, from which I post and write
my e-mail, under -STABLE... I hope that clears things up a bit.

Ciao/BSD -
Matthias


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: ATA errors on recent -current

2002-04-18 Thread Søren Schmidt

It seems Terry Lambert wrote:
 My other hunch is that there will need to be a channel reserved
 for reset commands to be queued to the disk, so that you can
 queue more commands to it later (e.g. can't connect to send the
 reset because of the already disconnected commands in progress).

Terry, read the ATA spec, it doesn't work that way; tags on
ATA are very different from tags on SCSI, and besides, a reset
is not a command but a bit in a HW port.

-Søren




Re: ATA errors on recent -current

2002-04-18 Thread Terry Lambert

Søren Schmidt wrote:
 It seems Terry Lambert wrote:
  My other hunch is that there will need to be a channel reserved
  for reset commands to be queued to the disk, so that you can
  queue more commands to it later (e.g. can't connect to send the
  reset because of the already disconnected commands in progress).
 
 Terry, read the ATA spec, it doesn't work that way; tags on
 ATA are very different from tags on SCSI, and besides, a reset
 is not a command but a bit in a HW port.

I didn't mean for the reset itself, I meant for the process.  You
can't take back writes that are in progress and not acknowledged,
in order to retry them after the reset, so as to not lose data.

-- Terry




Re: ATA errors on recent -current

2002-04-18 Thread Søren Schmidt

It seems Terry Lambert wrote:
 Søren Schmidt wrote:
  It seems Terry Lambert wrote:
   My other hunch is that there will need to be a channel reserved
   for reset commands to be queued to the disk, so that you can
   queue more commands to it later (e.g. can't connect to send the
   reset because of the already disconnected commands in progress).
  
  Terry, read the ATA spec, it doesn't work that way; tags on
  ATA are very different from tags on SCSI, and besides, a reset
  is not a command but a bit in a HW port.
 
 I didn't mean for the reset itself, I meant for the process.  You
 can't take back writes that are in progress and not acknowledged,
 in order to retry them after the reset, so as to not lose data.

Oh yes you can, the ATA driver does just that in case of the drive
losing its marbles.

-Søren




Re: ATA errors on recent -current

2002-04-18 Thread Matthias Schuendehuette

On Thursday, 18 April 2002 16:44, Søren Schmidt wrote:
 It seems Terry Lambert wrote:
  Søren Schmidt wrote:
   It seems Terry Lambert wrote:
    My other hunch is that there will need to be a channel reserved
    for reset commands to be queued to the disk, so that you can
    queue more commands to it later (e.g. can't connect to send the
    reset because of the already disconnected commands in
    progress).
  
   Terry, read the ATA spec, it doesn't work that way; tags on
   ATA are very different from tags on SCSI, and besides, a reset
   is not a command but a bit in a HW port.
 
  I didn't mean for the reset itself, I meant for the process.  You
  can't take back writes that are in progress and not acknowledged,
  in order to retry them after the reset, so as to not lose data.

 Oh yes you can, the ATA driver does just that in case of the drive
 losing its marbles.

Does that mean that the driver isn't aware of the 'tags problem'? If I
understand you right, it should be possible to reset the drive and
continue, maybe without tags or at a reduced UDMA speed or whatever
actions seem appropriate...

...ahh, I mean, the driver *does* take an action (it/he(?) switches
back to PIO4), but why is any UDMA mode no longer usable afterwards? Has
the drive been reset or just switched back? What is the impact of a
reset compared to a switch back?

Well, just my thoughts; I'm no specialist at all.
-- 
Ciao/BSD - Matthias

Matthias Schuendehuette [EMAIL PROTECTED], Berlin (Germany)
Powered by FreeBSD 4.5-STABLE




Re: ATA errors on recent -current

2002-04-18 Thread Søren Schmidt

It seems Matthias Schuendehuette wrote:
   I didn't mean for the reset itself, I meant for the process.  You
   can't take back writes that are in progress and not acknowledged,
   in order to retry them after the reset, so as to not lose data.
 
  Oh yes you can, the ATA driver does just that in case of the drive
  losing its marbles.
 
 Does that mean that the driver isn't aware of the 'tags problem'? If I
 understand you right, it should be possible to reset the drive and
 continue, maybe without tags or at a reduced UDMA speed or whatever
 actions seem appropriate...
 
 ...ahh, I mean, the driver *does* take an action (it/he(?) switches
 back to PIO4), but why is any UDMA mode no longer usable afterwards? Has
 the drive been reset or just switched back? What is the impact of a
 reset compared to a switch back?

The driver always resets the ATA channel if a command times out; that's
the only way to gain control of the device(s) again.
The driver always falls back to PIO if it encounters a DMA problem,
be it with tags or not, as chances are DMA doesn't work at all if
a problem shows up. Now this could be changed, but in 99% of the cases
it will just make the pain last longer, until it finally switches
back to PIO. I chose this route because most users prefer to
keep their data intact at (almost) any price. However, in -current
and recent -stables you can switch on DMA again with atacontrol,
if you think it was a fluke that got it set back to PIO.
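
The policy described here can be pictured as a very small state machine. This is only a sketch of the three rules as stated in this message, not the driver's actual code: the type `dev_state`, the `xfer_mode` values and all three function names are hypothetical.

```c
/* Hypothetical transfer modes; the real driver has its own constants. */
enum xfer_mode { MODE_PIO, MODE_DMA };

/* Hypothetical per-device state, for this illustration only. */
struct dev_state {
    enum xfer_mode mode;
    unsigned int   channel_resets;
};

/* A command timed out: the channel is always reset to regain control. */
void
on_command_timeout(struct dev_state *dev)
{
    dev->channel_resets++;
}

/* A DMA problem was seen, with or without tags: fall back to PIO. */
void
on_dma_error(struct dev_state *dev)
{
    dev->mode = MODE_PIO;
}

/* Manual override, standing in for what atacontrol lets the admin do. */
void
admin_enable_dma(struct dev_state *dev)
{
    dev->mode = MODE_DMA;
}
```

Nothing in this sketch returns to DMA automatically; that matches the stated design choice of preferring intact data over speed and leaving the decision to re-enable DMA to the administrator.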

-Søren




Re: ATA errors on recent -current

2002-04-18 Thread Terry Lambert

Søren Schmidt wrote:
  I didn't mean for the reset itself, I meant for the process.  You
  can't take back writes that are in progress and not acknowledged,
  in order to retry them after the reset, so as to not lose data.
 
 Oh yes you can, the ATA driver does just that in case of the drive
 losing its marbles.

If it worked, people wouldn't be having this problem.

What's your theory on it?

-- Terry




Re: ATA errors on recent -current

2002-04-18 Thread Terry Lambert

Matthias Schuendehuette wrote:
 ...ahh, I mean, the driver *does* take an action (it/he(?) switches
 back to PIO4), but why is any UDMA-Mode no longer usable afterwards?

This is the $64 question.


-- Terry




Re: ATA errors on recent -current

2002-04-18 Thread Søren Schmidt

It seems Terry Lambert wrote:
 Søren Schmidt wrote:
   I didn't mean for the reset itself, I meant for the process.  You
   can't take back writes that are in progress and not acknowledged,
   in order to retry them after the reset, so as to not lose data.
  
  Oh yes you can, the ATA driver does just that in case of the drive
  losing its marbles.
 
 If it worked, people wouldn't be having this problem.

Hmm, since I haven't been able to get my hands on the problem
(I've been running 3 systems here with tags all over since the
first report, not a single hiccup yet :( ) I can't tell what's
going on. It might be that the drive somehow gets really confused,
I don't know. For now, those having tags problems should just
not enable it...

 What's your theory on it?

None so far, I've instrumented the code here, and I simply cannot
see what should go wrong (yet).

BUT recent current with the busdma'd ATA driver screws up with
tags, fix is coming as soon as I get a few hours to commit it...

-Søren




Re: ATA errors on recent -current

2002-04-18 Thread Terry Lambert

Søren Schmidt wrote:
   Oh yes you can, the ATA driver does just that in case of the drive
   losing its marbles.
 
  If it worked, people wouldn't be having this problem.
 
 Hmm, since I haven't been able to get my hands on the problem
 (I've been running 3 systems here with tags all over since the
 first report, not a single hiccup yet :( ) I can't tell what's
 going on. It might be that the drive somehow gets really confused,
 I don't know. For now, those having tags problems should just
 not enable it...


I wish someone who is having the problem would try the three
hacks I suggested, and report back.  I personally can't reproduce
the problem here, either.


  What's your theory on it?
 
 None so far, I've instrumented the code here, and I simply cannot
 see what should go wrong (yet).

Heh.  My hardware works ... I instrument the code ... my hardware
still works.  8-) 8-).

I think that it's going to be up to the people who are complaining
to give feedback.


 BUT recent current with the busdma'd ATA driver screws up with
 tags, fix is coming as soon as I get a few hours to commit it...

Totally different problem, of course... they were complaining
about -stable vs. 4.5-release, too.  I agree that the additional
hardware support is more important.  8-(.

-- Terry




Re: ATA errors on recent -current

2002-04-18 Thread Søren Schmidt

It seems Terry Lambert wrote:
  Hmm, since I haven't been able to get my hands on the problem
  (I've been running 3 systems here with tags all over since the
  first report, not a single hiccup yet :( ) I can't tell what's
  going on. It might be that the drive somehow gets really confused,
  I don't know. For now, those having tags problems should just
  not enable it...
 
 I wish someone who is having the problem would try the three
 hacks I suggested, and report back.  I personally can't reproduce
 the problem here, either.

That's life I guess, but eventually I'll get the combination together
that fails (I hope), since this is more or less impossible to debug
remotely...

-Søren




Re: ATA errors on recent -current

2002-04-18 Thread Andrew Tulloch

I have a Dell PowerEdge 500SC currently running 4.5-STABLE with the
following:

atapci0: ServerWorks CSB5 ATA100 controller port
0x8c0-0x8c3,0x8b0-0x8bf,0x374-0x377,0x170-0x177,0x3f4-0x3f7,0x1f0-0x1f7 at
device 15.1 on pci0

ad0: 19073MB IC35L020AVER07-0 [38752/16/63] at ata0-master UDMA100

and it eventually stops at a mount root prompt after several timeouts/resets
while attempting to mount the root FS with tags enabled. After Soren's
initial MFC it did panic, but that stopped sometime later; I don't know exactly
when, as I haven't had time to fiddle with tags again. I'm afraid I've just
joined the -current list and seem to have missed these hacks. If you can point
the ones relevant to -STABLE in my direction, I'll give each a whirl and report
back.

Cheers
Andrew








Re: ATA errors on recent -current

2002-04-18 Thread Alexander Leidinger

On 18 Apr, Søren Schmidt wrote:

 What's your theory on it?
 
 None so far, I've instrumented the code here, and I simply cannot
 see what should go wrong (yet).

Does it make sense to give this instrumentation to someone who can
reproduce it?

Bye,
Alexander.

-- 
   It's not a bug, it's tradition!

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-18 Thread Søren Schmidt

It seems Alexander Leidinger wrote:
 On 18 Apr, Søren Schmidt wrote:
 
  What's your theory on it?
  
  None so far, I've instrumented the code here, and I simply cannot
  see what should go wrong (yet).
 
 Does it make sense to give this instrumentation to someone who can
 reproduce it?

Not directly, since it's tied in with special HW to look for interrupts etc.,
a real hacker's delight setup :)

Now I have this patch that fixes the mess from the busdma integration
that will go in later tonight when my test machine has finished its
current test round.
When that's done I need -current users with tag problems to upgrade,
and those who still have problems should mail me their dmesg so I can try to
get a grasp on what HW fails exactly.
Similarly for -stable users: if you have problems with tags, mail me
your dmesg...

-Søren




Re: ATA errors on recent -current

2002-04-18 Thread Matthias Schuendehuette

On Thursday, 18 April 2002 17:54, Terry Lambert wrote:


 I wish someone who is having the problem would try the three
 hacks I suggested, and report back.  I personally can't reproduce
 the problem here, either.

OK, OK... ;-) I'm starting *now*. I just compiled a new -current world
(...phew) and the kernels are in place...

...'till later.
-- 
Ciao/BSD - Matthias

Matthias Schuendehuette [EMAIL PROTECTED], Berlin (Germany)
Powered by FreeBSD 4.5-STABLE




Re: ATA errors on recent -current

2002-04-18 Thread Matthias Schuendehuette

On Thursday, 18 April 2002 17:54, Terry Lambert wrote:
 I wish someone who is having the problem would try the three
 hacks I suggested, and report back.  I personally can't reproduce
 the problem here, either.

So: I changed line 186 in sys/dev/ata/ata-disk.c from

adp->num_tags = atadev->param->queuelen;

to

adp->num_tags = 0x10;

which is roughly half of the reported queue length (which is 0x1F).

And, Terry, I'm afraid I have to disappoint you... there's absolutely *no*
change in the behaviour of the new kernel :-(

As I've reported earlier, write caching also makes no difference, nor
does changing the UDMA speed (with 'atacontrol').

If you insist on it, I'll change the DMA speed with the IBM tool, but
I think we can do without it... (urghh, I would have to switch to
Windoze :-/ )

Sorry for the bad news, but... I think we'll wait for Soren's commit
tonight.
-- 
Ciao/BSD - Matthias

Matthias Schuendehuette [EMAIL PROTECTED], Berlin (Germany)
Powered by FreeBSD 4.5-STABLE




Re: ATA errors on recent -current

2002-04-18 Thread Terry Lambert

Matthias Schuendehuette wrote:
 On Thursday, 18 April 2002 17:54, Terry Lambert wrote:
  I wish someone who is having the problem would try the three
  hacks I suggested, and report back.  I personally can't reproduce
  the problem here, either.
 
 So: I changed line 186 in sys/dev/ata/ata-disk.c from
 
 adp->num_tags = atadev->param->queuelen;
 
 to
 
 adp->num_tags = 0x10;
 
 which is roughly half of the reported queue length (which is 0x1F).
 
 And, Terry, I'm afraid I have to disappoint you... there's absolutely *no*
 change in the behaviour of the new kernel :-(

Uh... the 16 you changed to 10 was decimal, so changing it
to 0x10 changes it to ... 16.

Rather than point out the hex/decimal confusion earlier, that's
why I said /2.


 Sorry for the bad news, but... I think, we'll wait for Soren's commit
 tonight.

Soren's commit is for a -current specific merge.  The problems
you are seeing supposedly are in RELENG_4, and will probably not
be affected... though the commit will provide much better
diagnostics than I've suggested.  8-).

-- Terry




Re: ATA errors on recent -current

2002-04-17 Thread Alexander Leidinger

On 16 Apr, Matthias Schuendehuette wrote:

 Then I tried various combinations of UDMA100/66/33 and wc=0/1 - it
 changes almost nothing. If WC was enabled, I saw errors
 concerning tags 0 *and* 1, whereas without write caching only tag=0 was
 mentioned. I should say that my simple test was a 'tar cvf /dev/null
 /usr/ports' with /usr/ports on an ATA partition. Why *Write* Caching has
 any influence here...???

If it wasn't read only: access time update.


 CPU: AMD Duron(tm) processor (801.82-MHz 686-class CPU)
 real memory  = 268369920 (262080K bytes)

Same here.

 pcib1: VIA 8363 (Apollo KT133) PCI-PCI (AGP) bridge \
   at device 1.0 on pci0

I have a KT133A.


 ...and 'atacontrol cap 0 0' says:

ATA channel 0, Master, device ad0:

ATA/ATAPI revision    5
device model          IC35L060AVER07-0
firmware revision     ER6OA44A
cylinders             16383
heads                 16
sectors/track         63
lba supported         120103200 sectors
lba48 not supported
dma supported
overlap not supported

Feature                        Support  Enable  Value   Vendor
write cache                    yes      yes
read ahead                     yes      yes
dma queued                     yes      yes     31/1F
SMART                          yes      yes
microcode download             no       no
security                       yes      yes
power management               yes      yes
advanced power management      yes      no      0/00
automatic acoustic management  yes      no      254/FE  128/80

And some general questions not related to the problem:
 - What's the security feature?
 - What does {,advanced} power management do?
 - Is there a way to modify the acoustic management setting?
 - We don't have SMART support, right?

Bye,
Alexander.

-- 
  Loose bits sink chips.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-17 Thread Terry Lambert

Alexander Leidinger wrote:
 device model  IC35L060AVER07-0
               **      **
These match the test in ad_tagsupported(); I have to wonder about:

 device model  IC35L060AVER07-0
  **

 firmware revision ER6OA44A

I also have to wonder about the firmware revision feature set;
it's probably not an issue.

-- Terry




Re: ATA errors on recent -current

2002-04-17 Thread Alexander Leidinger

On 17 Apr, Terry Lambert wrote:

 device model  IC35L060AVER07-0
               **      **
 These match the test in ad_tagsupported(); I have to wonder about:
 
 device model  IC35L060AVER07-0
  **

Can you be more specific?

 firmware revision ER6OA44A
 
 I also have to wonder about the firmware revision feature set;
 it's probably not an issue.

I don't know what you are trying to tell me.

Bye,
Alexander.

-- 
   One world, one web, one program  -- Microsoft promotional ad
 Ein Volk, ein Reich, ein Fuehrer  -- Adolf Hitler

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-17 Thread Matthias Schuendehuette

Hello,

On Wednesday, 17 April 2002 03:14, you wrote:
 Matthias Schuendehuette wrote:
  I used 'atacontrol' to read the number of tags allowed: it is 31
  (0x1F). Perhaps Soren could tell me how to force it to, say, 0x10?

 You have to modify the source code in ~line 180 of
 /sys/dev/ata/ata-disk.c.

Well, thanks for the hint. I just have to wait until I get a new
'current' world... yesterday it didn't compile, and because of a, say,
'indisposition' of vinum (I changed another slice on a vinum disk, so
it dislikes the whole plex :-^ ) I lost my current /usr/obj...

  Then I tried various combinations of UDMA100/66/33 and wc=0/1 - it
  changes almost nothing.

BTW: I switched UDMA speed using 'atacontrol'...

  After the first switch to PIO4, I umounted the filesystem and
  switched back to UDMA33 for instance - I couldn't even *mount* the
  filesystem again!
  [...]
 My hunch, which is why I suggested decreasing the number of
 tags seen by the driver, is that the tagged queues are over
 used, and this locks the disk up. [...]

Yes, I understand this (I for myself had already your 
'off-by-one'-suspicion - it's obvious if one sees the error message) 
and I'll test it ASAP.

What I was wondering yesterday before I fell asleep is that the disk is 
obviously not able to recover from this error - even if the error 
condition is no longer valid due to the switch to PIO-mode. *Any* 
DMA-mode is no longer useable.

I don't know if it's an attribute of these disks or an issue solvable
by a/the driver. I would expect to be able to do a software reset of
the drive like with SCSI, but I'm a bit biased against ATA (vs. SCSI)
because of the opinions/arguments of a very knowledgeable guy here in the
German newsgroup (former core team member ;-), so I wouldn't be
surprised if that's not possible or not specified.

-- 
Ciao/BSD - Matthias

Matthias Schuendehuette [EMAIL PROTECTED], Berlin (Germany)
Powered by FreeBSD 4.5-STABLE




Re: ATA errors on recent -current

2002-04-17 Thread Terry Lambert

Matthias Schuendehuette wrote:
  My hunch, which is why I suggested decreasing the number of
  tags seen by the driver, is that the tagged queues are over
  used, and this locks the disk up. [...]
 
 Yes, I understand this (I for myself had already your
 'off-by-one'-suspicion - it's obvious if one sees the error message)
 and I'll test it ASAP.
 
 What I was wondering yesterday before I fell asleep is that the disk is
 obviously not able to recover from this error - even if the error
 condition is no longer valid due to the switch to PIO-mode. *Any*
 DMA-mode is no longer useable.

My other hunch is that there will need to be a channel reserved
for reset commands to be queued to the disk, so that you can
queue more commands to it later (e.g. can't connect to send the
reset because of the already disconnected commands in progress).

This is what I was implying when I said that it involved error
handling with the decoupled operations, an idea John Baldwin took
exception to (it is still my hunch...).  I think that
control channel commands, which aren't data commands, need to be
explicitly serialized (maybe) on a reserved channel, to avoid the
problem.  This takes 1 of N tags out of service, but guarantees that
you can reset the disk drive or whatever.
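
The bookkeeping behind this hunch can be sketched as a tag pool that holds one tag back for control use. This illustrates the idea only: Søren's reply in this thread says ATA tags do not work this way, and every name below (`tag_pool`, `NTAGS`, `RESERVED_TAG`, the three functions) is hypothetical.

```c
#include <stdint.h>

#define NTAGS        32     /* assumed pool size, for illustration only */
#define RESERVED_TAG 0      /* hypothetical tag held back for control use */

struct tag_pool {
    uint32_t in_use;        /* bit n set means tag n is busy */
};

/* Allocate a tag for a data command; never hands out RESERVED_TAG. */
int
alloc_data_tag(struct tag_pool *p)
{
    int t;

    for (t = 0; t < NTAGS; t++) {
        if (t == RESERVED_TAG)
            continue;
        if (!(p->in_use & (UINT32_C(1) << t))) {
            p->in_use |= UINT32_C(1) << t;
            return t;
        }
    }
    return -1;              /* all data tags are busy */
}

/* Allocate the reserved tag for a control command, if it is free. */
int
alloc_control_tag(struct tag_pool *p)
{
    uint32_t bit = UINT32_C(1) << RESERVED_TAG;

    if (p->in_use & bit)
        return -1;
    p->in_use |= bit;
    return RESERVED_TAG;
}

/* Release a tag previously returned by either allocator. */
void
free_tag(struct tag_pool *p, int t)
{
    p->in_use &= ~(UINT32_C(1) << t);
}
```

Even when every data tag is busy, `alloc_control_tag` can still succeed, which is the guarantee being argued for; the cost is one tag that data commands can never use.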


 I don't know if it's an attribute of these disks or an issue solvable
 by a/the driver. I would expect to be able to do a software reset of
 the drive like with SCSI, but I'm a bit biased against ATA (vs. SCSI)
 because of the opinions/arguments of a very knowledgeable guy here in the
 German newsgroup (former core team member ;-), so I wouldn't be
 surprised if that's not possible or not specified.

I'm personally very biased against ATA for most production use;
assuming you know your application, though, and there's not a
huge concurrent access requirement, then ATA is OK (I guess),
if you can live with the electrical limitations.

Manually hacking the drive probe/attach to halve the number of
tag queues that get used based on the reported values seems like
a very quick way to validate whether it's command queue overflow,
or an intrinsic problem with the drive, that's hanging you up.

-- Terry




Re: ATA errors on recent -current

2002-04-16 Thread msch

 [...]
 Since you have one of these beasts, could you maybe try changing
 the number of tagged command queue entries you permit to be used
 at one time?

Of course, I'll do it as soon as...

1) I'm at home again... ;-)
2) Someone tells me how to achieve that. I looked at 'man 8 atacontrol'
   as well as 'man 4 ata', but I can't find anything that lets me set
   the queue depth nor inquire the advertised queue length...

 [...]
 As I said: it could be drive settings unrelated to the code
 itself being correct.  I've given three suggestions to verify
 this, one way or the other:
 
 1) Control the drive DMA speed down

I *did* test with UDMA66 instead of UDMA100 and it was even worse...
With UDMA100, the system switched back to PIO4 - with UDMA66 there was a system
freeze after the second (well known) error message... :-(

But I admit, this test was done some days ago, I'll try it again this evening
(approx. 19:00 UTC)...

 2) Pretend the maximum tagged command queue depth is
    smaller than it is

How to?

 3) Toggle the write caching on the drive

OK - I'm running all my disks without write cache, but I'll check this too.

 Until you try all three of these and report back, you can't say
 that the problem is Soren's.

This is a real misunderstanding! I thought I stated clearly enough that I
don't want to blame Soren for this obviously highly complex issue!
Shit happens - the only insurance against that is to stay in bed (alone! :-)

Ciao/BSD -
Matthias





Re: ATA errors on recent -current

2002-04-16 Thread Matthias Schuendehuette

Hi Terry and you all,

On Tuesday, 16. April 2002 01:48 you wrote:

 [...]
 As I said: it could be drive settings unrelated to the code
 itself being correct.  I've given three suggestions to verify
 this, one way or the other:

 1) Control the drive DMA speed down

 2) Pretend the maximum tagged command queue depth is
    smaller than it is

 3) Toggle the write caching on the drive

I used 'atacontrol' to read the number of tags allowed: it is 31 
(0x1F). Perhaps Soren could tell me how to force it to, say, 0x10?

Then I tried various combinations of UDMA100/66/33 and wc=0/1 - it
changes almost nothing. If WC was enabled, I saw errors
concerning tags 0 *and* 1, whereas without write caching only tag=0 was
mentioned. I should say that my simple test was a 'tar cvf /dev/null
/usr/ports' with /usr/ports on an ATA partition. Why *Write* Caching has
any influence here...???

What was consistent through all tests was that the disk operates for quite
some time until the error occurs for the first time. After that, it is no
longer possible to access the disk in UDMA mode, regardless of *which*
UDMA mode it is. 'Quite some time' means approx. 50% of /usr/ports in
the above-mentioned 'test'.

After the first switch to PIO4, I umounted the filesystem and switched 
back to UDMA33 for instance - I couldn't even *mount* the filesystem 
again!

But w/o Tagged Queuing the disk operates flawlessly, so I rather doubt
that the errors with WD disks have the same source... but maybe.

So far - but still some data:

CPU: AMD Duron(tm) processor (801.82-MHz 686-class CPU)
real memory  = 268369920 (262080K bytes)
pcib1: VIA 8363 (Apollo KT133) PCI-PCI (AGP) bridge \
at device 1.0 on pci0
/* It's an EPoX 8KTA2 MoBo */
atapci0: VIA 82C686 ATA100 controller port 0xd000-0xd00f \
at device 7.1 on pci0
atapci0: Correcting VIA config for southbridge data corruption bug
ad0: 43979MB IBM-DTLA-307045 [89355/16/63] at ata0-master UDMA100


...and 'atacontrol cap 0 0' says:

ATA channel 0, Master, device ad0:

ATA/ATAPI revision    5
device model          IBM-DTLA-307045
firmware revision     TX6OA50C
cylinders             16383
heads                 16
sectors/track         63
lba supported         90069840 sectors
lba48 not supported
dma supported
overlap not supported

Feature                        Support  Enable  Value   Vendor
write cache                    yes      no
read ahead                     yes      yes
dma queued                     yes      yes     31/1F
SMART                          yes      no
microcode download             no       no
security                       yes      no
power management               yes      yes
advanced power management      yes      no      0/00
automatic acoustic management  yes      no      254/FE  128/80


That's it.

-- 
Ciao/BSD - Matthias

Matthias Schuendehuette [EMAIL PROTECTED], Berlin (Germany)
Powered by FreeBSD 4.5-STABLE




Re: ATA errors on recent -current

2002-04-16 Thread Terry Lambert

[EMAIL PROTECTED] wrote:
  As I said: it could be drive settings unrelated to the code
  itself being correct.  I've given three suggestions to verify
  this, one way or the other:
 
  1) Control the drive DMA speed down
 
 I *did* test with UDMA66 instead of UDMA100 and it was even worse...
 With UDMA100, the system switched back to PIO4 - with UDMA66 there was a system
 freeze after the second (well known) error message... :-(
 
 But I admit, this test was done some days ago, I'll try it again this evening
 (approx. 19:00 UTC)...

Was this with the atacontrol, or was it with the manufacturer
supplied utility, ibmatarw.exe or ibmata66.exe?


  2) Pretend the maximum tagged command queue depth is
     smaller than it is
 
 How to?

Modify the source code and compile a new kernel.  The code has
changed since, but look around line 180 of /sys/dev/ata/ata-disk.c:

    /* use tagged queueing if allowed and supported */
    if (ata_tags && ad_tagsupported(adp)) {
        adp->num_tags = AD_PARAM->queuelen;
        adp->flags |= AD_F_TAG_ENABLED;
        adp->controller->flags |= ATA_QUEUED;

Change:
        adp->num_tags = AD_PARAM->queuelen;

to:
        adp->num_tags = AD_PARAM->queuelen / 2;

Or just set it to some known value less than the number supported
by your drive, as indicated by the probe message.
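
To make the arithmetic concrete, here is a small stand-alone sketch of
the two options (halving, or capping at a fixed value).  It is only an
illustration: ad_half_tags() and ad_clamp_tags() are made-up helper
names, not functions in the ATA driver, and a cap of 0 meaning "no cap"
is a convention chosen just for this sketch.

```c
#include <assert.h>

/*
 * Hypothetical helper (not in ata-disk.c): halve the queue depth the
 * drive reports.  Integer division truncates, so 31 becomes 15.
 */
static int
ad_half_tags(int reported)
{
    assert(reported >= 0);
    return reported / 2;
}

/*
 * Hypothetical helper (not in ata-disk.c): use at most 'cap' tags.
 * A cap of 0 means "no cap" in this sketch.
 */
static int
ad_clamp_tags(int reported, int cap)
{
    assert(reported >= 0 && cap >= 0);
    if (cap != 0 && reported > cap)
        return cap;
    return reported;
}
```

With the depth of 31 (0x1F) that 'atacontrol cap 0 0' shows for ad0,
ad_half_tags(31) gives 15 and ad_clamp_tags(31, 8) gives 8.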

Note that there might have been changes to the ad_tagsupported(adp)
function, in the same file.  If so, it may be showing a false
positive for your drive.  Here is the old code:

static int
ad_tagsupported(struct ad_softc *adp)
{
    const char *drives[] = {"IBM-DPTA", "IBM-DTLA", NULL};
    int i = 0;

    switch (adp->controller->chiptype) {
    case 0x4d33105a: /* Promises before TX2 doesn't work with tagged queuing */
    case 0x4d38105a:
    case 0x0d30105a:
    case 0x4d30105a:
        return 0;
    }

    /* check that drive does DMA, has tags enabled, and is one we know works */
    if (adp->controller->mode[ATA_DEV(adp->unit)] >= ATA_DMA &&
        AD_PARAM->support.queued && AD_PARAM->enabled.queued) {
        while (drives[i] != NULL) {
            if (!strncmp(AD_PARAM->model, drives[i], strlen(drives[i])))
                return 1;
            i++;
        }
        /*
         * check IBM's new obscure way of naming drives
         * we want "IC" (IBM CORP) and "AT" or "AV" (ATA interface)
         * but doesn't care about the other info (size, capacity etc)
         */
        if (!strncmp(AD_PARAM->model, "IC", 2) &&
            (!strncmp(AD_PARAM->model + 8, "AT", 2) ||
             !strncmp(AD_PARAM->model + 8, "AV", 2)))
            return 1;
    }
    return 0;
}
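
For anyone who wants to check a model string by hand, the string
matching above can be pulled out into a stand-alone sketch.  The
function name is made up, the chipset and DMA-mode checks are left out,
and the length guard is an addition so that short strings are not read
past their end.

```c
#include <stddef.h>
#include <string.h>

/*
 * Stand-alone sketch of the model-string test only.  Hypothetical
 * name; this is not a function in the ATA driver.
 */
static int
model_is_known_tag_capable(const char *model)
{
    static const char *drives[] = { "IBM-DPTA", "IBM-DTLA", NULL };
    int i;

    /* older IBM naming: match on the model prefix */
    for (i = 0; drives[i] != NULL; i++) {
        if (strncmp(model, drives[i], strlen(drives[i])) == 0)
            return 1;
    }

    /* newer IBM naming: "IC" at offset 0, "AT" or "AV" at offset 8 */
    if (strlen(model) >= 10 &&
        strncmp(model, "IC", 2) == 0 &&
        (strncmp(model + 8, "AT", 2) == 0 ||
         strncmp(model + 8, "AV", 2) == 0))
        return 1;

    return 0;
}
```

By this test "IBM-DTLA-307045" matches on the prefix list, and any
string with "IC" at offset 0 and "AV" or "AT" at offset 8 matches on
the second rule.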


  3)Toggle the write caching on the drive
 
 OK - I'm running all my disks without write cache, but I'll check this too.


Turning write caching off makes the drive work harder, as well as
making it more reliable (just like real life: the more reliable,
the harder it works 8-)).  Write caching avoids some additional
work that might otherwise slow the drive electronics.


  Until you try all three of these and report back, you can't say
  that the problem is Soren's.
 
 This is a real misunderstanding! I thought I stated clearly enough that I
 don't want to blame Soren for this obviously highly complex issue!
 Shit happens - the only insurance against that is to stay in bed (alone! :-)

No problem.  I just wanted it made clear.  The circumstances were a
case of "before, it didn't happen; after, it does", so it looked like
there was blame being tossed.

-- Terry




Re: ATA errors on recent -current

2002-04-16 Thread Terry Lambert

Matthias Schuendehuette wrote:
 On Tuesday, 16. April 2002 01:48 you wrote:
  [...]
  As I said: it could be drive settings unrelated to the code
  itself being correct.  I've given three suggestions to verify
  this, one way or the other:
 
  1)Control the drive DMA speed down
 
  2)Pretend the maximum tagged command queue depth is
smaller than it is
 
  3)Toggle the write caching on the drive
 
 I used 'atacontrol' to read the number of tags allowed: it is 31
 (0x1F). Perhaps Soren could tell me how to force it to, say, 0x10?

You have to modify the source code in ~line 180 of /sys/dev/ata/ata-disk.c.


 Then I tried various combinations of UDMA100/66/33 and wc=0/1 - it
 changes almost nothing. If WC was enabled, I saw errors
 concerning tags 0 *and* 1, whereas without write caching only tag=0 was
 mentioned. I should say that my simple test was a 'tar cvf /dev/null
 /usr/ports' with /usr/ports on an ATA partition. Why *write* caching has
 any influence here...???

I rather expected you to have *more* problems with write caching
than without, not the other way around.  I can't explain this.


 What was consistent through all tests was that the disk operates for quite
 some time until the error occurs the first time. After that, it is not
 possible to access the disk in UDMA mode any more, regardless of *which*
 UDMA mode it is. 'Quite some time' means approx. 50% of /usr/ports in
 the above mentioned 'test'.
 
 After the first switch to PIO4, I unmounted the filesystem and switched
 back to UDMA33, for instance - I couldn't even *mount* the filesystem
 again!
 
 But w/o tagged queueing the disk operates flawlessly, so I'm a bit in
 doubt whether the errors with WD disks have the same source... but maybe.

My hunch, which is why I suggested decreasing the number of
tags seen by the driver, is that the tagged queues are
overused, and this locks the disk up.  My best guess is an off-by-one
or an exceptional condition handler that was not an issue until
recently, because of a FreeBSD interrupt architecture change
having nothing to do with the driver itself (i.e. the reason it
only happens under load, and didn't happen under the same load,
before).

-- Terry




Re: ATA errors on recent -current

2002-04-16 Thread John Baldwin


On 17-Apr-2002 Terry Lambert wrote:
 What was consistent through all tests was that the disk operates for quite
 some time until the error occurs the first time. After that, it is not
 possible to access the disk in UDMA mode any more, regardless of *which*
 UDMA mode it is. 'Quite some time' means approx. 50% of /usr/ports in
 the above mentioned 'test'.
 
 After the first switch to PIO4, I unmounted the filesystem and switched
 back to UDMA33, for instance - I couldn't even *mount* the filesystem
 again!
 
 But w/o tagged queueing the disk operates flawlessly, so I'm a bit in
 doubt whether the errors with WD disks have the same source... but maybe.
 
 My hunch, which is why I suggested decreasing the number of
 tags seen by the driver, is that the tagged queues are
 overused, and this locks the disk up.  My best guess is an off-by-one
 or an exceptional condition handler that was not an issue until
 recently, because of a FreeBSD interrupt architecture change
 having nothing to do with the driver itself (i.e. the reason it
 only happens under load, and didn't happen under the same load,
 before).

Terry, we've had threaded interrupt handlers for over a year and a half
now.  If they had really broken things in this basic a fashion we wouldn't
have made it this far with running systems.  Your hypothesis about
something busted in the tagged queueing code seems sound but blaming
this on interrupt threads doesn't make much sense to me.

-- 

John Baldwin [EMAIL PROTECTED]  http://www.FreeBSD.org/~jhb/
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: ATA errors on recent -current

2002-04-16 Thread Terry Lambert

John Baldwin wrote:
  My hunch, which is why I suggested decreasing the number of
  tags seen by the driver, is that the tagged queues are
  overused, and this locks the disk up.  My best guess is an off-by-one
  or an exceptional condition handler that was not an issue until
  recently, because of a FreeBSD interrupt architecture change
  having nothing to do with the driver itself (i.e. the reason it
  only happens under load, and didn't happen under the same load,
  before).
 
 Terry, we've had threaded interrupt handlers for over a year and a half
 now.  If they had really broken things in this basic a fashion we wouldn't
 have made it this far with running systems.  Your hypothesis about
 something busted in the tagged queueing code seems sound but blaming
 this on interrupt threads doesn't make much sense to me.

The problems don't show up, except under extreme loads, with
particular drives.

Therefore, it is still my hunch.  ;^).

Dropping the queue depth to 8 from 16 to attempt to verify my
hunch won't hurt anything, and may find the problem.  It could
still be an off-by-one error in Soren's code, as well (but I
don't think it is).

-- Terry




Re: ATA errors on recent -current

2002-04-15 Thread Terry Lambert

Giorgos Keramidas wrote:
   ad0: READ command timeout tag=1 serv=1 - resetting
   ata0: resetting devices .. ad0: invalidating queued requests
   done
 
  Turn off tagged queueing. Søren knows about this error and tries to
  reproduce it (but fails as far as I know).
 
 I've seen this quite a few times, but I can't reliably reproduce it
 yet.  It seems to hit me a lot when the ad0 drive spins like crazy
 doing stuff that is heavy on disk I/O.  Disabling tag queueing now to
 see if this fixes things.  But even if it does, I think I should
 enable it again and help Søren track this down, if I can.

Is your drive perchance an IBM DTLA?

It's known to have these problems.

-- Terry




Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Terry Lambert wrote:
 Giorgos Keramidas wrote:
ad0: READ command timeout tag=1 serv=1 - resetting
ata0: resetting devices .. ad0: invalidating queued requests
done
  
   Turn off tagged queueing. Søren knows about this error and tries to
   reproduce it (but fails as far as I know).
  
  I've seen this quite a few times, but I can't reliably reproduce it
  yet.  It seems to hit me a lot when the ad0 drive spins like crazy
  doing stuff that is heavy on disk I/O.  Disabling tag queueing now to
  see if this fixes things.  But even if it does, I think I should
  enable it again and help Søren track this down, if I can.
 
 Is your drive perchance an IBM DTLA?
 
 It's known to have these problems.

Cool! would you like to share where that information is available so
I can possibly work around the problem ??

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 15 Apr, Giorgos Keramidas wrote:

  I updated to -current today and am now getting these errors
 
  ad0: READ command timeout tag=1 serv=1 - resetting
  ata0: resetting devices .. ad0: invalidating queued requests
  done

 Turn off tagged queueing. Søren knows about this error and tries to
 reproduce it (but fails as far as I know).
 
 I've seen this quite a few times, but I can't reliably reproduce it
 yet.  It seems to hit me a lot when the ad0 drive spins like crazy
 doing stuff that is heavy on disk I/O.  Disabling tag queueing now to
 see if this fixes things.  But even if it does, I think I should
 enable it again and help Søren track this down, if I can.

There are a lot of people who want to help him...

At first I got it only once (like you, in a heavy disk I/O situation). After
another new world I got it at every boot...

Some people see this after the mega MFC on -stable too.

Bye,
Alexander.

-- 
Give a man a fish and you feed him for a day;
 teach him to use the Net and he won't bother you for weeks.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Alexander Leidinger wrote:
  I've seen this quite a few times, but I can't reliably reproduce it
  yet.  It seems to hit me a lot when the ad0 drive spins like crazy
  doing stuff that is heavy on disk I/O.  Disabling tag queueing now to
  see if this fixes things.  But even if it does, I think I should
  enable it again and help Søren track this down, if I can.
 
 There are a lot of people who want to help him...
 
 At first I got it only once (like you, in a heavy disk I/O situation). After
 another new world I got it at every boot...
 
 Some people see this after the mega MFC on -stable too.

Could I have you guys try this simple patch ? 

Index: ata-all.c
===
RCS file: /home/ncvs/src/sys/dev/ata/ata-all.c,v
retrieving revision 1.149
diff -u -r1.149 ata-all.c
--- ata-all.c   10 Apr 2002 11:18:07 -  1.149
+++ ata-all.c   15 Apr 2002 08:05:49 -
@@ -1009,13 +1009,12 @@
               rman_get_start(atadev->channel->r_io),
               command, lba, count, feature, flags);
 #endif
-
-    /* select device */
-    ATA_OUTB(atadev->channel->r_io, ATA_DRIVE, ATA_D_IBM | atadev->unit);
-
     /* disable interrupt from device */
     if (atadev->channel->flags & ATA_QUEUED)
        ATA_OUTB(atadev->channel->r_altio, ATA_ALTSTAT, ATA_A_IDS | ATA_A_4BIT);
+
+    /* select device */
+    ATA_OUTB(atadev->channel->r_io, ATA_DRIVE, ATA_D_IBM | atadev->unit);
 
     /* ready to issue command ? */
     if (ata_wait(atadev, 0) < 0) {
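
To see what the reordering amounts to without real hardware, the two
register writes can be recorded by a mock instead of being issued.
This is only a sketch: mock_outb(), struct port_log and
command_prologue_patched() are invented for the illustration, and the
register offsets and bit values are assumed rather than copied from
ata-all.h.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for the sketch; check ata-all.h before relying on them. */
#define ATA_DRIVE    0x06   /* drive select register, offset in r_io */
#define ATA_D_IBM    0xa0
#define ATA_ALTSTAT  0x00   /* alternate status/control, offset in r_altio */
#define ATA_A_IDS    0x02   /* disable interrupt from device */
#define ATA_A_4BIT   0x08

enum port_res { RES_IO, RES_ALTIO };

struct port_write {
    enum port_res res;
    int reg;
    int val;
};

#define LOG_MAX 8

struct port_log {
    struct port_write w[LOG_MAX];
    size_t n;
};

/* Mock stand-in for ATA_OUTB(): record the write, touch no hardware. */
static void
mock_outb(struct port_log *pl, enum port_res res, int reg, int val)
{
    assert(pl->n < LOG_MAX);
    pl->w[pl->n].res = res;
    pl->w[pl->n].reg = reg;
    pl->w[pl->n].val = val;
    pl->n++;
}

/* Order after the patch: mask the interrupt first, then select the device. */
static void
command_prologue_patched(struct port_log *pl, int unit, int queued)
{
    if (queued)
        mock_outb(pl, RES_ALTIO, ATA_ALTSTAT, ATA_A_IDS | ATA_A_4BIT);
    mock_outb(pl, RES_IO, ATA_DRIVE, ATA_D_IBM | unit);
}
```

On a queued channel the recorded sequence is the interrupt-disable
write followed by the device-select write; on a non-queued channel only
the device-select write is recorded.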

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 14 Apr, Terry Lambert wrote:

  Turn off tagged queueing. Søren knows about this error and tries to
  reproduce it (but fails as far as I know).
 
 I've seen this quite a few times, but I can't reliably reproduce it
 yet.  It seems to hit me a lot when the ad0 drive spins like crazy
 doing stuff that is heavy on disk I/O.  Disabling tag queueing now to
 see if this fixes things.  But even if it does, I think I should
 enable it again and help Søren track this down, if I can.
 
 Is your drive perchance an IBM DTLA?
 
 It's known to have these problems.

Does this also apply to other IBM drives?

(7) root@ttyp2 # dmesg |grep ata
Preloaded elf module /boot/kernel/accf_data.ko at 0xc04cbed8.
atapci0: VIA 82C686 ATA100 controller port 0xd000-0xd00f at device 7.1 on pci0
atapci0: Correcting VIA config for southbridge data corruption bug
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
ad0: 58644MB IC35L060AVER07-0 [119150/16/63] at ata0-master UDMA100
afd0: 96MB IOMEGA ZIP 100 ATAPI [32/64/96] at ata1-master PIO0

Bye,
Alexander.

-- 
   Press every key to continue.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 15 Apr, Søren Schmidt wrote:

 Some people see this after the mega MFC on -stable too.
 
 Could I have you guys try this simple patch ? 

It failed to apply, so I applied it by hand. Compiling a new kernel now.

Bye,
Alexander. 

-- 
Where do you think you're going today?

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Alexander Leidinger wrote:
 On 15 Apr, Søren Schmidt wrote:
 
  Some people see this after the mega MFC on -stable too.
  
  Could I have you guys try this simple patch ? 
 
 Does not work.

As in: no change, or breaks completely (and if so, how)?

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 15 Apr, Søren Schmidt wrote:

  Some people see this after the mega MFC on -stable too.
  
  Could I have you guys try this simple patch ? 
 
 Does not work.
 
 As in:
 
 No change or breaks completely (if so how)...

Sorry: No change.

Bye,
Alexander.

-- 
  The best things in life are free, but the
expensive ones are still worth a look.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Terry Lambert

Alexander Leidinger wrote:
 On 14 Apr, Terry Lambert wrote:
  Is your drive perchance an IBM DTLA?
 
  It's known to have these problems.
 
 Does this also apply to other IBM drives?

Potentially.  IBM changed the part number when the drives became
known to be dogs.  I thought they also changed the firmware defaults
to get around the problem.

You would have to check the full threads complaining about the
DTLA parts to be certain; I didn't follow the problem closely
enough except to recommend using the outer cylinders only for
the FS and OS data for an embedded system I worked on at one
time (no, not the InterJet).

-- Terry




Re: ATA errors on recent -current

2002-04-15 Thread Terry Lambert

Alexander Leidinger wrote:
   Some people see this after the mega MFC on -stable too.
   Could I have you guys try this simple patch ?
  Does not work.
  No change or breaks completely (if so how)...
 Sorry: No change.

Download the Windows executable I pointed to in a previous posting.
Run it.  It will create a floppy disk.

Using the floppy, set the DMA transfer rate slower on the drive.

STANDARD WARNINGS APPLY!  MAY HOSE YOUR DISK FIRMWARE IRRETRIEVABLY
IF NOT USED CORRECTLY!  FOLLOW THE IBM SUPPLIED INSTRUCTIONS TO THE
LETTER!  I AM NOT RESPONSIBLE FOR WHAT HAPPENS TO YOUR DISK IF YOU
USE THE TOOL!

-- Terry




Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Terry Lambert wrote:
 Søren Schmidt wrote:
   Is your drive perchance an IBM DTLA?
  
   It's known to have these problems.
  
  Cool! would you like to share where that information is available so
  I can possibly work around the problem ??
 
 IBM DTLA drives are known to be problematic.  If you use that
 in a search engine, it will find numerous references to the
 drive electronics being too slow for sustained access to the
 sectors closest to the spindle.

This thread is about tagged queueing problems on IBM drives, since they
are the only ones that support it; it is not specific to the DTLA
series at all, as this thread has already explained.
So Terry, do you have anything to share, or just noise like this ?

(I don't care whether the DTLA may have other problems)

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Terry Lambert

Søren Schmidt wrote:
  Is your drive perchance an IBM DTLA?
 
  It's known to have these problems.
 
 Cool! would you like to share where that information is available so
 I can possibly work around the problem ??

IBM DTLA drives are known to be problematic.  If you use that
in a search engine, it will find numerous references to the
drive electronics being too slow for sustained access to the
sectors closest to the spindle.

The place I first read about the problem was Tom's Hardware.

The generally accepted fix is don't use the cylinders nearest
the spindle and/or replace the drive.

Here's the most complete (if biased) web reference I found:

http://www.goldengate.net/~dlpeters/IBMSucks/

This has been discussed on the FreeBSD lists before... the
supposed problem cited was the electronics could not keep up
with the data rate on the interior cylinders.

Supposedly, one of the utilities on this page:

  http://www.axiontech.com/cgi-local/download.asp?product=hdibmutilhard_drives

can work around the problem (depending on the drive you have)
by changing some firmware settings on the drive.  If you are
interested, get them while you can: this is a mirror of an IBM
set of downloads which are no longer available from the IBM
URLs where they used to be located.

The workaround works by setting a lower DMA transfer rate.

Here is a PDF about the drives, which mentions the tool and
its use:

http://vendors.asbis.com/download/D60GXP_ht20.pdf

Again, this is mirrored from an IBM URL that is no longer
available, so if you intend to grab it, grab it now.

-- Terry




Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Terry Lambert wrote:
   IBM DTLA drives are known to be problematic.  If you use that
   in a search engine, it will find numerous references to the
   drive electronics being too slow for sustained access to the
   sectors closest to the spindle.
  
  This thread is about tagged queueing problems on IBM drives, since they
  are the only ones that support it; it is not specific to the DTLA
  series at all, as this thread has already explained.
  So Terry, do you have anything to share, or just noise like this ?
  
  (I don't care whether the DTLA may have other problems)
 
 Sorry; all I can give you is hear-say, which I guess you could
 consider to be noise, except we confirmed that the problem disk
 in this case was an IBM drive, which tends to support the theory.

Indeed, the problem at hand here shows up on *any* tagged queueing
capable drive; it is not specific to a certain model.

 For a more scientific test, downloading the firmware tool and
 setting the DMA transfer rate down, and checking for problems,
 would be pretty overwhelming evidence.  Personally, I don't have
 any of the buggers lying around to test with any more.

Why on earth would you do that ? (hint: man atacontrol)

Besides, I don't see this as any evidence at all, but that's another matter...

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Terry Lambert

Søren Schmidt wrote:
  For a more scientific test, downloading the firmware tool and
  setting the DMA transfer rate down, and checking for problems,
  would be pretty overwhelming evidence.  Personally, I don't have
  any of the buggers lying around to test with any more.
 
 Why on earth would you do that ? (hint: man atacontrol)
 
 Besides, I don't see this as any evidence at all, but that's another matter...

If it fixes the problem, then the problem is most likely related
to what firmware setting the tool changes.

8-).

From my reading of the FreeBSD man pages, it can't blow the flash
byte that controls the DMA speed, like the IBM provided utility
does.

Obviously, turning off tagged commands works, according to at least
one person who is reporting the problem.

I wonder if limiting outstanding tagged commands to less than the
number advertised by the drive would also work... can't be worse
than the initialization reordering patch that failed (i.e. the
worst case is it still has the problems).  A lot safer than banging
bits in the firmware, I'm sure, though...

Limiting the outstanding tagged commands to less than the advertised
amount would actually be my first choice of a hack for a software
workaround.

Can you do that with sysctl hw.ata.tags=XXX?  Or is that just a 1/0
thing?  A scan doesn't indicate documentation, but I'm probably just
not looking very hard...

-- Terry




Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Terry Lambert wrote:
 Søren Schmidt wrote:
   For a more scientific test, downloading the firmware tool and
   setting the DMA transfer rate down, and checking for problems,
   would be pretty overwhelming evidence.  Personally, I don't have
   any of the buggers lying around to test with any more.
  
  Why on earth would you do that ? (hint: man atacontrol)
  
  Besides, I don't see this as any evidence at all, but that's another matter...
 
 If it fixes the problem, then the problem is most likely related
 to what firmware setting the tool changes.

AFAIK it only sets the maximum DMA speed the drive will allow, which
you can do with atacontrol as well...

 Obviously, turning off tagged commands works, according to at least
 one person who is reporting the problem.

Again that has *nothing* to do with the DTLA drives and DMA speed
and the phase of the moon...
But it shows (as we already know) that using tags on any drive
that supports it, can fail on some systems.

 I wonder if limiting outstanding tagged commands to less than the
 number advertised by the drive would also work... can't be worse
 than the initialization reordering patch that failed (i.e. the
 worst case is it still has the problems).  A lot safer than banging
 bits in the firmware, I'm sure, though...
 
 Limiting the outstanding tagged commands to less than the advertised
 amount would actually be my first choice of a hack for a software
 workaround.

That's not the problem either; the problem is that I apparently
changed some subtle bits that make it fail on some systems, regardless
of controller and disk type, but which is marginal enough that I
can't reproduce the problem here in the lab...

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 15 Apr, Terry Lambert wrote:

 Obviously, turning off tagged commands works, according to at least
 one person who is reporting the problem.

It helps everyone I know of.

[...]
 Limiting the outstanding tagged commands to less than the advertised
 amount would actually be my first choice of a hack for a software
 workaround.
 
 Can you do that with sysctl hw.ata.tags=XXX?  Or is that just a 1/0
 thing?  A scan doesn't indicate documentation, but I'm probably just
 not looking very hard...

It's only a 1/0 thing (at least at the moment).

Bye,
Alexander.

-- 
  To boldly go where I surely don't belong.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 15 Apr, Søren Schmidt wrote:

 Again that has *nothing* to do with the DTLA drives and DMA speed
 and the phase of the moon...

But perhaps it depends on the distance between the drive and the
North Pole... the ones with the problems are all farther away from it
than you... ;-)

 But it shows (as we already know) that using tags on any drive
 that supports it, can fail on some systems.

More strangely: it worked for me a lot longer than for other people. The
first kernel which showed the problem to me was an Apr 8(?) kernel;
Martin Schündehütte had already complained about a Mar xx (xx  28) kernel in
de.comp.os.unix.bsd.

 I wonder if limiting outstanding tagged commands to less than the
 number advertised by the drive would also work... can't be worse
 than the initialization reordering patch that failed (i.e. the
 worst case is it still has the problems).  A lot safer than banging
 bits in the firmware, I'm sure, though...
 
 Limiting the outstanding tagged commands to less than the advertised
 amount would actually be my first choice of a hack for a software
 workaround.
 
 That's not the problem either; the problem is that I apparently
 changed some subtle bits that make it fail on some systems, regardless
 of controller and disk type, but which is marginal enough that I
 can't reproduce the problem here in the lab...

What about Brian's offer to give you access to his machine? Isn't this
enough in this case to play a little bit?

Bye,
Alexander.

-- 
  The best things in life are free, but the
expensive ones are still worth a look.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Alexander Leidinger

On 15 Apr, Giorgos Keramidas wrote:

 Is your drive perchance an IBM DTLA?
 It's known to have these problems.
 
 Nay.  A Western Digital disk I bought about 2.5 years ago.

And it does tagged queueing? I thought IBM is the only manufacturer of
such IDE drives...

Bye,
Alexander.

-- 
The dark ages were caused by the Y1K problem.

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-15 Thread Søren Schmidt

It seems Giorgos Keramidas wrote:
 On 2002-04-14 23:46, Terry Lambert wrote:
  Is your drive perchance an IBM DTLA?
  It's known to have these problems.
 
 Nay.  A Western Digital disk I bought about 2.5 years ago.

Hmm, AFAIK WD never had a disk that worked right with tags,
and I've never been able to find a workaround on those I 
have here in the lab

-Søren




Re: ATA errors on recent -current

2002-04-15 Thread Giorgos Keramidas

On 2002-04-15 15:56, Søren Schmidt wrote:
 It seems Giorgos Keramidas wrote:
  On 2002-04-14 23:46, Terry Lambert wrote:
   Is your drive perchance an IBM DTLA?
   It's known to have these problems.
 
  Nay.  A Western Digital disk I bought about 2.5 years ago.

 Hmm, AFAIK WD never had a disk that worked right with tags,
 and I've never been able to find a workaround on those I
 have here in the lab

It doesn't.  You're right.  I had posted that message before checking
with `atacontrol cap'.  My problems with the disk are obviously caused
by something else that's broken in my local setup.  Sorry for jumping
up and making noise :)  The console messages I'm getting were similar:

| Apr 12 00:09:27 hades kernel: ad0: READ command timeout tag=0 serv=0 - resetting
| Apr 12 00:09:28 hades kernel: ata0: resetting devices .. ata0-slave: ATA identify 
|retries exceeded
| Apr 12 00:09:28 hades kernel: done

This is caused by something else, as I've found out later.  Tags have
nothing to do with what I'm seeing.  Before saying "hey, this is a
bug" I want to do further tests to make sure that it's not the
hardware's fault.

Giorgos.




Re: ATA errors on recent -current

2002-04-15 Thread Matthias Schuendehuette

I'm very sorry if I'm being a bit impolite, but I have to mail the 
following statement concerning the DTLA disks and FreeBSD:

It may be all true and horrible, but -

I still have an old FreeBSD Test-Installation (45GB are big enough :-) 
with a 4.4-STABLE as of Oct 23, 2001...

It boots off the DTLA, uses tagged-queuing and connects using UDMA100...
... and doesn't have any problems!!

So, to bring some of you down to earth again, the DTLA may be a 
horrible disk and I'm one of the last to praise ATA at all (My machine 
has two SCSI host adaptors, five SCSI-Disks and several other SCSI 
Devices), but it once worked!

I really, really don't want to blame Søren, he's doing a great job and 
everybody who makes something occasionally makes some errors, but (at 
least for me) it doesn't seem to be a fundamental technical problem, 
because *it once worked* - sorry, but it's true.

And maybe it isn't related to tagged queuing and the DTLA at all - if I 
correctly understand Giorgos' mail...
-- 
Ciao/BSD - Matthias

Matthias Schuendehuette [EMAIL PROTECTED], Berlin (Germany)
Powered by FreeBSD 4.5-STABLE




Re: ATA errors on recent -current

2002-04-15 Thread Terry Lambert

Matthias Schuendehuette wrote:
 I still have an old FreeBSD Test-Installation (45GB are big enough :-)
 with a 4.4-STABLE as of Oct 23, 2001...
 
 It boots off the DTLA, uses tagged-queuing and connects using UDMA100...
 ... and doesn't have any problems!!
 
 So, to bring some of you down to earth again, the DTLA may be a
 horrible disk and I'm one of the last to praise ATA at all (My machine
 has two SCSI host adaptors, five SCSI-Disks and several other SCSI
 Devices), but it once worked!

I think we all already agree, though, that the tagged command
queuing problem comes from a code change.  That doesn't identify
it very closely (or you would have included a patch ;^)).


It may be that the OS was slower in older revisions (one would
hope that was the case), and that now that the code is faster, it's
too fast for the hardware.

It may also be that the switches between write caching on/off by
default in various versions have removed stall points in the write
code path which would otherwise have protected the drive from
being overwhelmed by the host OS.

There are a lot of ways timing problems could have been
introduced that don't require Soren's code to be wrong, and
that make it impossible to rule out the hardware as the cause.


On the theory that it is an off-by-one error, introduced either
by increased concurrency in an error path, or a direct off-by-one,
I've suggested dropping the effective number of tagged commands
supported by the drive.

That way, if you exceed this number because of some coding
error, you won't exceed the capacity of the drive.

Since you have one of these beasts, could you maybe try changing
the number of tagged command queue entries you permit to be used
at one time?
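
The experiment described above amounts to capping the number of tags the
driver will use below the depth the drive reports.  A minimal sketch of
that arithmetic follows, in Python purely for illustration;
effective_tags() and its divisor parameter are hypothetical names, not
anything in the ATA driver, and the depths passed in are only example
values.

```python
def effective_tags(reported_depth: int, divisor: int = 2) -> int:
    """Return a reduced tag count for a drive reporting `reported_depth`.

    Hypothetical helper: the result is never below 1 and never above
    the reported depth, so an off-by-one elsewhere still leaves headroom.
    """
    if reported_depth < 1:
        raise ValueError("reported_depth must be at least 1")
    if divisor < 1:
        raise ValueError("divisor must be at least 1")
    # Integer division rounds down, which errs on the side of fewer tags.
    return max(1, reported_depth // divisor)


if __name__ == "__main__":
    for depth in (1, 2, 31, 32):
        print(depth, "->", effective_tags(depth))
```

Rounding down rather than up is a deliberate choice here: if the theory is
an off-by-one, the cap should stay strictly below half the reported depth.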


 I really, really don't want to blame Søren, he's doing a great job and
 everybody, who makes something makes occasionally some errors, but (at
 least for me) it doesn't seem to be a fundamental technical problem,
 because *it once worked* - sorry, but it's true.
 
 And maybe it isn't related to tagged queuing and the DTLA at all - if I
 correctly understand Giorgos' mail...

As I said: it could be drive settings unrelated to the code
itself being correct.  I've given three suggestions to verify
this, one way or the other:

1)  Turn the drive's DMA speed down

2)  Pretend the maximum tagged command queue depth is
smaller than it is

3)  Toggle the write caching on the drive

Until you try all three of these and report back, you can't say
that the problem is Soren's.

-- Terry




Re: ATA errors on recent -current

2002-04-14 Thread Alexander Leidinger

On 14 Apr, David W. Chapman Jr. wrote:
 I updated to -current today and am now getting these errors
 
 ad0: READ command timeout tag=1 serv=1 - resetting
 ata0: resetting devices .. ad0: invalidating queued requests
 done

Turn off tagged queuing. Søren knows about this error and is trying to
reproduce it (but has not managed to, as far as I know).

Bye,
Alexander.

-- 
   One world, one web, one program  -- Microsoft promotional ad
 Ein Volk, ein Reich, ein Fuehrer  -- Adolf Hitler

http://www.Leidinger.net   Alexander @ Leidinger.net
  GPG fingerprint = C518 BC70 E67F 143F BE91  3365 79E2 9C60 B006 3FE7





Re: ATA errors on recent -current

2002-04-14 Thread Michael Class

Hello,

just as an additional datapoint: my 5.0-current system panics
during boot when I enable tagged queuing.
This did not happen with a system built on March 16th, but there
have been numerous changes to the ATA subsystem in between and I was
not able to trace this down to a specific change.

The trace looks like this (copied by hand):

ad_service (e5217c00,1,12788100,0,0) +0x36
ad_transfer (e51fcdc0)
ata_start
adstrategy
ar_rw
ar_promise_read_conf
ata_raiddisk_attach
ad_attach

The panic appears right when the disks should be attached.
This happens with a GENERIC kernel too!

This is a dmesg output without tagging:

Copyright (c) 1992-2002 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
FreeBSD 5.0-CURRENT #0: Sun Apr 14 09:29:41 MEST 2002
[EMAIL PROTECTED]:/usr/src/sys/i386/compile/MCSMP2
Preloaded elf kernel /boot/kernel/kernel at 0xc0523000.
Preloaded elf module /boot/kernel/acpi.ko at 0xc05230a8.
Timecounter i8254  frequency 1193182 Hz
CPU: Pentium III/Pentium III Xeon/Celeron (996.55-MHz 686-class CPU)
  Origin = GenuineIntel  Id = 0x68a  Stepping = 10
  
Features=0x383fbffFPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR,SSE
real memory  = 1073676288 (1048512K bytes)
avail memory = 1038569472 (1014228K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 - irq 0
FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee0
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee0
 io0 (APIC): apic id:  2, version: 0x00178011, at 0xfec0
Pentium Pro MTRR support enabled
Using $PIR table, 8 entries at 0xc00f7570
npx0: math processor on motherboard
npx0: INT 16 interface
acpi0: AMIINT  on motherboard
acpi0: power button is handled as a fixed feature programming model.
Timecounter ACPI-fast  frequency 3579545 Hz
acpi_timer0: 24-bit timer at 3.579545MHz port 0x808-0x80b on acpi0
acpi_cpu0: CPU on acpi0
acpi_cpu1: CPU on acpi0
acpi_tz0: thermal zone on acpi0
acpi_button0: Power Button on acpi0
acpi_pcib0: Host-PCI bridge port 0xcf8-0xcff on acpi0
pci0: PCI bus on acpi_pcib0
agp0: VIA 82C691 (Apollo Pro) host to PCI bridge mem 0xe000-0xe3ff at device 0.0 on pci0
pcib1: PCI-PCI bridge at device 1.0 on pci0
pci1: PCI bus on pcib1
pci1: display, VGA at device 0.0 (no driver attached)
isab0: PCI-ISA bridge at device 7.0 on pci0
isa0: ISA bus on isab0
atapci0: VIA 82C686 ATA100 controller port 0xffa0-0xffaf at device 7.1 on pci0
ata0: at 0x1f0 irq 14 on atapci0
ata1: at 0x170 irq 15 on atapci0
uhci0: VIA 83C572 USB controller port 0xcc00-0xcc1f irq 10 at device 7.2 on pci0
usb0: VIA 83C572 USB controller on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
ums0: KYE Genius USB Wheel Mouse, rev 1.00/0.00, addr 2, iclass 3/1
ums0: 3 buttons and Z dir.
ulpt0: Hewlett-Packard DeskJet 990C, rev 1.10/1.00, addr 3, iclass 7/1
ulpt0: using bi-directional mode
uhci1: VIA 83C572 USB controller port 0xd800-0xd81f irq 10 at device 7.3 on pci0
usb1: VIA 83C572 USB controller on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
pci0: serial bus, SMBus at device 7.4 (no driver attached)
xl0: 3Com 3c905B-TX Fast Etherlink XL port 0xc800-0xc87f mem 0xde80-0xdeff irq 12 at device 9.0 on pci0
xl0: Ethernet address: 00:10:5a:d7:dd:9c
miibus0: MII bus on xl0
xlphy0: 3Com internal media interface on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
pcm0: Creative EMU10K1 port 0xc400-0xc41f irq 9 at device 10.0 on pci0
bktr0: BrookTree 878 mem 0xdedfe000-0xdedfefff irq 10 at device 11.0 on pci0
bktr0: Hauppauge Model 61344 D121
bktr0: Detected a MSP3410D-B4 at 0x80
bktr0: Hauppauge WinCast/TV, Philips FR1216 PAL FM tuner, msp3400c stereo, remote control.
pci0: multimedia at device 11.1 (no driver attached)
sym0: 875 port 0xd000-0xd0ff mem 0xdfffe000-0xdfffefff,0xdf00-0xdfff irq 11 at device 12.0 on pci0
sym0: Symbios NVRAM, ID 7, Fast-20, SE, parity checking
sym0: open drain IRQ line driver, using on-chip SRAM
sym0: using LOAD/STORE-based firmware.
sym0: SCAN FOR LUNS disabled for targets 0 1 2 3 4 5 6 8 9 10 11 12 13 14 15.
acpi_button1: Sleep Button on acpi0
atkbdc0: Keyboard controller (i8042) port 0x64,0x60 irq 1 on acpi0
atkbd0: AT Keyboard flags 0x1 irq 1 on atkbdc0
kbd0 at atkbd0
fdc0: enhanced floppy controller (i82077, NE72065 or clone) port 0x3f7,0x3f2-0x3f5 irq 6 drq 2 on acpi0
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1440-KB 3.5 drive on fdc0 drive 0
sio0 port 0x3f8-0x3ff irq 4 on acpi0
sio0: type 16550A
sio1 port 0x2f8-0x2ff irq 3 on acpi0
sio1: type 16550A
ppc0 port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: 

Re: ATA errors on recent -current

2002-04-14 Thread Jeroen Ruigrok/asmodai

-On [20020414 17:00], Michael Class ([EMAIL PROTECTED]) wrote:

Quoting the real panic message would have been nice.

ad_service (e5217c00,1,12788100,0,0) +0x36
ad_transfer (e51fcdc0)
ata_start
adstrategy
ar_rw
ar_promise_read_conf
ata_raiddisk_attach
ad_attach

This looks a lot like the panic-on-boot problems fixed earlier this week.
If your panic was "biodone: bp 0xnumber not busy 0", update your source tree
and try again.

-- 
Jeroen Ruigrok van der Werven / asmodai / Kita no Mono
asmodai@[wxs.nl|xmach.org], finger [EMAIL PROTECTED]
http://www.softweyr.com/asmodai/ | http://www.[tendra|xmach].org/
Every revolution was first a thought in one man's mind...




Re: ATA errors on recent -current

2002-04-14 Thread Giorgos Keramidas

On 2002-04-14 10:34, Alexander Leidinger wrote:
 On 14 Apr, David W. Chapman Jr. wrote:
  I updated to -current today and am now getting these errors
 
  ad0: READ command timeout tag=1 serv=1 - resetting
  ata0: resetting devices .. ad0: invalidating queued requests
  done

 Turn off tagged queuing. Søren knows about this error and tries to
 reproduce it (but fails as far as I know).

I've seen this quite a few times, but I can't reliably reproduce it
yet.  It seems to hit me a lot when the ad0 drive spins like crazy
doing stuff that is heavy on disk I/O.  Disabling tag queueing now to
see if this fixes things.  But even if it does, I think I should
enable it again and help Søren track this down, if I can.
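
Since the error is hard to reproduce on demand, counting how often it shows
up in the logs, and for which tag, may help when comparing runs.  A minimal
sketch in Python follows; the regular expression is modelled only on the
single message quoted above, the WRITE alternative is an assumption (only
READ appears in this thread), and count_timeouts() is a hypothetical helper
name.

```python
import re
from collections import Counter

# Pattern modelled on "ad0: READ command timeout tag=1 serv=1 - resetting".
# The WRITE alternative is an assumption; only READ is seen in the thread.
TIMEOUT_RE = re.compile(
    r"(?P<dev>ad\d+): (?P<cmd>READ|WRITE) command timeout "
    r"tag=(?P<tag>\d+) serv=(?P<serv>\d+) - resetting"
)


def count_timeouts(lines):
    """Count timeout messages per (device, command, tag)."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            key = (match.group("dev"), match.group("cmd"), int(match.group("tag")))
            counts[key] += 1
    return counts


# Sample input copied from the report quoted above.
SAMPLE = [
    "ad0: READ command timeout tag=1 serv=1 - resetting",
    "ata0: resetting devices .. ad0: invalidating queued requests",
    "done",
]

if __name__ == "__main__":
    for key, n in count_timeouts(SAMPLE).items():
        print(key, n)
```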

Giorgos.




ATA errors on recent -current

2002-04-13 Thread David W. Chapman Jr.

I updated to -current today and am now getting these errors

ad0: READ command timeout tag=1 serv=1 - resetting
ata0: resetting devices .. ad0: invalidating queued requests
done

-- 
David W. Chapman Jr.
[EMAIL PROTECTED]   Raintree Network Services, Inc. www.inethouston.net
[EMAIL PROTECTED]   FreeBSD Committer www.FreeBSD.org




Re: ATA errors

2000-02-22 Thread Nick Hibma

 I cvsupped this morning and I just had a chance to build a new kernel, and
 now I get a "cannot mount root" and it drops into some kind of commandline
 where I can enter a root for it to mount. This is the error it gives me
 now:
 
 ata0-slave: WARNING: WAIT_INTR active=ATA_WAIT_READY
 ata0-slave: ata_command: timeout waiting for intr
 ata0-slave: identify failed
 
 I went through this this morning.  If you are loading modules from the
 boot loader, load them later, like from rc.conf.  I'm not sure what
 broke there, but it's a good workaround.


Well, the same seems to apply to some OHCI host controllers. They fail
to work as well if preloaded or postloaded, but do work if they are
compiled into the kernel. I haven't tried yet whether the compiled USB
support fails if a module is preloaded though. My gut feeling is that
this must be related. I think, and this is just a bad guess, that this
broke around 2 weeks ago. I'll see if I can run a few test kernels this
evening.

Nick

--
[EMAIL PROTECTED]
[EMAIL PROTECTED]  USB project
http://www.etla.net/~n_hibma/






ATA errors

2000-02-19 Thread Kenneth Wayne Culver

I cvsupped this morning and I just had a chance to build a new kernel, and
now I get a "cannot mount root" and it drops into some kind of commandline
where I can enter a root for it to mount. This is the error it gives me
now:

ata0-slave: WARNING: WAIT_INTR active=ATA_WAIT_READY
ata0-slave: ata_command: timeout waiting for intr
ata0-slave: identify failed

it goes on to say that ad1 doesn't exist...

This didn't happen with a kernel as of yesterday (which is what I'm using
now). This is how it probes right now:

ata-pci0: Intel PIIX4 ATA-33 controller port 0xf000-0xf00f at device 7.1 on pci0
ata0 at 0x01f0 irq 14 on ata-pci0
ata1 at 0x0170 irq 15 on ata-pci0

ad0: 8063MB Maxtor 90845D4 [16383/16/63] at ata0-master using UDMA33
ad1: 13029MB Maxtor 91366U4 [26473/16/63] at ata0-slave using UDMA33
ad2: 6187MB FUJITSU MPC3064AT [13410/15/63] at ata1-master using UDMA33
acd0: DVD-ROM TOSHIBA DVD-ROM SD-M1212 at ata1-slave using UDMA33

Any help would be appreciated..


=
| Kenneth Culver  | FreeBSD: The best OS around.|
| Unix Systems Administrator  | ICQ #: 24767726 |
| and student at The  | AIM: muythaibxr |
| The University of Maryland, | Website: (Under Construction)   |
| College Park.   | http://www.wam.umd.edu/~culverk/|
=






Re: ATA errors

2000-02-19 Thread Bryan Liesner

On Sat, 19 Feb 2000, Kenneth Wayne Culver wrote:

I cvsupped this morning and I just had a chance to build a new kernel, and
now I get a "cannot mount root" and it drops into some kind of commandline
where I can enter a root for it to mount. This is the error it gives me
now:

ata0-slave: WARNING: WAIT_INTR active=ATA_WAIT_READY
ata0-slave: ata_command: timeout waiting for intr
ata0-slave: identify failed

I went through this this morning.  If you are loading modules from the
boot loader, load them later, like from rc.conf.  I'm not sure what
broke there, but it's a good workaround.

-Bryan







ATA errors

2000-01-16 Thread Peter Jeremy

Whilst my -current system has stopped crashing every night, I got some
odd log messages last night.  At the time the system should have been
doing a cvs update (or possibly a make world).  There was nothing in
the CD-ROM (ata0-slave) at the time.  I'm running the latest version
of ata (about 7 days old).

Jan 17 00:48:45 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting
Jan 17 00:48:45 gsmx07 /kernel: ata1: resetting devices .. done
Jan 17 00:48:45 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:48:45 gsmx07 /kernel: ata0: resetting devices .. done
Jan 17 00:49:56 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting
Jan 17 00:49:56 gsmx07 /kernel: ata1: resetting devices .. done
Jan 17 00:49:56 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:49:56 gsmx07 /kernel: ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:56 gsmx07 /kernel: ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:56 gsmx07 /kernel: done
Jan 17 00:49:56 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:49:56 gsmx07 /kernel: ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:56 gsmx07 /kernel: ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:56 gsmx07 /kernel: done
Jan 17 00:49:56 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting
Jan 17 00:49:56 gsmx07 /kernel: ata1: resetting devices .. done
Jan 17 00:49:56 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:49:56 gsmx07 /kernel: ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:56 gsmx07 /kernel: ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: done
Jan 17 00:49:57 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:49:57 gsmx07 /kernel: ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: done
Jan 17 00:49:57 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting
Jan 17 00:49:57 gsmx07 /kernel: ad2: ad_timeout: trying fallback to PIO mode
Jan 17 00:49:57 gsmx07 /kernel: ata1: resetting devices .. done
Jan 17 00:49:57 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:49:57 gsmx07 /kernel: ad0: ad_timeout: trying fallback to PIO mode
Jan 17 00:49:57 gsmx07 /kernel: ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: done
Jan 17 00:49:57 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting
Jan 17 00:49:57 gsmx07 /kernel: ata0: resetting devices .. ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: ata0-slave: timeout waiting for command=ef s=00 e=64
Jan 17 00:49:57 gsmx07 /kernel: done
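
To see at a glance whether disks on different channels lose contact in the
same second (as ad2 on ata1 and ad0 on ata0 do at 00:48:45 above), the
"lost disk contact" lines can be grouped by timestamp.  A minimal sketch in
Python follows; the regular expression is modelled only on the syslog lines
in this message, and resets_by_second() is a hypothetical helper name.

```python
import re
from collections import defaultdict

# Pattern modelled on lines such as
# "Jan 17 00:48:45 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting"
RESET_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+ \d\d:\d\d:\d\d) \S+ /kernel: "
    r"(?P<dev>ad\d+): ad_timeout: lost disk contact - resetting"
)


def resets_by_second(lines):
    """Map each timestamp to the list of disks that lost contact in that second."""
    events = defaultdict(list)
    for line in lines:
        match = RESET_RE.match(line)
        if match:
            events[match.group("stamp")].append(match.group("dev"))
    return dict(events)


# Sample input copied from the log above.
SAMPLE = [
    "Jan 17 00:48:45 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting",
    "Jan 17 00:48:45 gsmx07 /kernel: ata1: resetting devices .. done",
    "Jan 17 00:48:45 gsmx07 /kernel: ad0: ad_timeout: lost disk contact - resetting",
    "Jan 17 00:49:56 gsmx07 /kernel: ad2: ad_timeout: lost disk contact - resetting",
]

if __name__ == "__main__":
    for stamp, devices in resets_by_second(SAMPLE).items():
        print(stamp, devices)
```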


The relevant probe messages are:

Jan 14 07:37:02 gsmx07 /kernel: The Regents of the University of California. All rights reserved.
Jan 14 07:37:02 gsmx07 /kernel: FreeBSD 4.0-CURRENT #25: Fri Jan 14 07:35:47 EST 2000
Jan 14 07:37:02 gsmx07 /kernel: root@:/3.0/cvs/src/sys/compile/gsmx
Jan 14 07:37:02 gsmx07 /kernel: CPU: Pentium II (267.31-MHz 686-class CPU)
Jan 14 07:37:02 gsmx07 /kernel: Origin = "GenuineIntel"  Id = 0x633  Stepping = 3
Jan 14 07:37:02 gsmx07 /kernel: pcib0: Intel 82443LX (440 LX) host to PCI bridge on motherboard
Jan 14 07:37:02 gsmx07 /kernel: pci0: PCI bus on pcib0
Jan 14 07:37:02 gsmx07 /kernel: pcib1: Intel 82443LX (440 LX) PCI-PCI (AGP) bridge at device 1.0 on pci0
Jan 14 07:37:02 gsmx07 /kernel: pci1: PCI bus on pcib1
Jan 14 07:37:02 gsmx07 /kernel: isab0: Intel 82371AB PCI to ISA bridge at device 7.0 on pci0
Jan 14 07:37:02 gsmx07 /kernel: isa0: ISA bus on isab0
Jan 14 07:37:02 gsmx07 /kernel: ata-pci0: Intel PIIX4 ATA-33 controller port 0xf000-0xf00f at device 7.1 on pci0
Jan 14 07:37:02 gsmx07 /kernel: ata-pci0: Busmastering DMA supported
Jan 14 07:37:02 gsmx07 /kernel: ata0 at 0x01f0 irq 14 on ata-pci0
Jan 14 07:37:02 gsmx07 /kernel: ata1 at 0x0170 irq 15 on ata-pci0
Jan 14 07:37:02 gsmx07 /kernel: ata-isa0: already registered as ata0
Jan 14 07:37:02 gsmx07 /kernel: ata-isa1: already registered as ata1
Jan 14 07:37:02 gsmx07 /kernel: ad0: FUJITSU MPB3064ATU E/4010 ATA-3 disk at ata0 as master
Jan 14 07:37:02 gsmx07 /kernel: ad0: 6187MB (12672450 sectors), 13410 cyls, 15 heads, 63 S/T, 512 B/S
Jan 14 07:37:02 gsmx07 /kernel: ad0: 16 secs/int, 1 depth queue, UDMA33
Jan 14 07:37:02 gsmx07 /kernel: ad2: QUANTUM FIREBALL_TM1280A/A6B.2D00 ATA-0 disk at ata1 as master
Jan 14 07:37:02 gsmx07 /kernel: ad2: 1222MB (2503872 sectors), 2484 cyls, 16 heads, 63 S/T, 512 B/S
Jan 14 07:37:02 gsmx07 /kernel: ad2: 16 secs/int, 1 depth queue, WDMA2
Jan 14 07:37:02 gsmx07 /kernel: 

ATA errors and AUTO_EOI

1999-12-21 Thread Dieter Rothacker

Hi,

I do not know if this issue has already been solved, but I cannot remember
having read something about it.

ATA errors directly after booting the kernel seem to be related to the usage
of the fast IRQ tuning parameter "AUTO_EOI".

Last night I migrated from my
GA586DX (Dualboard, 430HX chipset,PIIX3) 1x P233MMX
to a
GA686BX (440BX chipset,PIIX4) Celeron300A
(both used with the same HPT366 Controller and same disks).

Using the old board and AUTO_EOI1 and AUTO_EOI2, everything was stable. 

Using the new board, I get "waiting for interrupt" errors, and the system
freezes while trying to mount the disks (with kernel from 12/03) or the
system freezes before being able to detect the drives (with kernel from
12/20).

The solution for me was to recompile the kernel without AUTO_EOI1 and
AUTO_EOI2.
-- 
Dieter 'Didi' Rothacker
ICQ#3327455
"There is a crack, a crack in everything.
 That's how the light gets in." (+Fravia)





Re: ATA errors and AUTO_EOI

1999-12-21 Thread Soren Schmidt

It seems Dieter Rothacker wrote:
 
 Using the new board, I get "waiting for interrupt" errors, and the system
 freezes while trying to mount the disks (with kernel from 12/03) or the
 system freezes before being able to detect the drives (with kernel from
 12/20).
 
 The solution for me was to recompile the kernel without AUTO_EOI1 and
 AUTO_EOI2.

Those options never worked reliably (for me at least) with anything. Could
those who are seeing the hangs please check this?

-Søren





Re: ATA errors and AUTO_EOI

1999-12-21 Thread Doug White

On Tue, 21 Dec 1999, Soren Schmidt wrote:

 It seems Dieter Rothacker wrote:
  
  Using the new board, I get "waiting for interrupt" errors, and the system
  freezes while trying to mount the disks (with kernel from 12/03) or the
  system freezes before being able to detect the drives (with kernel from
  12/20).
  
  The solution for me was to recompile the kernel without AUTO_EOI1 and
  AUTO_EOI2.
 
 Those options never worked (for me at least) reliably with anything, could
 those that are seeing the hangs please check this ??

Although this isn't immediately related to ATA, I've found that Intel
L440GX+ boards *hate* AUTO_EOI_2 when running SMP.  They freeze going into
multiuser mode.  Took me quite a while to figure that out.

So if you're having wacky interrupt-related problems and have AUTO_EOIs in
your kernel, you should take them out first.
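
A quick way to check whether a kernel configuration still enables either
option is to scan it for active "options AUTO_EOI_n" lines.  A minimal
sketch in Python follows; find_auto_eoi() is a hypothetical helper, and the
sample text is an invented fragment for illustration, not anyone's real
configuration.

```python
import re

# Matches an active "options AUTO_EOI_1" / "options AUTO_EOI_2" line.
# Lines commented out with '#' do not match because '#' precedes "options".
OPTION_RE = re.compile(r"^\s*options\s+(AUTO_EOI_\d)\b")


def find_auto_eoi(config_text):
    """Return the AUTO_EOI options enabled in a kernel config, in file order."""
    found = []
    for line in config_text.splitlines():
        match = OPTION_RE.match(line)
        if match:
            found.append(match.group(1))
    return found


# Invented example fragment; only the option names come from this thread.
SAMPLE_CONFIG = """\
options         AUTO_EOI_1
#options        AUTO_EOI_2
options         INET
"""

if __name__ == "__main__":
    enabled = find_auto_eoi(SAMPLE_CONFIG)
    if enabled:
        print("still enabled:", ", ".join(enabled))
    else:
        print("no AUTO_EOI options enabled")
```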

Doug White|  FreeBSD: The Power to Serve
[EMAIL PROTECTED] |  www.FreeBSD.org






Re: ATA errors and AUTO_EOI

1999-12-21 Thread Oliver Fromme

Doug White wrote in list.freebsd-current:
  On Tue, 21 Dec 1999, Soren Schmidt wrote:
   It seems Dieter Rothacker wrote:
The solution for me was to recompile the kernel without AUTO_EOI1 and
AUTO_EOI2.
   
   Those options never worked (for me at least) reliably with anything, could
   those that are seeing the hangs please check this ??
  
  Although this isn't immediately related to ATA, I've found that Intel
  L440GX+ boards *hate* AUTO_EOI_2 when running SMP.  They freeze going into
  multiuser mode.  Took me quite a while to figure that out.

I have always been using AUTO_EOI_1, but _not_ AUTO_EOI_2, and
it has always worked very well.

The comment in LINT about AUTO_EOI_2 sounds pretty suspicious,
so I never even tried it:  "it works for some clones and some
integrated versions."  That sounds to me like "it works on a
very limited set of hardware (and if you're lucky)."

AUTO_EOI_1 seems to be fine, though.

Regards
   Oliver

-- 
Oliver Fromme, Leibnizstr. 18/61, 38678 Clausthal, Germany
(Info: finger userinfo:[EMAIL PROTECTED])

"In every lump of coal a diamond is waiting to be born"
 (Terry Pratchett)





Re: ATA errors and AUTO_EOI

1999-12-21 Thread D. Rock

Oliver Fromme wrote:
 
 Doug White wrote in list.freebsd-current:
   On Tue, 21 Dec 1999, Soren Schmidt wrote:
It seems Dieter Rothacker wrote:
 The solution for me was to recompile the kernel without AUTO_EOI1 and
 AUTO_EOI2.
   
Those options never worked (for me at least) reliably with anything, could
those that are seeing the hangs please check this ??
  
   Although this isn't immediately related to ATA, I've found that Intel
   L440GX+ boards *hate* AUTO_EOI_2 when running SMP.  They freeze going into
   multiuser mode.  Took me quite a while to figure that out.
 
 I have always been using AUTO_EOI_1, but _not_ AUTO_EOI_2, and
 it has always worked very well.
 
 The comment in LINT about AUTO_EOI_2 sounds pretty suspicious,
 so I never even tried it:  "it works for some clones and some
 integrated versions."  That sounds to me like "it works on a
 very limited set of hardware (and if you're lucky)."
 
 AUTO_EOI_1 seems to be fine, though.
Same for me.

Except for my laptop, which didn't even like AUTO_EOI_1 (which is also
mentioned in LINT, but I noticed it only on the third read).

Daniel





Is AUTO_EOI better? [was:Re: ATA errors and AUTO_EOI]

1999-12-21 Thread Dieter Rothacker

On Tue, 21 Dec 1999 17:13:20 +0100, D. Rock wrote:

Oliver Fromme wrote:
 Doug White wrote in list.freebsd-current:
   On Tue, 21 Dec 1999, Soren Schmidt wrote:
It seems Dieter Rothacker wrote:
 The solution for me was to recompile the kernel without AUTO_EOI1 and
 AUTO_EOI2.
   
Those options never worked (for me at least) reliably with anything, could
those that are seeing the hangs please check this ??
  
   Although this isn't immediately related to ATA, I've found that Intel
   L440GX+ boards *hate* AUTO_EOI_2 when running SMP.  They freeze going into
   multiuser mode.  Took me quite a while to figure that out.
 
 I have always been using AUTO_EOI_1, but _not_ AUTO_EOI_2, and
 it has always worked very well.
 
 The comment in LINT about AUTO_EOI_2 sounds pretty suspicious,
 so I never even tried it:  "it works for some clones and some
 integrated versions."  That sounds to me like "it works on a
 very limited set of hardware (and if you're lucky)."
 
 AUTO_EOI_1 seems to be fine, though.
Same for me.

Yeah, you are right. My system is now running with a kernel with AUTO_EOI_1.
Seems like AUTO_EOI_2 really was the only problem...

Does anybody have actual evidence that AUTO_EOI really boosts
performance on modern integrated chipsets like the 440BX?
-- 
Dieter Rothacker





Re: ATA errors and AUTO_EOI

1999-12-21 Thread jack

Today Oliver Fromme wrote:

 The comment in LINT about AUTO_EOI_2 sounds pretty suspicious,
 so I never even tried it:  "it works for some clones and some
 integrated versions."  That sounds to me like "it works on a
 very limited set of hardware (and if you're lucky)."

I've only got one out of over a dozen boxes where AUTO_EOI_2 will
not work.  Micronics, Tyan, Intel, and Brand X boards.  It's one
of the three Tyans that doesn't work.

Only one box is running current and it works there with ata and 
antique drives.  

ata-pci0: Intel PIIX3 ATA controller at device 7.1 on pci0
ata-pci0: Busmastering DMA supported
ata0 at 0x01f0 irq 14 on ata-pci0
ata1 at 0x0170 irq 15 on ata-pci0

ad0: WDC AC2540H/11.06P29 ATA-0 disk at ata0 as master
ad0: 515MB (1056384 sectors), 1048 cyls, 16 heads, 63 S/T, 512 B/S
ad0: 16 secs/int, 1 depth queue, PIO
ad1: WDC AC31200F/14.04E28 ATA-0 disk at ata0 as slave
ad1: 1222MB (2503872 sectors), 2484 cyls, 16 heads, 63 S/T, 512 B/S
ad1: 16 secs/int, 1 depth queue, PIO
acd0: Chinon CD-ROM CDS-545/A1.3 CDROM drive at ata1 as master
acd0: read 344KB/s (689KB/s), 128KB buffer, PIO
acd0: Reads: CD-DA stream
acd0: Audio: play, 64 volume levels
acd0: Mechanism: ejectable tray
acd0: Medium: no/blank disc inside, unlocked
ata_command: timeout waiting for interrupt
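
The sector counts and sizes in the probe output above can be cross-checked
from the reported geometry.  A minimal sketch of the arithmetic in Python
follows; chs_sectors() and size_mb() are hypothetical helper names, and
treating "MB" as 2^20 bytes rounded down is an assumption that happens to
match the figures printed for ad0 and ad1.

```python
def chs_sectors(cyls, heads, sectors_per_track):
    """Total sectors implied by a cylinders/heads/sectors-per-track geometry."""
    return cyls * heads * sectors_per_track


def size_mb(sectors, bytes_per_sector=512):
    """Capacity in MB, assuming 1 MB = 2**20 bytes and rounding down."""
    return sectors * bytes_per_sector // (1024 * 1024)


if __name__ == "__main__":
    # Geometries copied from the ad0 and ad1 probe lines above.
    for name, geometry in (("ad0", (1048, 16, 63)), ("ad1", (2484, 16, 63))):
        sectors = chs_sectors(*geometry)
        print(name, sectors, "sectors,", size_mb(sectors), "MB")
```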

--
Jack O'NeillSystems Administrator / Systems Analyst
[EMAIL PROTECTED] Crystal Wind Communications, Inc.
  Finger [EMAIL PROTECTED] for my PGP key.
   PGP Key fingerprint = F6 C4 E6 D4 2F 15 A7 67   FD 09 E9 3C 5F CC EB CD
   enriched, vcard, HTML messages  /dev/null
--



