Re: SCSI hangs w/SuperMicro 6010H

2001-07-11 Thread Dave Cornejo

I have gotten this system to boot with a current SMP kernel from as
late as mid-September 2000, and am working on stabilising the system
so that I can try later versions.

It seems that all the working versions don't use Ultra-160 (I'm not
sure when this hit the tree) - could the problem have been introduced
when support was added?

thanks,
dave c

-- 
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
  There aren't any monkeys chasing us... - Xochi

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: SCSI hangs w/SuperMicro 6010H

2001-06-26 Thread Justin T. Gibbs

John Baldwin wrote:
 Hrmm, perhaps you are getting an interrupt storm from ahc.  Ok, try
 this: find the ahc driver's interrupt handler, and add a printf.
 Then see if the printf fires while the machine is hung.

Ok, I put a printf in ahc_handle_seqint() and ahc_handle_scsiint().

That won't catch all interrupts.  Most notably, you won't know
if commands are completing.  Command completions are much more
prevalent than sequencer or scsi interrupts.

My current (freshly cvsupped sources) kernel with the printf()s in it
is pretty consistent in its behavior: with SMP it hangs soon after
the 15 second SCSI delay and keystrokes will not cause it to continue
to boot.

The order that they print out on the screen is this:

message Waiting 15 seconds for SCSI devices to settle

(approximately 15 second delay)

26 times scsiint called with intstat = 0x4, status0 = 0, status = 0x88
  (SELTO | BUSFREE?)

So 26 of the 30 possible target ID positions on the controller are
empty.

2 times seqint called with intstat = 0x71 (BAD_STATUS?)

Two commands returned status other than 0 - most likely check condition.

36 times seqint called with intstat = 0x61 (HOST_MSG_LOOP?)

We negotiated transfer settings with some devices.

These all seem quite normal.
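
The reading of the status byte above can be checked with a small sketch. The bit names and values are an assumption, recalled from the driver's register definitions rather than taken from this thread, and the split of the 30 target positions into 2 channels of 15 IDs each is likewise an assumption.

```python
# Bit names for the SCSI status byte. ASSUMPTION: values recalled from the
# aic7xxx register definitions, not quoted anywhere in this thread.
SSTAT1_BITS = {
    0x80: "SELTO",
    0x40: "ATNTARG",
    0x20: "SCSIRSTI",
    0x10: "PHASEMIS",
    0x08: "BUSFREE",
    0x04: "SCSIPERR",
    0x02: "PHASECHG",
    0x01: "REQINIT",
}


def decode_bits(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(names.items(), reverse=True)
            if value & bit]


def responding_targets(channels, ids_per_channel, selection_timeouts):
    """Target positions that did not end in a selection timeout."""
    return channels * ids_per_channel - selection_timeouts


# status = 0x88 from the printf output
print(decode_bits(0x88, SSTAT1_BITS))
# ASSUMPTION: 2 channels x 15 target IDs = the 30 positions mentioned above
print(responding_targets(2, 15, 26))
```

Under those assumptions 0x88 decodes to a selection timeout followed by bus free, which is what probing an empty ID looks like, and four target positions are left that answered selection.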

--
Justin




Re: SCSI hangs w/SuperMicro 6010H

2001-06-26 Thread Dave Cornejo

Justin T. Gibbs wrote:
 John Baldwin wrote:
  Hrmm, perhaps you are getting an interrupt storm from ahc.  Ok, try
  this: find the ahc driver's interrupt handler, and add a printf.
  Then see if the printf fires while the machine is hung.
 
 Ok, I put a printf in ahc_handle_seqint() and ahc_handle_scsiint().
 
 That won't catch all interrupts.  Most notably, you won't know
 if commands are completing.  Command completions are much more
 prevalent than sequencer or scsi interrupts.

should I try and catch the command completions?  which routine is best
to do this in?

btw, thanks very much for your help!

dave c

-- 
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
  There aren't any monkeys chasing us... - Xochi




Re: SCSI hangs w/SuperMicro 6010H

2001-06-26 Thread Justin T. Gibbs

 That won't catch all interrupts.  Most notably, you won't know
 if commands are completing.  Command completions are much more
 prevalent than sequencer or scsi interrupts.

should I try and catch the command completions?  which routine is best
to do this in?

ahc_intr() in aic7xxx_inline.h gates all interrupt activity.  I don't
know that it will tell you why you are hung though.  All that is clear
is that interrupts at least work for a time.
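
To illustrate why printfs in only the sequencer and SCSI handlers miss most of the traffic, here is a rough sketch of how a single gate routine would classify each interrupt from the interrupt status byte. The bit values are an assumption recalled from the driver's register definitions, and the command-completion samples are invented for the example; only the 0x4, 0x71 and 0x61 counts come from the thread.

```python
from collections import Counter

# ASSUMPTION: low-nibble interrupt status bits as recalled from the
# aic7xxx register definitions; not quoted in this thread.
SEQINT, CMDCMPLT, SCSIINT, BRKADRINT = 0x01, 0x02, 0x04, 0x08


def classify(intstat):
    """List the interrupt kinds flagged in one status byte."""
    kinds = []
    if intstat & CMDCMPLT:
        kinds.append("cmdcmplt")
    if intstat & BRKADRINT:
        kinds.append("brkadrint")
    if intstat & SEQINT:
        kinds.append("seqint")
    if intstat & SCSIINT:
        kinds.append("scsiint")
    return kinds


def tally(samples):
    """Count how often each interrupt kind appears across samples."""
    counts = Counter()
    for intstat in samples:
        counts.update(classify(intstat))
    return counts


# Counts reported in the thread: 26 x 0x4, 2 x 0x71, 36 x 0x61.
reported = [0x04] * 26 + [0x71] * 2 + [0x61] * 36
# HYPOTHETICAL: ten command completions that the two printfs never saw.
unseen = [0x02] * 10

print(tally(reported))
print(tally(reported + unseen))
```

A counter kept at the gate would show the completions that the two handler printfs cannot, though as noted it would not by itself explain the hang.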

btw, thanks very much for your help!

Sure.

--
Justin




Re: SCSI hangs w/SuperMicro 6010H

2001-06-23 Thread Dave Cornejo

John Baldwin wrote:
 Hrmm, perhaps you are getting an interrupt storm from ahc.  Ok, try
 this: find the ahc driver's interrupt handler, and add a printf.
 Then see if the printf fires while the machine is hung.

Ok, I put a printf in ahc_handle_seqint() and ahc_handle_scsiint().

My current (freshly cvsupped sources) kernel with the printf()s in it
is pretty consistent in its behavior: with SMP it hangs soon after
the 15 second SCSI delay and keystrokes will not cause it to continue
to boot.

The order that they print out on the screen is this:

message Waiting 15 seconds for SCSI devices to settle

(approximately 15 second delay)

26 times scsiint called with intstat = 0x4, status0 = 0, status = 0x88
  (SELTO | BUSFREE?)

2 times seqint called with intstat = 0x71 (BAD_STATUS?)

36 times seqint called with intstat = 0x61 (HOST_MSG_LOOP?)

and then the system hangs.

I have gone back to a SMP kernel from April 15th - using a GENERIC
kernel with SMP enabled it exhibits the same problem.  Will work my
way back to -stable and see if anything changes...

dave c

-- 
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
  There aren't any monkeys chasing us... - Xochi





Re: SCSI hangs w/SuperMicro 6010H

2001-06-22 Thread John Baldwin


On 22-Jun-01 Dave Cornejo wrote:
 John Baldwin wrote:
 Actually, KTR is your friend here. :)  Read the ktr(4) manpage, then compile a
 kernel with KTR_MASK and KTR_COMPILE set to KTR_INTR|KTR_PROC.  Then when it
 hangs, break into DDB and look at the logs via 'show ktr' to see if you can
 locate any interrupts coming in from ahc0 or ahc1.
 
 Okay - fired up the box, built a kernel off of a 6/18 source snapshot,
 and it hangs in about the same place - however, what I get is that as soon
 as I touch a key to invoke the debugger from the console, it continues
 merrily booting and I can't break into DDB until way past the
 problem.  In my mind this kind of confirms something is busted in the
 interrupts.

Hrmm, perhaps you are getting an interrupt storm from ahc.  Ok, try this: find
the ahc driver's interrupt handler, and add a printf.  Then see if the printf
fires while the machine is hung.

 Tried looking back through the show ktr output and I'm not 100% clear
 on what it all means - I guess I'm interested in the ithread stuff and
 the only thing I ever see is swi6: tty:sio+ in the trace buffer
 besides what appears to be normal process rescheduling (?) which is
 mostly idle task time...

Unfortunately, clock interrupts can fill the trace buffer up, yes. :(

If rolling back the source tree gets you a working kernel, then you might want
to do a binary search using date tags to narrow down what commit actually broke
things on your box.
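
A minimal sketch of that binary search over dates, assuming there is one date before which every kernel boots and after which every kernel hangs. The `is_broken` callback is hypothetical: in practice it stands for checking out the tree as of that date, building a kernel and seeing whether the box hangs. The break date in the demo is made up; the two endpoints are dates mentioned elsewhere in this thread.

```python
from datetime import date, timedelta


def first_bad_date(good, bad, is_broken):
    """Binary search for the earliest date whose kernel is broken.

    Assumes `good` is known to work, `bad` is known to hang, and the
    behaviour switches exactly once between them.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if is_broken(mid):
            bad = mid      # breakage is at or before mid
        else:
            good = mid     # breakage is after mid
    return bad


# HYPOTHETICAL stand-in for "check out sources as of this date, build, boot".
PRETEND_BREAK = date(2000, 12, 1)  # made-up date, for the demo only


def fake_is_broken(day):
    return day >= PRETEND_BREAK


found = first_bad_date(date(2000, 9, 15), date(2001, 4, 15), fake_is_broken)
print(found)
```

Each probe halves the window, so a gap of about seven months needs on the order of eight kernel builds rather than one per day.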

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




RE: SCSI hangs w/SuperMicro 6010H

2001-06-21 Thread John Baldwin


On 17-Jun-01 Dave Cornejo wrote:
 Please excuse me if you've seen this in questions, but I found a
 relevancy to current: If I drop back to 4.3 release, this system boots
 every time with no hangs observed in half a dozen tries in either UP
 or SMP mode.  Anyone else seeing similar?

Is this on -current or -stable?  If it's on -current, why did you ask on
-questions? :)  It looks like an interrupt problem however.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: SCSI hangs w/SuperMicro 6010H

2001-06-21 Thread Dave Cornejo

John Baldwin wrote:
 Is this on -current or -stable?  If it's on -current, why did you ask on
 -questions? :)  It looks like an interrupt problem however.

When I asked on questions, I was of the belief that I had a hardware
problem and that it was not necessarily a -current issue.  When I
later went back and installed 4.3 and it worked I then realized that I
had justification to post it on -current.  Hey, at least I didn't
cross-post to questions, stable, scsi, and current! :)

I guess my next step is to try and trace through what's happening -
can you suggest a good place to start (like a routine to start
tracing), or is there anything I can do that might get more info for
the people that know what is going on?

thanks!
dave c

-- 
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
  There aren't any monkeys chasing us... - Xochi




Re: SCSI hangs w/SuperMicro 6010H

2001-06-21 Thread John Baldwin


On 21-Jun-01 Dave Cornejo wrote:
 John Baldwin wrote:
 Is this on -current or -stable?  If it's on -current, why did you ask on
 -questions? :)  It looks like an interrupt problem however.
 
 When I asked on questions, I was of the belief that I had a hardware
 problem and that it was not necessarily a -current issue.  When I
 later went back and installed 4.3 and it worked I then realized that I
 had justification to post it on -current.  Hey, at least I didn't
 cross-post to questions, stable, scsi, and current! :)

Ok, sounds good, just checking. :)  Can you provide the output of mptable for
this box?  In the SMP case, -current does interrupt routing for PCI interrupts
a bit differently, which might be a possible reason.  Hmm, but you are getting
interrupts eventually it seems.

 I guess my next step is to try and trace through what's happening -
 can you suggest a good place to start (like a routine to start
 tracing), or is there anything I can do that might get more info for
 the people that know what is going on?

Actually, KTR is your friend here. :)  Read the ktr(4) manpage, then compile a
kernel with KTR_MASK and KTR_COMPILE set to KTR_INTR|KTR_PROC.  Then when it
hangs, break into DDB and look at the logs via 'show ktr' to see if you can
locate any interrupts coming in from ahc0 or ahc1.

-- 

John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve!  -  http://www.FreeBSD.org/




Re: SCSI hangs w/SuperMicro 6010H

2001-06-21 Thread Dave Cornejo

John Baldwin wrote:
 Ok, sounds good, just checking. :)  Can you provide the output of mptable for
 this box?  In the SMP case, -current does interrupt routing for PCI interrupts
 a bit differently, which might be a possible reason.  Hmm, but you are getting
 interrupts eventually it seems.

I can only get the thing past the hang maybe once in twenty+ tries.
Below is the mptable output (I don't remember what version of FreeBSD
I had installed when I did this, hope it doesn't matter).  I'll try
the KTR stuff later tonight.

thanks!
dave c

===

MPTable, version 2.0.15

---

MP Floating Pointer Structure:

  location:                 BIOS
  physical address:         0x000ff780
  signature:                '_MP_'
  length:                   16 bytes
  version:                  1.1
  checksum:                 0xb9
  mode:                     Virtual Wire

---

MP Config Table Header:

  physical address:         0x000f0bd0
  signature:                'PCMP'
  base table length:        284
  version:                  1.1
  checksum:                 0x28
  OEM ID:                   'AMI '
  Product ID:               'CNB20HE '
  OEM table pointer:        0x
  OEM table size:           0
  entry count:              27
  local APIC address:       0xfee0
  extended table length:    0
  extended table checksum:  0

---

MP Config Base Table Entries:

--
Processors: APIC ID  Version  State        Family  Model  Step  Flags
            0        0x11     BSP, usable  6       8      6     0x387fbff
            1        0x11     AP, usable   6       8      6     0x387fbff
--
Bus:        Bus ID  Type
            0       PCI
            1       PCI
            2       PCI
            3       ISA
--
I/O APICs:  APIC ID  Version  State   Address
            4        0x11     usable  0xfec0
            5        0x11     usable  0xfec01000
--
I/O Ints:   Type    Polarity   Trigger  Bus ID  IRQ   APIC ID  PIN#
            INT     active-lo  level    1       0:A   5        14
            INT     active-lo  level    0       5:B   5        11
            INT     active-lo  level    0       15:A  4        10
            INT     active-lo  level    0       6:A   5        15
            INT     active-lo  level    0       5:A   5        10
            INT     active-lo  level    0       4:A   5        12
            ExtINT  active-hi  edge     3       0     4        0
            INT     active-hi  edge     3       1     4        1
            INT     active-hi  edge     3       0     4        2
            INT     active-hi  edge     3       3     4        3
            INT     active-hi  edge     3       4     4        4
            INT     active-hi  edge     3       6     4        6
            INT     active-hi  edge     3       8     4        8
            INT     active-hi  edge     3       12    4        12
            INT     active-hi  edge     3       13    4        13
            INT     active-hi  edge     3       14    4        14
            INT     active-hi  edge     3       15    4        15
--
Local Ints: Type    Polarity   Trigger  Bus ID  IRQ   APIC ID  PIN#
            ExtINT  active-hi  edge     3       0     255      0
            NMI     active-hi  edge     0       0:A   255      1

===
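
Since the question is how PCI interrupts are routed in the SMP case, a small sketch for pulling the PCI routes out of the I/O Ints section may help. It assumes each row splits on whitespace into Type, Polarity, Trigger, Bus ID, IRQ, APIC ID and PIN#, in that order, and that buses 0 to 2 are PCI as in the Bus section. The sample rows are the six level-triggered entries from the listing, read with that column split, plus one ISA row for contrast.

```python
# Sample rows, ASSUMING whitespace-separated columns in the order:
# Type Polarity Trigger BusID IRQ APICID PIN
IO_INTS = """\
INT active-lo level 1 0:A 5 14
INT active-lo level 0 5:B 5 11
INT active-lo level 0 15:A 4 10
INT active-lo level 0 6:A 5 15
INT active-lo level 0 5:A 5 10
INT active-lo level 0 4:A 5 12
INT active-hi edge 3 1 4 1
"""


def parse_io_ints(text):
    """Parse I/O interrupt rows into dicts; rows with other shapes are skipped."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue
        kind, polarity, trigger, bus, irq, apic, pin = fields
        entries.append({
            "type": kind,
            "polarity": polarity,
            "trigger": trigger,
            "bus": int(bus),
            "irq": irq,          # "device:pin" on PCI, plain number on ISA
            "apic": int(apic),
            "pin": int(pin),
        })
    return entries


def pci_routes(entries, pci_buses):
    """Return (bus, irq, apic, pin) for entries that sit on a PCI bus."""
    return [(e["bus"], e["irq"], e["apic"], e["pin"])
            for e in entries if e["bus"] in pci_buses]


routes = pci_routes(parse_io_ints(IO_INTS), pci_buses={0, 1, 2})
for bus, irq, apic, pin in routes:
    print(f"PCI bus {bus} dev:int {irq} -> I/O APIC {apic} pin {pin}")
```

The listing does not say which of these entries belongs to the SCSI controller, so the sketch only enumerates the routes and does not pick one.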


-- 
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
  There aren't any monkeys chasing us... - Xochi




Re: SCSI hangs w/SuperMicro 6010H

2001-06-21 Thread Dave Cornejo

John Baldwin wrote:
 Actually, KTR is your friend here. :)  Read the ktr(4) manpage, then compile a
 kernel with KTR_MASK and KTR_COMPILE set to KTR_INTR|KTR_PROC.  Then when it
 hangs, break into DDB and look at the logs via 'show ktr' to see if you can
 locate any interrupts coming in from ahc0 or ahc1.

Okay - fired up the box, built a kernel off of a 6/18 source snapshot,
and it hangs in about the same place - however, what I get is that as soon
as I touch a key to invoke the debugger from the console, it continues
merrily booting and I can't break into DDB until way past the
problem.  In my mind this kind of confirms something is busted in the
interrupts.

Tried looking back through the show ktr output and I'm not 100% clear
on what it all means - I guess I'm interested in the ithread stuff and
the only thing I ever see is swi6: tty:sio+ in the trace buffer
besides what appears to be normal process rescheduling (?) which is
mostly idle task time...

Do you think there's any use to rolling my source tree back a ways and
compiling a kernel?

thanks,
dave c

-- 
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
  There aren't any monkeys chasing us... - Xochi




Re: SCSI hangs w/SuperMicro 6010H

2001-06-20 Thread Justin T. Gibbs

Please excuse me if you've seen this in questions, but I found a
relevancy to current: If I drop back to 4.3 release, this system boots
every time with no hangs observed in half a dozen tries in either UP
or SMP mode.  Anyone else seeing similar?

I doubt that this is related to CAM or the aic7xxx driver.  You
probably need to work with John Baldwin to trace the early
execution of the system to see why it is hanging up.

--
Justin
