Re: SCSI hangs w/SuperMicro 6010H
I have gotten this system to boot with a current SMP kernel from as late as mid-September 2000, and am working on stabilising the system so that I can try later versions.

It seems that none of the working versions use Ultra-160 (I'm not sure when this hit the tree). Could the problem have been introduced when Ultra-160 support was added?

thanks,
dave c

--
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
There aren't any monkeys chasing us... - Xochi

To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: SCSI hangs w/SuperMicro 6010H
> John Baldwin wrote:
>> Hrmm, perhaps you are getting an interrupt storm from ahc. Ok, try this: find the ahc driver's interrupt handler, and add a printf. Then see if the printf fires while the machine is hung.
>
> Ok, I put a printf in ahc_handle_seqint() and ahc_handle_scsiint().

That won't catch all interrupts. Most notably, you won't know if commands are completing. Command completions are much more prevalent than sequencer or scsi interrupts.

> My current (freshly cvsupped sources) kernel with the printf()s in it is pretty consistent in its behavior: with SMP it hangs soon after the 15 second SCSI delay and keystrokes will not cause it to continue to boot. The order that they print out on the screen is this:
>
> message Waiting 15 seconds for SCSI devices to settle
> (approximately 15 second delay)
> 26 times scsiint called with intstat = 0x4, status0 = 0, status = 0x88 (SELTO BUSFREE?)

So 26 of the 30 possible target ID positions on the controller are empty.

> 2 times seqint called with instat = 0x71 (BAD_STATUS?)

Two commands returned status other than 0 - most likely check condition.

> 36 times seqint called with intstat = 0x61 (HOST_MSG_LOOP?)

We negotiated transfer settings with some devices.

These all seem quite normal.

--
Justin
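The three values reported above can be turned into names with a small lookup, which makes longer console logs easier to scan. This is only a sketch: the bit and code values are assumptions that mirror the guesses in the message (SELTO, BUSFREE, BAD_STATUS, HOST_MSG_LOOP) and should be checked against the aic7xxx register definitions in your own source tree. The function names are hypothetical.

```python
# Assumed values for illustration only. They follow the names guessed in the
# message; verify them against the aic7xxx register definitions before use.
STATUS_BITS = {
    0x80: "SELTO",    # selection timeout (assumed bit value)
    0x08: "BUSFREE",  # unexpected bus free (assumed bit value)
}

SEQINT_CODES = {
    0x71: "BAD_STATUS",     # assumed code value
    0x61: "HOST_MSG_LOOP",  # assumed code value
}


def decode_status(value):
    """Return the names of the known bits set in value, highest bit first.

    Any bits not listed in STATUS_BITS are reported as one hex remainder.
    """
    names = [name
             for bit, name in sorted(STATUS_BITS.items(), reverse=True)
             if value & bit]
    known_mask = 0
    for bit in STATUS_BITS:
        known_mask |= bit
    leftover = value & ~known_mask
    if leftover:
        names.append(hex(leftover))
    return names


def decode_seqint(value):
    """Map a sequencer interrupt code to its name, or mark it unknown."""
    return SEQINT_CODES.get(value, "unknown (0x%02x)" % value)


if __name__ == "__main__":
    print("status 0x88 ->", decode_status(0x88))
    print("seqint 0x71 ->", decode_seqint(0x71))
    print("seqint 0x61 ->", decode_seqint(0x61))
```

With these assumed values, 0x88 decodes to both SELTO and BUSFREE, which matches the reading of the scsiint lines as selection timeouts on empty target IDs.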
Re: SCSI hangs w/SuperMicro 6010H
Justin T. Gibbs wrote:
>> John Baldwin wrote:
>>> Hrmm, perhaps you are getting an interrupt storm from ahc. Ok, try this: find the ahc driver's interrupt handler, and add a printf. Then see if the printf fires while the machine is hung.
>>
>> Ok, I put a printf in ahc_handle_seqint() and ahc_handle_scsiint().
>
> That won't catch all interrupts. Most notably, you won't know if commands are completing. Command completions are much more prevalent than sequencer or scsi interrupts.

should I try and catch the command completions? which routine is best to do this in?

btw, thanks very much for your help!

dave c

--
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
There aren't any monkeys chasing us... - Xochi
Re: SCSI hangs w/SuperMicro 6010H
>> That won't catch all interrupts. Most notably, you won't know if commands are completing. Command completions are much more prevalent than sequencer or scsi interrupts.
>
> should I try and catch the command completions? which routine is best to do this in?

ahc_intr() in aic7xxx_inline.h gates all interrupt activity. I don't know that it will tell you why you are hung though. All that is clear is that interrupts at least work for a time.

> btw, thanks very much for your help!

Sure.

--
Justin
Re: SCSI hangs w/SuperMicro 6010H
John Baldwin wrote:
> Hrmm, perhaps you are getting an interrupt storm from ahc. Ok, try this: find the ahc driver's interrupt handler, and add a printf. Then see if the printf fires while the machine is hung.

Ok, I put a printf in ahc_handle_seqint() and ahc_handle_scsiint().

My current (freshly cvsupped sources) kernel with the printf()s in it is pretty consistent in its behavior: with SMP it hangs soon after the 15 second SCSI delay, and keystrokes will not cause it to continue to boot. The order that they print out on the screen is this:

message Waiting 15 seconds for SCSI devices to settle
(approximately 15 second delay)
26 times scsiint called with intstat = 0x4, status0 = 0, status = 0x88 (SELTO BUSFREE?)
2 times seqint called with instat = 0x71 (BAD_STATUS?)
36 times seqint called with intstat = 0x61 (HOST_MSG_LOOP?)

and then the system hangs.

I have gone back to an SMP kernel from April 15th - using a GENERIC kernel with SMP enabled, it exhibits the same problem. Will work my way back to -stable and see if anything changes...

dave c

--
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
There aren't any monkeys chasing us... - Xochi
Re: SCSI hangs w/SuperMicro 6010H
On 22-Jun-01 Dave Cornejo wrote:
> John Baldwin wrote:
>> Actually, KTR is your friend here. :) Read the ktr(4) manpage, then compile a kernel with KTR_MASK and KTR_COMPILE set to KTR_INTR|KTR_PROC. Then when it hangs, break into DDB and look at the logs via 'show ktr' to see if you can locate any interrupts coming in from ahc0 or ahc1.
>
> Okay - fired up the box, built a kernel off of a 6/18 source snapshot, and it hangs in about the same place - however what I get is that as soon as I touch a key to invoke the debugger from the console, it continues merrily booting and I can't break into DDB until way past the problem. In my mind this kind of confirms something is busted in the interrupts.

Hrmm, perhaps you are getting an interrupt storm from ahc. Ok, try this: find the ahc driver's interrupt handler, and add a printf. Then see if the printf fires while the machine is hung.

> Tried looking back through the show ktr output and I'm not 100% clear on what it all means - I guess I'm interested in the ithread stuff and the only thing I ever see is swi6: tty:sio+ in the trace buffer besides what appears to be normal process rescheduling (?) which is mostly idle task time...

Unfortunately, clock interrupts can fill the trace buffer up, yes. :( If rolling back the source tree gets you a working kernel, then you might want to do a binary search using date tags to narrow down what commit actually broke things on your box.

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
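The binary search over date tags suggested above can be sketched as follows. This is a minimal illustration, not a tool from the thread: `first_bad_date` and `simulated_boot_test` are hypothetical names, the breaking date is made up, and the start and end dates are placeholders loosely based on the dates mentioned in the thread. In practice each test means checking out the sources as of that date, building a kernel, and seeing whether it boots.

```python
from datetime import date, timedelta


def first_bad_date(good, bad, boots_ok):
    """Return the earliest snapshot date whose kernel fails to boot.

    Assumes the kernel from `good` boots, the kernel from `bad` hangs,
    and that once the breakage went in, every later snapshot hangs too.
    """
    while (bad - good).days > 1:
        mid = good + timedelta(days=(bad - good).days // 2)
        if boots_ok(mid):
            good = mid   # breakage was introduced after mid
        else:
            bad = mid    # breakage was introduced on or before mid
    return bad


# Stand-in for the real test (check out sources by date, build, boot).
# The breaking date here is invented purely to exercise the search.
HYPOTHETICAL_BREAK = date(2000, 12, 1)


def simulated_boot_test(snapshot):
    return snapshot < HYPOTHETICAL_BREAK


if __name__ == "__main__":
    # Placeholder endpoints: a snapshot assumed good and one assumed bad.
    result = first_bad_date(date(2000, 9, 15), date(2001, 4, 15),
                            simulated_boot_test)
    print("first failing snapshot:", result)
```

Halving the interval each time keeps the number of builds small: a window of about seven months needs roughly eight test kernels instead of one per day.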
RE: SCSI hangs w/SuperMicro 6010H
On 17-Jun-01 Dave Cornejo wrote:
> Please excuse me if you've seen this in questions, but I found a relevancy to current: If I drop back to 4.3 release, this system boots every time with no hangs observed in half a dozen tries in either UP or SMP mode. Anyone else seeing similar?

Is this on -current or -stable? If it's on -current, why did you ask on -questions? :) It looks like an interrupt problem however.

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: SCSI hangs w/SuperMicro 6010H
John Baldwin wrote:
> Is this on -current or -stable? If it's on -current, why did you ask on -questions? :) It looks like an interrupt problem however.

When I asked on questions, I was of the belief that I had a hardware problem and that it was not necessarily a -current issue. When I later went back and installed 4.3 and it worked I then realized that I had justification to post it on -current. Hey, at least I didn't cross-post to questions, stable, scsi, and current! :)

I guess my next step is to try and trace through what's happening - can you suggest a good place to start (like a routine to start tracing), or is there anything I can do that might get more info for the people that know what is going on?

thanks!
dave c

--
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
There aren't any monkeys chasing us... - Xochi
Re: SCSI hangs w/SuperMicro 6010H
On 21-Jun-01 Dave Cornejo wrote:
> John Baldwin wrote:
>> Is this on -current or -stable? If it's on -current, why did you ask on -questions? :) It looks like an interrupt problem however.
>
> When I asked on questions, I was of the belief that I had a hardware problem and that it was not necessarily a -current issue. When I later went back and installed 4.3 and it worked I then realized that I had justification to post it on -current. Hey, at least I didn't cross-post to questions, stable, scsi, and current! :)

Ok, sounds good, just checking. :) Can you provide the output of mptable for this box? In the SMP case, -current does interrupt routing for PCI interrupts a bit differently, which might be a possible reason. Hmm, but you are getting interrupts eventually it seems.

> I guess my next step is to try and trace through what's happening - can you suggest a good place to start (like a routine to start tracing), or is there anything I can do that might get more info for the people that know what is going on?

Actually, KTR is your friend here. :) Read the ktr(4) manpage, then compile a kernel with KTR_MASK and KTR_COMPILE set to KTR_INTR|KTR_PROC. Then when it hangs, break into DDB and look at the logs via 'show ktr' to see if you can locate any interrupts coming in from ahc0 or ahc1.

--
John Baldwin [EMAIL PROTECTED] -- http://www.FreeBSD.org/~jhb/
PGP Key: http://www.baldwin.cx/~john/pgpkey.asc
Power Users Use the Power to Serve! - http://www.FreeBSD.org/
Re: SCSI hangs w/SuperMicro 6010H
John Baldwin wrote:
> Ok, sounds good, just checking. :) Can you provide the output of mptable for this box? In the SMP case, -current does interrupt routing for PCI interrupts a bit differently, which might be a possible reason. Hmm, but you are getting interrupts eventually it seems.

I can only get the thing past the hang maybe once in twenty+ tries. Below is the mptable output (I don't remember what version of FreeBSD I had installed when I did this, hope it doesn't matter). I'll try the KTR stuff later tonight.

thanks!
dave c

===
MPTable, version 2.0.15

---
MP Floating Pointer Structure:

  location:                 BIOS
  physical address:         0x000ff780
  signature:                '_MP_'
  length:                   16 bytes
  version:                  1.1
  checksum:                 0xb9
  mode:                     Virtual Wire

---
MP Config Table Header:

  physical address:         0x000f0bd0
  signature:                'PCMP'
  base table length:        284
  version:                  1.1
  checksum:                 0x28
  OEM ID:                   'AMI '
  Product ID:               'CNB20HE '
  OEM table pointer:        0x
  OEM table size:           0
  entry count:              27
  local APIC address:       0xfee0
  extended table length:    0
  extended table checksum:  0

---
MP Config Base Table Entries:

--
Processors:  APIC ID  Version  State        Family  Model  Step  Flags
             0        0x11     BSP, usable  6       8      6     0x387fbff
             1        0x11     AP, usable   6       8      6     0x387fbff
--
Bus:         Bus ID   Type
             0        PCI
             1        PCI
             2        PCI
             3        ISA
--
I/O APICs:   APIC ID  Version  State        Address
             4        0x11     usable       0xfec0
             5        0x11     usable       0xfec01000
--
I/O Ints:    Type     Polarity   Trigger  Bus ID  IRQ   APIC ID  PIN#
             INT      active-lo  level    1       0:A   5        14
             INT      active-lo  level    0       5:B   5        11
             INT      active-lo  level    0       15:A  4        10
             INT      active-lo  level    0       6:A   5        15
             INT      active-lo  level    0       5:A   5        10
             INT      active-lo  level    0       4:A   5        12
             ExtINT   active-hi  edge     3       0     4        0
             INT      active-hi  edge     3       1     4        1
             INT      active-hi  edge     3       0     4        2
             INT      active-hi  edge     3       3     4        3
             INT      active-hi  edge     3       4     4        4
             INT      active-hi  edge     3       6     4        6
             INT      active-hi  edge     3       8     4        8
             INT      active-hi  edge     3       12    4        12
             INT      active-hi  edge     3       13    4        13
             INT      active-hi  edge     3       14    4        14
             INT      active-hi  edge     3       15    4        15
--
Local Ints:  Type     Polarity   Trigger  Bus ID  IRQ   APIC ID  PIN#
             ExtINT   active-hi  edge     3       0     255      0
             NMI      active-hi  edge     0       0:A   255      1
===

--
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
There aren't any monkeys chasing us... - Xochi
Re: SCSI hangs w/SuperMicro 6010H
John Baldwin wrote:
> Actually, KTR is your friend here. :) Read the ktr(4) manpage, then compile a kernel with KTR_MASK and KTR_COMPILE set to KTR_INTR|KTR_PROC. Then when it hangs, break into DDB and look at the logs via 'show ktr' to see if you can locate any interrupts coming in from ahc0 or ahc1.

Okay - fired up the box, built a kernel off of a 6/18 source snapshot, and it hangs in about the same place - however what I get is that as soon as I touch a key to invoke the debugger from the console, it continues merrily booting and I can't break into DDB until way past the problem. In my mind this kind of confirms something is busted in the interrupts.

Tried looking back through the show ktr output and I'm not 100% clear on what it all means - I guess I'm interested in the ithread stuff and the only thing I ever see is swi6: tty:sio+ in the trace buffer besides what appears to be normal process rescheduling (?) which is mostly idle task time...

Do you think there's any use in rolling my source tree back a ways and compiling a kernel?

thanks,
dave c

--
Dave Cornejo @ Dogwood Media, Fremont, California (also [EMAIL PROTECTED])
There aren't any monkeys chasing us... - Xochi
Re: SCSI hangs w/SuperMicro 6010H
> Please excuse me if you've seen this in questions, but I found a relevancy to current: If I drop back to 4.3 release, this system boots every time with no hangs observed in half a dozen tries in either UP or SMP mode. Anyone else seeing similar?

I doubt that this is related to CAM or the aic7xxx driver. You probably need to work with John Baldwin to trace the early execution of the system to see why it is hanging up.

--
Justin