On Tue, May 09, 2000 at 11:54:28PM +0100, Kenn Humborg wrote:
>
> I've got a problem with two DEC RZ55 drives on an Advansys ABP3925
> host adapter.
>
> However, if I load the module while the drives are powered down,
> I get the following:
>
> May 9 21:51:19 avalon kernel: scsi : aborting command due to timeout :
> pid 49, scsi0, channel 0, id 1, lun 0 Test Unit Ready 00 00 00 00 00
[...snip...]
>
> To get some more info, I compiled up 2.3.99-pre6 with scsi logging and
> did echo "scsi log all" > /proc/scsi/scsi. After doing this, a
> modprobe advansys generated loads of info, but worked fine without
> any delays.
>
> This suggests to me that the issue is timing related. So, I want to
> try enabling various subsets of the logging types until I see which
> one 'cures' the timeout problem.

OK, I've played around with this some more and here are a few more
clues. These tests were performed with the 3.3A driver from
ftp.advansys.com on 2.3.99-pre6. (I had to comment out a few
printk()'s that don't compile when ADVANSYS_DEBUG is defined.)

The hardware setup is:

    ABP3925 adapter. Nothing connected to internal bus connector.
    Termination set to Enabled in BIOS.

    1 metre cable to DEC BA42A disk enclosure. This box holds
    two RZ55 disks. The internal twisted-pair ribbon cable in
    this box is about 60cm long. The two IDC headers on the cable
    are about 30cm apart. This cable is the original DEC cable.

    The BA42A has two external Centronics connectors. One goes
    to the ABP3925 and the other is terminated with a DEC 50-pin
    Centronics-style terminator.

The RZ55s were powered down for the duration of this testing.

If I disconnect both drives, the driver loads OK (as expected).

If I connect one drive (doesn't matter which one, I've tried both
individually), the driver loads OK (as expected).

If I disconnect the terminator, and leave both drives connected,
the driver fails to load with

    advansys: AscInitAsc1000Driver: board 0: error: init_state 13e, warn 0 error 8

(not quite as expected, but understandable).

If I connect both drives and the terminator, the driver takes
about a minute to load. (This time I don't get any error messages
because we're using the new SCSI error handling, which isn't as
chatty as the stuff in scsi_obsolete.c).

Turning on logging by setting asc_dbglvl to 1 or 2 and by playing
with echo "scsi log ..." > /proc/scsi/scsi caused the driver to
load correctly again.
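
For anyone who wants to repeat the narrowing-down, here is a rough
sketch of enabling one logging class at a time. The helper name
scsi_log_only, the SCSI_PROC override and the token list are all
assumptions for illustration: the tokens and the "scsi log none"
form are believed to match the usage comment in drivers/scsi/scsi.c,
but should be checked against the tree being tested.

```shell
#!/bin/sh
# Control file for the SCSI mid-layer. Override SCSI_PROC with a
# scratch file for a dry run that does not touch the kernel.
SCSI_PROC="${SCSI_PROC:-/proc/scsi/scsi}"

# ASSUMPTION: token names as recalled from the usage comment in
# drivers/scsi/scsi.c; verify them against your own kernel source.
SCSI_LOG_TOKENS="error timeout scan mlqueue mlcomplete llqueue llcomplete hlqueue hlcomplete ioctl"

# Hypothetical helper: switch all logging off, then enable one class.
#   $1 = token to enable, $2 = level (defaults to 1)
scsi_log_only() {
    # ASSUMPTION: "scsi log none" clears every logging class.
    echo "scsi log none" > "$SCSI_PROC"
    echo "scsi log $1 ${2:-1}" > "$SCSI_PROC"
}

# Intended use (not run here): time a module load per logging class.
#   for t in $SCSI_LOG_TOKENS; do
#       rmmod advansys
#       scsi_log_only "$t"
#       time modprobe advansys
#   done
```

A class whose load finishes without the one-minute delay is a
candidate for the output that hides the timing problem.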

So, I started to narrow down exactly how much needed to be logged
to make it work. I found that I could reduce it to this:

       ASC_DBG(1, "advansys_interrupt: end\n");
    -> printk("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    ->        "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n");
    -> printk("xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
    ->        "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx\n");
       return;

at the end of advansys_interrupt().

I always killed klogd during the module load to eliminate the
effect that disk I/O might have. So, during the test, the log
messages were only going to the console. Also, those two
printk()'s were only sufficient if the current cursor position
on the console was at the bottom, so that the console had to
scroll while printing. I imagine that different machines with
different CPU speeds and graphics cards will require different
amounts of log output here to 'fix' the problem.

So, this is my theory:

    The additional console output delays handling of the next
    interrupt slightly, thus allowing time for _something_ in
    the card status to change before the ISR deals with the
    interrupt.

So, let's try adding a small delay to the start of the ISR:

       ASC_DBG(1, "advansys_interrupt: begin\n");
    -> mdelay(10);
       /*
        * Check for interrupts on all boards.
        * AscISR() will call asc_isr_callback().
        */

This doesn't help. The driver still goes into slow error-recovery
for each SCSI device ID that it tries to scan for.

So where do we go from here? Does anyone need me to check anything
else? Want any more info?

Someone else suggested that maybe my host adapter is not supplying
termination power. Unfortunately, I haven't been able to bring
home a voltmeter to check this. In any case, dodgy termination
shouldn't lead to this kind of slow error recovery (with interrupts
being locked out for 500ms during part of it), should it?

Later,
Kenn