Hi Doug, and all,

The problem was solved by setting the HA BIOS to NOT allow
disconnects for the SCSI ID(s) involved.

----- Original Message -----
From: Douglas Gilbert <[EMAIL PROTECTED]>
To: David C. Hoos, Sr. <[EMAIL PROTECTED]>
Cc: Linux SCSI <[EMAIL PROTECTED]>
Sent: Monday, April 10, 2000 10:55 PM
Subject: Re: Device driver problems


> "David C. Hoos, Sr." wrote:
> >
<large snip>

> Perhaps disconnect is a red herring. That mid-level flag that
> sg prints out is showing 0 for both my IBM DCHSU disks
> which is pretty unlikely. Doing a grep on the mid-level code
> seems to indicate that the mid-level has no interest in the
> state of that flag (i.e. it is left to the HBA driver). If
> so perhaps I should replace that column with something more
> useful.
>
Well, it certainly misled me.

<large snip>
> > Is the sg 3.0.13 (20000323) any different from 3.0.4 (991127) in any
> > respect affecting disconnect?
>
> I don't think so. At the time of 3.0.4 the same sg built on
> lk 2.2+2.3 . Around 3.0.9 it was split into 3.0.10 for lk 2.2
> and 3.1.10 for lk 2.3 . Nearly all of the changes have been
> on the lk 2.3 side (mid level changes, bugs etc). You can
> re-install sg 3.0.4 to do a regression test. Please tell
> me if that makes a difference.
>
> While you are trying regression testing, why not wind the
> aic7xxx driver back to the earlier version as well.
>
Since my code is not in C, regression testing is a little more
cumbersome: I would have to revise the sg_io_hdr data structure,
which changed between 3.0.4 and 3.0.13. Given my schedule
pressures, and the fact that changing the HA BIOS setting
solved the problem, I don't plan to regression test.

> Looking at the logs, all 3 luns are now picked up. Was that
> as a result of turning the MULTI_LUN config switch on?

No.  I had already compiled with MULTI_LUN.  What made the
difference was putting add-single-device commands in rc.local,
as can be seen from the /var/log/messages file.
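
For readers unfamiliar with the mechanism: on kernels of this era a
device can be registered by writing a line of the form
"scsi add-single-device <host> <channel> <id> <lun>" to /proc/scsi/scsi.
The sketch below only builds those lines. The host 0, channel 0, id 4
values are taken from the log lines quoted later in this message; the
actual rc.local contents are not shown here, so treat the function
names and the values as assumptions for illustration.

```python
def add_single_device_cmd(host: int, channel: int, scsi_id: int, lun: int) -> str:
    """Build the text the kernel expects on /proc/scsi/scsi."""
    return f"scsi add-single-device {host} {channel} {scsi_id} {lun}"


def register_luns(host, channel, scsi_id, luns,
                  proc_path="/proc/scsi/scsi", dry_run=True):
    """Return the command lines; write them to proc_path unless dry_run."""
    cmds = [add_single_device_cmd(host, channel, scsi_id, lun) for lun in luns]
    if not dry_run:
        for cmd in cmds:
            # One write per command, as a shell "echo ... >" would do.
            with open(proc_path, "w") as proc_file:
                proc_file.write(cmd + "\n")
    return cmds


# Assumed values: scsi0, channel 0, id 4, extra LUNs 1 and 2.
commands = register_luns(0, 0, 4, [1, 2])
for line in commands:
    print(line)
```

With dry_run left at its default the function touches nothing and only
returns the strings, which makes it safe to try on any machine.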

Incidentally, in your first reply, you said:
"Well I looked at the logs and I can find no reference to luns 1
or 2 in the "aborting ..." messages and all the second bytes
of the read(10) and write(10) commands seems to be 0 (indicating
lun 0)."

One of the several examples from that original log is:
Apr  3 07:53:32 piu01 kernel: scsi : aborting command due to timeout :
pid 18, scsi0, channel 0, id 4, lun 0 Read (10) 20 00 00 00 00 00 28 a4 00

If I understand the format of these messages correctly, the second byte
to which you refer is the first one of the nine printed, the first byte
being the opcode, which was decoded as Read (10).  That byte is 0x20,
and its top three bits (the LUN field) are 001.  Therefore, it is a
command for LUN 1.

Similarly, an example write timeout message is:
Apr  3 07:57:38 piu01 kernel: scsi : aborting command due to timeout :
pid 52, scsi0, channel 0, id 4, lun 0 Write (10) 40 00 00 00 00 00 00 0c 00
which shows a command directed to LUN 2.
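
A minimal sketch of that decoding, assuming the SCSI-2 layout of a
10-byte CDB in which bits 7-5 of byte 1 carry the LUN, bytes 2-5 the
logical block address, and bytes 7-8 the transfer length.  The function
names are made up for illustration; the input strings are the nine
bytes printed after the opcode name in the two log lines above.

```python
def cdb_lun(byte1: int) -> int:
    """Return the LUN held in bits 7-5 of CDB byte 1."""
    return (byte1 >> 5) & 0x07


def parse_logged_cdb(hex_bytes: str) -> dict:
    """Decode the nine bytes logged after the opcode of a 10-byte CDB."""
    b = [int(x, 16) for x in hex_bytes.split()]
    if len(b) != 9:
        raise ValueError("expected 9 bytes after the opcode")
    return {
        "lun": cdb_lun(b[0]),                            # CDB byte 1
        "lba": int.from_bytes(bytes(b[1:5]), "big"),     # CDB bytes 2-5
        "blocks": int.from_bytes(bytes(b[6:8]), "big"),  # CDB bytes 7-8
    }


read_cdb = parse_logged_cdb("20 00 00 00 00 00 28 a4 00")
write_cdb = parse_logged_cdb("40 00 00 00 00 00 00 0c 00")
print(read_cdb["lun"], write_cdb["lun"])  # prints: 1 2
```
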

Perhaps in the context of sg/scsi/aic7xxx, the lun shown in the message
is the lun associated with the file descriptor, and not what's embedded in
the command block?

Just some food for thought....

Respectfully,
David C. Hoos, Sr.  W1DCH
------------------
The primary purpose of the DATA statement is to give names to
constants; instead of referring to pi as 3.141592653589793 at every
appearance, the variable PI can be given that value with a DATA
statement and used instead of the longer form of the constant.  This
also simplifies modifying the program, should the value of pi change.
                 -- FORTRAN manual for Xerox Computers





