On Sat, Dec 04, 1999 at 12:08:16PM -0500, Douglas Gilbert wrote:
> ard wrote:
> > Hi,
> > While trying to do some pass through commands using the megaraid 466 card,
> > I got some bizarre error conditions.
> > The problem was that the megaraid driver just returned the status without
> > massaging it into linux-friendly values.
> > Resulting of course in scsi_obsolete errors and requeueing the command.
> > This patch fixes this by ALWAYS returning DID_BAD_DRIVE on any status
> > condition for ioctl's on the megaraid host-adapter.
> > This patch will not influence normal behavior, but it might however have
> > influence on the megamgr program (which I am not using anyway), that does
> > do ioctl's on the host-adapter.
> 
> There are 2 ways that I can think of that you did this "pass through":
>  - using the SCSI_IOCTL_SEND_COMMAND ioctl()
>  - using the scsi generic (sg) device.
> 
> Given the problem that you are reporting my guess is that you
> used the first one. There are 2 problems with that: retries
> are attempted (that in the worst case can lead to SCSI bus resets)
> and that ioctl either gives your data back (in the case of a read
> like command) or a sense buffer (in the case of any non-zero
> status/error).
Well, actually, no. Although AMI's megamgr does use the
SCSI_IOCTL_SEND_COMMAND, I cannot use it. The program I work on is meant
to initialize the raid-controller's configuration. There are no 
logical SD devices yet, only one sg device: the SAF-TE.
> The sg device instructs the mid-level not to attempt retries
> (which the mid-level ignores in the case of DID_SOFT_ERROR ??)
> and if the SCSI command requested returned data then something
> will be returned as well as SCSI status/error codes. [The
> "something returned" can be useful data in the case of
> CHECK CONDITION, RECOVERED ERROR.]
The thing is: this is about a special ioctl command. Result
codes are returned straight from the firmware, without any
massaging. When you walk through the scsi_obsolete code, you will
see that if you return DID_OK, but the status_byte != 0, weird
things will happen, resulting in a sense_request being sent to the
device.
This is what happens:
I open the /dev/sgx for the SAF-TE. I use the fd to issue the
special "scsi" command 0x80 or 0x81 to send commands to the
megaraid controller, and NOT the SAF-TE. But the return codes
from the megaraid firmware are not filtered. So I get a
DID_OK, and a status that most of the time is OK (==0),
but sometimes contains linux-scsi incompatible values, which
makes scsi_obsolete barf that there is an internal error.
Somehow the command seems to be queued again. Maybe after a
sense_request, but a sense_request for an ioctl is rather
useless, since the SAF-TE will -of course- return a-ok.
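
To make the failure mode concrete, here is a minimal sketch of the
result word involved. It assumes the usual layout of the 32-bit SCSI
result (status in the low byte, host byte in bits 16-23) and the
DID_OK / DID_BAD_TARGET values from the kernel's scsi.h. The helper
names (make_result, raw_ioctl_result, filtered_ioctl_result) are
hypothetical and are not taken from the megaraid driver, and the
choice of DID_BAD_TARGET as the "failed" host byte is an assumption
for illustration only.

```c
#include <stdint.h>

/* Host-byte codes; values are assumed to match the kernel's scsi.h. */
#define DID_OK         0x00
#define DID_BAD_TARGET 0x04

/* Assumed layout of the result word:
 * bits 0-7 status, 8-15 message, 16-23 host, 24-31 driver. */
static uint32_t make_result(uint8_t host, uint8_t status)
{
    return ((uint32_t)host << 16) | status;
}

static uint8_t result_host(uint32_t result)
{
    return (result >> 16) & 0xff;
}

static uint8_t result_status(uint32_t result)
{
    return result & 0xff;
}

/* Hypothetical: the unfiltered behaviour described above. The firmware
 * status lands in the status byte while the host byte still says DID_OK. */
static uint32_t raw_ioctl_result(uint8_t fw_status)
{
    return make_result(DID_OK, fw_status);
}

/* Hypothetical: the filtered behaviour. Any non-zero firmware status is
 * reported through the host byte, so the mid-level never sees DID_OK
 * paired with a status it cannot interpret. Keeping the firmware status
 * in the low byte is a choice made for this sketch. */
static uint32_t filtered_ioctl_result(uint8_t fw_status)
{
    if (fw_status == 0)
        return make_result(DID_OK, 0);
    return make_result(DID_BAD_TARGET, fw_status);
}
```

With a made-up firmware status of 0x45, raw_ioctl_result() yields a
word whose host byte is DID_OK and whose status byte is 0x45, which is
the combination the mid-level chokes on; filtered_ioctl_result()
yields a host byte of DID_BAD_TARGET instead.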
> Even though the SCSI_IOCTL_SEND_COMMAND ioctl() is deprecated
> it is still supported. Recently I wrote some documentation for
> it that appears in front of the 
> scsi_ioctl::scsi_ioctl_send_command() function definition in
> the lk 2.3 tree. It is also discussed in
> http://www.torque.net/sg/p/scsi-generic_long.txt .
I need stable. Currently I'm using sg-II. It works fine, except
for the megaraid driver returning "bogus".
> The megaraid driver obviously should clean up its internal
> scsi return value.
Hey, that's one thing that is definitely certain. And some /proc
support would be welcome too. And of course adhering to the
2.2 scsi code (i.e. not using scsi_obsolete anymore). Maybe that
would also fix ALL problems. The scsi_obsolete code says this:
/* This function is the mid-level interrupt routine, which decides how
 *  to handle error conditions.  Each invocation of this function must
 *  do one and *only* one of the following:
 *
 *  (1) Call last_cmnd[host].done.  This is done for fatal errors and
 *      normal completion, and indicates that the handling for this
 *      request is complete.
 *  (2) Call internal_cmnd to requeue the command.  This will result in
 *      scsi_done being called again when the retry is complete.
 *  (3) Call scsi_request_sense.  This asks the host adapter/drive for
 *      more information about the error condition.  When the information
 *      is available, scsi_done will be called again.
 *  (4) Call reset().  This is sort of a last resort, and the idea is that
 *      this may kick things loose and get the drive working again.  reset()
 *      automatically calls scsi_request_sense, and thus scsi_done will be
 *      called again once the reset is complete.
 *
 *      If none of the above actions are taken, the drive in question
 *      will hang. If more than one of the above actions are taken by
 *      scsi_done, then unpredictable behavior will result.
 */
(2), (3) and (4) are things I definitely do not want to occur.
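
The "one and only one action" contract in that comment can be modelled
with a toy dispatcher. This is a deliberately simplified sketch and not
the decision table in scsi_obsolete.c: it treats every non-DID_OK host
byte as fatal, never chooses reset, and models the "unknown status"
case as a requeue only because that matches the behaviour reported in
this thread. All names prefixed with toy_ / TOY_ / ST_ are hypothetical;
the status values are the raw SCSI ones, and the kernel's own macros
may encode them differently.

```c
#include <stdint.h>

/* Host-byte codes; values are assumed to match the kernel's scsi.h. */
#define DID_OK         0x00
#define DID_BAD_TARGET 0x04

/* Raw SCSI status byte values. */
#define ST_GOOD            0x00
#define ST_CHECK_CONDITION 0x02
#define ST_BUSY            0x08

/* The four mutually exclusive actions listed in the comment. */
enum toy_action {
    TOY_DONE = 1,          /* (1) call done: fatal error or normal completion */
    TOY_REQUEUE = 2,       /* (2) requeue the command */
    TOY_REQUEST_SENSE = 3, /* (3) ask the device for sense data */
    TOY_RESET = 4          /* (4) reset; never returned by this toy model */
};

/* Toy model: pick exactly one action for a given result word. */
static enum toy_action toy_scsi_done(uint32_t result)
{
    uint8_t host = (result >> 16) & 0xff;
    uint8_t status = result & 0xff;

    if (host != DID_OK)
        return TOY_DONE;           /* simplification: every host error is fatal */

    switch (status) {
    case ST_GOOD:
        return TOY_DONE;           /* normal completion */
    case ST_CHECK_CONDITION:
        return TOY_REQUEST_SENSE;  /* device has sense data to report */
    case ST_BUSY:
        return TOY_REQUEUE;        /* try again later */
    default:
        return TOY_REQUEUE;        /* assumption: mirrors the requeue seen here */
    }
}
```

In this model a result of DID_OK with an unrecognised status byte ends
up in action (2), while the same status byte paired with a failing host
byte such as DID_BAD_TARGET ends in action (1), which is the only
outcome that makes sense for a controller ioctl.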

> P.S. Sorry about this cross post, the original post should have
> gone to the linux-scsi group.
I will reply to this again with the patch...
But I do not want to dial in to my job right now...

Now for another approach:
My problem is that I have to give the megaraid controller commands,
and not to any devices on the scsi-bus. So I was wondering: wouldn't it
be wise to have the driver register itself as a scsi device (which it
actually is)? I could then open the /dev/sgx device for the controller,
and not for a real device, and use that as a way to reach the
controller. I think it makes sense, especially if you want to cluster
machines with the scsi-bus as the major communications channel... :)
--
 intel1: 6:09pm up 10 days, 19:35, 4 users, load average: 0.00, 0.00, 0.00
