Hi All,
As you all may know, Tru64 Unix SCSI is implemented using the
Common Access Method (CAM). With CAM, there are several callbacks
for bus or device resets, new devices found during scan, etc.
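
To make the callback idea concrete, here is a minimal Python sketch of an
event registry of the kind described above. It is not the actual CAM
interface: the event names, the AsyncEvents class, and the
(event, bus, target) callback signature are all hypothetical, chosen only
for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# Hypothetical event names; these are NOT the real CAM identifiers.
BUS_RESET = "bus_reset"
DEVICE_RESET = "device_reset"
DEVICE_FOUND = "device_found"

# Assumed callback signature: (event, bus, target).
Callback = Callable[[str, int, int], None]


class AsyncEvents:
    """Lets upper layers subscribe to events raised by the lower layers."""

    def __init__(self) -> None:
        self._callbacks: DefaultDict[str, List[Callback]] = defaultdict(list)

    def register(self, event: str, callback: Callback) -> None:
        """Subscribe callback to one event type."""
        self._callbacks[event].append(callback)

    def notify(self, event: str, bus: int, target: int = -1) -> int:
        """Invoke every callback registered for event; return how many ran.

        target defaults to -1 for bus-wide events such as a bus reset.
        """
        handlers = list(self._callbacks.get(event, []))
        for callback in handlers:
            callback(event, bus, target)
        return len(handlers)
```

A cluster driver would register for the reset events and, when notified,
re-check whatever device interlocks the reset may have cleared.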
Without going into Compaq's cluster design (we have two flavors),
folks will have a difficult (if not impossible) task implementing
cluster device interlocking without SCSI bus/device resets.
Both our products have specialized device drivers for implementing
cluster details, but the point is, our implementations would be impossible
without the lower levels supporting bus/device resets.
At the user level, my 'scu' program allows users to issue bus or
device resets, plus a bus scanning API on Tru64 Unix. This user level
control of bus and device resets is invaluable for developing tests to
verify that devices are cluster "safe".
In general, whacking the SCSI bus with bus resets to abort I/O may
do the trick, but it's not nice to affect all devices because of, say,
a timeout on a single device. In our CAM implementation, we first try
an Abort, then a BDR (Bus Device Reset), then a BR (bus reset) as a last
resort (well, actually, reloading firmware is the last resort on some
adapters).
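
The escalation order can be sketched roughly as follows. This is only an
illustration in Python, not our driver code: the Recovery enum, the
recover() function, and the attempt callback are hypothetical names, and
the sketch assumes each step simply reports success or failure.

```python
from enum import IntEnum
from typing import Callable, Optional


class Recovery(IntEnum):
    """Recovery steps, ordered from least to most disruptive."""
    ABORT = 1             # abort only the timed-out command
    BUS_DEVICE_RESET = 2  # BDR: reset the single target
    BUS_RESET = 3         # BR: disturbs every device on the bus
    RELOAD_FIRMWARE = 4   # last resort, only on some adapters


def recover(attempt: Callable[[Recovery], bool],
            has_firmware_reload: bool = False) -> Optional[Recovery]:
    """Try each step in order; return the first that succeeds, else None.

    attempt is a hypothetical hook that performs one recovery step and
    returns True if the stuck I/O was cleared.
    """
    for step in Recovery:
        # Skip the firmware reload on adapters that do not support it.
        if step is Recovery.RELOAD_FIRMWARE and not has_firmware_reload:
            continue
        if attempt(step):
            return step
    return None
```

The point of the ordering is that a bus reset is only reached after the
two steps that affect a single command or a single device have failed.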
So yes, as Linux becomes more mature, these other uses of SCSI
need to be considered. I agree with Dan, James, and Doug, that some
improvement is necessary in this area.
Kind Regards,
Robin
Dan Jones wrote:
>
> I just want to affirm James' point. Any SCSI (i.e. bus-based)
> multi-initiator configuration will require intelligent handling
> of reset. The best case would be where every initiator got a
> chance to vote whether a reset should be issued. Since there
> are good reasons why that won't happen, not the least of which
> is that there would still have to be an override case, the next
> best solution is not to break out a reset unless nothing else
> is left to try. Hmmm, this may ultimately be a case for separate
> SCSI buses for different classes of devices.
>
> Anyway, based on my experience in the past, SCSI drivers tend
> to believe that god intended only one host for each SCSI bus and,
> when in multi-initiator configuration, a driver of that design
> has a propensity to get into reset wars of the "oh yeah, take that!"
> variety.
>
> As long as I am writing on the topic, another problem for
> multi-initiator is when each layer in the boot sequence decides
> that the best way to proceed is to make no assumptions on the
> current state of the SCSI interface and to issue a reset. This
> problem is normally solved by hiring a gorilla to flip the power
> switches of the enclosures with both initiators at the same time
> when the systems are more than 6 feet apart.
>
> James Bottomley wrote:
> >
> > [EMAIL PROTECTED] said:
> > > I tried to add some logic to the mid level driver [core] so that the
> > > upper level drivers (specifically sg under superuser's instruction)
> > > could request a SCSI bus/device/host reset. Basically the patch was
> > > rejected because it might interfere with the mid level's error
> > > processing.
> >
> > We really need to modify the mid level's attitude to resets, otherwise we're
> > going to get killed on fibre channel (which is basically a huge
> > multi-initiator environment). Once you have more than one initiator on a bus
> > (or SAN ring) a reset can come at any time or stage during command processing
> > (Linux actually exacerbates the situation by sending a reset on boot).
> > Probably the way to handle this is some type of reset notify callback from low
> > to mid layers. Drivers can detect bus resets directly. Device or LUN resets
> > can be detected by unexpected Unit Attention conditions. At the moment
> > external reset handling is done at the low level. Nice drivers (like the
> > aic7xxx, sym8xx and most fibre drivers) detect resets and UA's, abort
> > outstanding commands and return them to the mid level. Cards which handle
> > reset processing correctly also correctly handle reset injection without
> > causing problems to the mid level.
> >
> > Given the above, I really think it makes sense to provide a reset injection
> > ability through the sg driver on a "caveat emptor" basis. If you have a low
> > level driver that doesn't do reset processing, the chances are that the mid
> > level error handling will get very confused. However, for a (reasonably long)
> > list of cards reset injection will work correctly. Since reset injection is
> > an essential facility for handling reservations in a SAN environment, I don't
> > think it's necessary to reject it just because of a few bad low level drivers,
> > just take the standard UNIX philosophy of letting root decide if it's safe or
> > not.
> >
> > > My feeling is that in some contexts [best known to a user application
> > > or an upper level driver] turning off the mid level's error processing
> > > might be advantageous. There is also no way for an application or an
> > > upper level driver to abort a SCSI command in progress. This means
> > > that an application can't set an extremely long timeout to the SCSI
> > > subsystem, run its own timer, and if that timer expires, abort the
> > > outstanding command.
> >
> > That would be truly great and useful functionality!
> >
> > James Bottomley
> >
> > -
> > To unsubscribe from this list: send the line "unsubscribe linux-scsi" in
> > the body of a message to [EMAIL PROTECTED]
>
> --
> Dan Jones, Manager, Storage Products VA Linux Systems
> V:(408)542-5737 F:(408)745-9911 1382 Bordeaux Drive
> [EMAIL PROTECTED] Sunnyvale, CA 94089
>