Re: [SUMMARY] libata EH

Jeff Garzik Sat, 20 Aug 2005 21:10:47 -0700

Tejun Heo wrote:

Jeff Garzik wrote:
Simple stuff like "command aborted" (invalid command) can be handled immediately, no need to kick in the error handling.

But as long as the right hardware interrupts are acknowledged, I don't mind if all error handling is moved to the thread.

My preference is toward unifying into a single path, for the sake of simplicity, as long as the performance penalty is acceptable.


The hot path is completing reads and writes successfully.
Secondary hot path is completing <other commands> successfully.

For everything else, clear, simple, maintainable code is preferred over fast code.
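A minimal sketch of that split, in plain userspace C rather than kernel code. Every name here (sketch_req, sketch_eh_queue, sketch_complete) is hypothetical and is not libata's API; the only point is that success is finished inline while anything else is handed to one deferred path.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command outcome; real code would look at device status. */
enum sketch_status { SKETCH_OK, SKETCH_ERROR };

/* Hypothetical request, not libata's struct ata_queued_cmd. */
struct sketch_req {
    enum sketch_status status;
    int done;                 /* set when completed on the hot path */
    struct sketch_req *next;  /* link used only on the deferred list */
};

/* Hypothetical list of requests waiting for the error-handling thread. */
struct sketch_eh_queue {
    struct sketch_req *head;
    size_t len;
};

/*
 * Hot path: a successful request is completed right here.
 * Everything else is pushed onto the deferred list, so there is
 * a single, simple error path instead of many special cases.
 * Returns 1 if completed inline, 0 if deferred.
 */
static int sketch_complete(struct sketch_eh_queue *q, struct sketch_req *r)
{
    if (r->status == SKETCH_OK) {
        r->done = 1;
        return 1;
    }
    r->next = q->head;
    q->head = r;
    q->len++;
    return 0;
}
```

The deferred list is where the slow, readable code would live; the inline branch stays as short as possible.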

2. Synchronization

    * SCSI EH entrance is not synchronized with normal processing.
      ATAPI error handling/timeout handling can run concurrently
      with normal command processing.  Albert, I think it's the
      same problem you're trying to solve by moving ATA_QCFLAG_ACTIVE
      clearing.

      http://marc.theaimsgroup.com/?l=linux-ide&m=112417360223374&w=2

The SCSI layer stops all command processing before calling ->eh_strategy_handler(). Where do you see that it runs concurrently with normal command processing? That should definitely -not- be happening.


 There are currently two problems.

 * As we don't grab host_set lock on entry to ata_scsi_error(), we can
   run concurrently with latter part of ata_qc_complete().  This race is
   addressed by the following patches I've just posted.

   http://marc.theaimsgroup.com/?l=linux-ide&m=112454734102242&w=2



hmmmmm.  I can see a bit of that:

When ->eh_strategy_handler() is called, the SCSI layer has stopped sending commands to all ports on the specified SCSI host.


However, it looks like we can race against
(a) interrupt handler completing a command on another port
(b) interrupt handler belatedly completing a command on our port
(c) if polling, another kernel thread

(a) shouldn't matter right now, but will in the future when we take a host-reset action that can 'blip' all ports.

(b) is a -very- rare worry in ATA, since commands that don't complete after 30 seconds probably will never complete. But given how CHECK CONDITION is implemented in libata's ATAPI code, falling immediately over to the EH, this might be a real concern for ATAPI.

(c) was mentioned in previous emails.  A rare worry.

Did I miss anything?
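To make race (b) concrete, here is a minimal userspace sketch, assuming a pthread mutex stands in for the host_set lock and a plain bool stands in for ATA_QCFLAG_ACTIVE. The names (sketch_cmd, sketch_claim, sketch_host_lock) are hypothetical and this is not the code from the posted patches; it only shows the "test and clear under the lock" idea, so that a late interrupt and the EH cannot both finish the same command.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

enum { SKETCH_BY_NOBODY = 0, SKETCH_BY_IRQ = 1, SKETCH_BY_EH = 2 };

/* Hypothetical stand-in for a queued command; not libata's struct. */
struct sketch_cmd {
    bool active;       /* plays the role of ATA_QCFLAG_ACTIVE */
    int completed_by;  /* which path ended up owning the command */
};

/* Plays the role of the host_set lock (an assumption of this sketch). */
static pthread_mutex_t sketch_host_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Test and clear 'active' while holding the lock.
 * Returns true for exactly one caller per command; the loser
 * must leave the command alone.
 */
static bool sketch_claim(struct sketch_cmd *cmd, int who)
{
    bool won = false;

    pthread_mutex_lock(&sketch_host_lock);
    if (cmd->active) {
        cmd->active = false;
        cmd->completed_by = who;
        won = true;
    }
    pthread_mutex_unlock(&sketch_host_lock);
    return won;
}
```

If the EH calls sketch_claim() first, a belated interrupt-side call for the same command returns false and does nothing, and vice versa. Race (c), the polling thread, would go through the same claim.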

 * After entering EH, normal command completion or spurious interrupt
   can occur.  We currently don't peg those interrupts, so interrupt
   handling can interfere with EH.

As long as it is not the local port, it shouldn't interfere with EH (currently).

As there are concerns regarding semantics of ->eh_strategy_handler and it's a less-used and less-charted territory, I'm gonna try to write a document describing the following.

 * How SCSI EH works and commands flow through it with the default
   fine-grained hooks.
 * From above, extract what ->eh_strategy_handler() should do.
 * What libata error conditions are there and how qc's should be
   handled.
 * How to integrate libata EH into SCSI EH without losing commands.

I don't know how good the doc will turn out (don't expect too much), but I hope it could serve as a basis for discussion if nothing else.


It would certainly be nice to get all of this written down.

After writing the above-mentioned doc, I'll try to improve/revise and break down my previously posted EH patchset and explain how the patches conform to the above yet-to-be-written document, such that it can be better understood and easier to review/debug.


Cool.  Thank you.

I'll get those patches reviewed sometime this weekend.

        Jeff


