Alan Stern wrote:
On Mon, 3 May 2004, David Brownell wrote:

If all the locking obeys the same (correct, and presumably top-down)
locking protocol, then deadlocks won't happen.  That'd include the
code doing the reset ... which would see -ENODEV faults starting at
some point, and then fail cleanly.


No, you've missed a key point.  Let's say disconnect and reset happen at
nearly the same time -- not unlikely, since a user frustrated at the
errors that the reset is trying to correct might very well unplug the
device.  khubd locks the hub, then the device, then calls the driver's

That was the example wherein I made that typo .. :)


Other than locking the hub of the device being disconnected, none of
that should be happening before the tree of devices being unplugged is
marked as USB_STATE_NOTATTACHED.  (Often a degenerate tree: one device.)


disconnect().  Meanwhile the error handler part of the driver has called
usb_reset_device() which is trying to lock the hub.

Whatever any driver tries to do with a USB_STATE_NOTATTACHED device should be producing -ENODEV responses. We could even extend that to acquiring its lock ... though I expect that should be necessary.


Your strictly top-down analysis would say: Okay, khubd will proceed, and
when it's done the device reset will proceed and fail with -ENODEV or
-ESHUTDOWN.  But that's wrong -- khubd can't proceed because the driver
_mustn't_ return from disconnect() until it is finished using the device,
i.e, until usb_reset_device() returns.  (In the case of usb-storage it
_can't_ return; scsi_unregister_host() will block until the error-handler
thread has finished with the device.)

That's not _my_ analysis ... :)


One or the other task will lock the hub first.  Simple case:  khubd
wins, driver's top-down lock acquisition will first block (because it
can't get past khubd) and then later fail (the device is gone, though
that task has an old device pointer that it's still got to release).

The not-as-simple case is that the reset() gets in first, and then khubd
marks the device as NOTATTACHED before it blocks (on the reset task).
At some point the driver notices (starts getting -ENODEV), stops, gives
up, drops its lock, then khubd finishes.


This disconnect() issue is a parallel of the open()/disconnect() issue. In both cases, there's state that must linger after disconnect() returns, and be cleaned up later. In one case it's what close() accesses, and it's associated with a user file handle. In the other case, it's what the SCSI EH task will have to work with as it's noticing -ENODEV.


What's needed is a way for usb_reset_device() to try to acquire the hub's
serialize semaphore in such a way that it can be interrupted and fail if
and when the device is marked DISCONNECTED. Normal semaphores won't
permit something like that. (I don't think sending a signal and using down_interruptible() is a good approach.)

Erm, I thought I'd described several parts of exactly that. The basic version: locktree() failing if the device has actually been removed from the tree by then. (Possibly khubd could do that bit rather early ...) The slightly fancier version, checking for NOTATTACHED as it acquires locks, just fails a wee bit faster. Or as above, later requests failing because state is NOTATTACHED.


    // internal to usbcore
    void change_device_state(struct usb_device *dev,
                                enum usb_device_state state)
    {
         static spinlock_t   lock = SPIN_LOCK_UNLOCKED;
         unsigned long       flags;

         spin_lock_irqsave(&lock, flags);
         if (dev->state != USB_STATE_UNATTACHED) {
               dev->state = state;
               /* hey, why not recurse right here? */
         }
         spin_unlock_irqrestore(&lock, flags);
     }

I'm not quite sure what your concern is with children[].


Same as before. If we recurse right there, we end up calling

change_device_state(dev->children[i], USB_STATE_DISCONNECTED);

How do we know that between the time we make that call and the time the
inner invocation looks at (dev->children[i])->state, the child device
structure hasn't been deallocated? Remember, we aren't allowed to acquire a topology semaphore.

What I've been talking about is a scheme where hub->serialize serves as the (physical) topology lock for that subtree ... and everyone uses the same simple locking convention. I'd require whoever changes any child's state to NOTATTACHED to hold that lock, then later to actually get rid of the device object.

The "why not recurse" stage would be an optimization, and you could
be right that it's not easily worked.

- Dave








-------------------------------------------------------
This SF.Net email is sponsored by: Oracle 10g
Get certified on the hottest thing ever to hit the market... Oracle 10g. Take an Oracle 10g class now, and we'll give you the exam FREE. http://ads.osdn.com/?ad_id=3149&alloc_id=8166&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to