Alan Stern wrote:
...

One or the other task will lock the hub first.  Simple case:  khubd
wins; the driver's top-down lock acquisition will first block (because it
can't get past khubd) and then later fail (the device is gone, though
that task still holds an old device pointer that it has to release).


This is the case I was concerned about. As you say, the driver's top-down
lock acquisition will block because it can't get past khubd. Without
something like the polling I described in my previous message, however,
your "later fail" part is wrong. It won't fail later; it will deadlock.

No, it'll fail one way or another ...


usb_reset_device() can't proceed until khubd releases the hub, khubd can't
release the hub until the driver's disconnect() returns, and the driver's
disconnect() can't return until usb_reset_device() returns.

And therein would be a bug in the device driver:



This disconnect() issue is a parallel of the open()/disconnect() issue.
In both cases, there's state that must linger after disconnect() returns,
and be cleaned up later.  In one case it's what close() accesses, and it's
associated with a user file handle.  In the other case, it's what the
SCSI EH task will have to work with as it's noticing -ENODEV.


That is wrong.  It's not simply a question of lingering state; it's also a
question of lingering code.  After disconnect() returns we have to assume
that the driver is no longer resident in memory.  Unlike open(),

Usbcore certainly does. But the device driver doesn't need to make such assumptions unless they're true. And in this case they'd clearly be false!


usb_reset_device() doesn't take a reference to the driver's module. Hence there can't be any threads (like SCSI EH) still trying to use it.

That would be a driver bug: the EH thread should have taken an extra reference to the device, and certainly should have done so before dropping the lock which allowed disconnect() to start.

(And maybe an extra reference to the driver module, but that sounds
more like something SCSI should have done to usb-storage.)


Let's also consider the special case of usb-storage, and let's suppose for
a moment that the module won't be removed from memory when disconnect() returns. It's _still_ a problem, because disconnect() calls
scsi_unregister_host() and that routine won't return until the EH has finished.

A similar observation applies, although that one might be harder to resolve, since in this case it's SCSI that's placing curious synchronization constraints on the rest of Linux. It's still not all that hotplug-friendly, I guess ... otherwise the "unregister host" call would let tasks currently using that host stop doing so at their own rate, rather than expecting them to do so "right now".

If one thinks of refcount models (which have limitations!), the
disconnect() call, like unregister_host(), is just a way to drop
one "special reference".  There'd often be other refcounts, like
ones given out through sysfs and other channels.  Each of those
references would need to be individually released later ... when
the component with the reference learns that it's stale, and does
its own particular cleanup.

- Dave




