On 4 Apr 2004, James Bottomley wrote:

> OK, your "problem" definition is that "there's a race", which I agree
> with, I just don't agree that it's a problem.
>
> Disconnections are fundamentally asynchronous events (a device may be
> disconnected by the user at any stage regardless of what any kernel
> internal state model is doing).  Trying to impose synchronisation on
> asynchronous events is asking for trouble.
>
> In the open race scenario, either the open is refused or the user gets a
> fd that cannot do anything (because the device isn't there) and simply
> returns errors to all operations.  Both cases are correct, so who wins
> the race is irrelevant.

Ah, you have left out the third, bad alternative: open succeeds, user gets
an fd that points to a deallocated device.  More details below...

> Let me illustrate: the user may disconnect the device then open it.  If
> they open it before even the USB subsystem gets notified of the
> disconnection then all the elaborate synchronisation in the world isn't
> going to be able to prevent that (the device was gone when they opened
> it, just nothing in the kernel knew that).  Since we cannot solve that
> race, there's no reason to try to solve the "some parts of the kernel
> know but others don't" part of the race.

I agree with everything except your conclusion.  :-)  Just because some
outcomes of the race lead to a benign result no matter what, that doesn't
mean all outcomes will or that we can ignore the race.

Let's consider a simple example, one that doesn't have all the
complexities of the sr driver with its multiple driver layers.  The
usb-skeleton program in drivers/usb makes a good illustration; I added a
semaphore to it some time ago to protect against exactly this sort of
race.  Without that semaphore, here's what can happen:

Open process:                       Disconnect process:

Get minor number from inode
Lookup USB interface using
  minor number
Get device pointer from the
  interface's private data
  and check it's not NULL
                                    Get device pointer from the
                                      interface's private data
                                    Set the private data to NULL
                                    Lock the device sem
                                    Unregister the minor number
                                    Terminate ongoing I/O operations
                                    Clear the device->present flag
                                    Unlock the device
                                    Since the open count is 0,
                                      deallocate the device structure
Lock the device sem --> oops

The idea is that at some stage the open process has got far enough along
to believe the device exists, but not far enough to hold a reference to
it.  (That's inevitable, since you can't try to acquire a reference until
you're sure the device does exist.)  If the disconnect process deallocates
the device at that time, there will be trouble.

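To make the fix concrete, here is a minimal userspace sketch of the kind of
locking described above.  It is not the actual usb-skeleton code: it uses
pthread mutexes in place of kernel semaphores, and every name in it
(disconnect_sem, skel_dev, minor_table, skel_probe, skel_open,
skel_release, skel_disconnect, freed_count) is an assumption made for
illustration.  The point it tries to show is that open holds one global
lock from the minor-number lookup until it has taken its reference, so
disconnect can never free the device inside that window.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_MINORS 4

struct skel_dev {
    pthread_mutex_t sem;   /* per-device lock (the "device sem") */
    int open_count;        /* number of open file handles */
    bool present;          /* false once the device is unplugged */
};

/* Hypothetical global lock serialising open against disconnect. */
static pthread_mutex_t disconnect_sem = PTHREAD_MUTEX_INITIALIZER;
static struct skel_dev *minor_table[MAX_MINORS];
static int freed_count;    /* illustration only: counts deallocations */

static void skel_free(struct skel_dev *dev)
{
    pthread_mutex_destroy(&dev->sem);
    free(dev);
    freed_count++;
}

/* Register a new device under the given minor number. */
int skel_probe(int minor)
{
    struct skel_dev *dev;

    if (minor < 0 || minor >= MAX_MINORS)
        return -1;
    dev = calloc(1, sizeof(*dev));
    if (!dev)
        return -1;
    pthread_mutex_init(&dev->sem, NULL);
    dev->present = true;

    pthread_mutex_lock(&disconnect_sem);
    if (minor_table[minor]) {          /* minor already in use */
        pthread_mutex_unlock(&disconnect_sem);
        pthread_mutex_destroy(&dev->sem);
        free(dev);
        return -1;
    }
    minor_table[minor] = dev;
    pthread_mutex_unlock(&disconnect_sem);
    return 0;
}

/* Lookup and reference acquisition happen under one lock. */
struct skel_dev *skel_open(int minor)
{
    struct skel_dev *dev = NULL;

    pthread_mutex_lock(&disconnect_sem);
    if (minor >= 0 && minor < MAX_MINORS)
        dev = minor_table[minor];
    if (dev) {
        pthread_mutex_lock(&dev->sem);
        dev->open_count++;
        pthread_mutex_unlock(&dev->sem);
    }
    pthread_mutex_unlock(&disconnect_sem);
    return dev;                        /* NULL means "open refused" */
}

/* Drop a reference; the last closer frees an unplugged device. */
void skel_release(struct skel_dev *dev)
{
    bool last;

    pthread_mutex_lock(&dev->sem);
    assert(dev->open_count > 0);
    last = (--dev->open_count == 0) && !dev->present;
    pthread_mutex_unlock(&dev->sem);
    if (last)
        skel_free(dev);
}

/* Unplug: hide the device from open, then free it if nobody holds it. */
void skel_disconnect(int minor)
{
    struct skel_dev *dev;
    bool unused;

    if (minor < 0 || minor >= MAX_MINORS)
        return;
    pthread_mutex_lock(&disconnect_sem);
    dev = minor_table[minor];
    minor_table[minor] = NULL;
    if (dev) {
        pthread_mutex_lock(&dev->sem);
        dev->present = false;
        unused = (dev->open_count == 0);
        pthread_mutex_unlock(&dev->sem);
        if (unused)
            skel_free(dev);
    }
    pthread_mutex_unlock(&disconnect_sem);
}
```

With this arrangement an open that wins the race gets a handle to a device
that is marked not present but is still allocated, and an open that loses
is refused.  Those are the two benign outcomes; the third one, a handle to
freed memory, is ruled out because disconnect cannot run between the
lookup and the open_count increment.
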
Alan Stern