On Sun, Apr 04, 2004 at 11:17:55PM -0400, Alan Stern wrote:

> As it turns out, the block layer guarantees that when sd_open runs the
> bd_disk pointer will be valid.  It does this by following the pattern I
> mentioned in an earlier message -- drivers/base/map.c uses a
> subsystem-wide semaphore, domain_sem, to properly synchronize lookups and
> deletes.
> 
> Next, the scsi_disk inline function returns:
> 
>       container_of(disk->private_data, struct scsi_disk, driver);
> 
> How do you know that the scsi_disk pointed to by disk->private_data still 
> exists?  So far as I can see, the gendisk doesn't take any references to 
> it.  Correct me if I'm wrong, but there doesn't seem to be anything 
> preventing a disconnect event from arriving after the open() call has got 
> a valid reference to the gendisk, and succeeding in deallocating the 
> scsi_disk before this code executes.  There's only one reference between 
> the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk 
> owns a reference to the gendisk.
> 
> But let's suppose that works okay, so sdkp is a valid pointer.  Then 
> the code calls scsi_disk_get(), which in turn calls scsi_device_get() for 
> sdkp->device.  How do you know that this doesn't point to deallocated 
> storage?  The only reference to the scsi_device is taken (in a rather 
> convoluted way) by the gendisk, and it is dropped during del_gendisk() -- 
> not when the gendisk is released.  Hence it is entirely possible for a 
> disconnect event to have freed the scsi_device when this code executes.
> 
> 
> There's two potential oopses for you.  I don't have a full grasp of the
> web of interlocking references (and interlocking code) in the SCSI,
> gendisk, and block layers, but it seems likely that at least one of
> these might actually happen.

Yes. 

If we are in middle opening an sd, and are anywhere between the
kobj_lookup having completed call to up_read(domain->sem) [called via
do_open calling get_gendisk] and sd_open calling scsi_device_get(), and
separately an sd_remove call completes [freeing the scsi_disk and the
scsi_device] there could be an oops.

Much as I hate it, simply adding a lock_kernel in sd_remove would solve
the problem, since do_open uses lock_kernel.

And I don't see why the kobject_get/put use an atomic_inc/dec, it just
obscures the need for a lock. We already have a reference to the kobject
(it's passed as an argument). The calls to the get/put must either be
protected with a lock or other method, or we have already gotten a
reference to the object and getting another is meaningless.

-- Patrick Mansfield


-------------------------------------------------------
This SF.Net email is sponsored by: IBM Linux Tutorials
Free Linux tutorial presented by Daniel Robbins, President and CEO of
GenToo technologies. Learn everything from fundamentals to system
administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
_______________________________________________
[EMAIL PROTECTED]
To unsubscribe, use the last form field at:
https://lists.sourceforge.net/lists/listinfo/linux-usb-devel

Reply via email to