PS, I am traveling today so future comments will be delayed a bit. Alan Stern [EMAIL PROTECTED] wrote: > All right, let's look at sd.c. I'll show you that _it_ doesn't obey the > object lifetime rules. In sd_open we see this code (lightly edited): > > > static int sd_open(struct inode *inode, struct file *filp) > { > struct gendisk *disk = inode->i_bdev->bd_disk; > struct scsi_disk *sdkp = scsi_disk(disk); > struct scsi_device *sdev; > int retval; > > retval = scsi_disk_get(sdkp); > if (retval) > return retval; > > sdev = sdkp->device; > > > As it turns out, the block layer guarantees that when sd_open runs the > bd_disk pointer will be valid. It does this by following the pattern I > mentioned in an earlier message -- drivers/base/map.c uses a > subsystem-wide semaphore, domain_sem, to properly synchronize lookups and > deletes. > > Next, the scsi_disk inline function returns: > > container_of(disk->private_data, struct scsi_disk, driver); > > How do you know that the scsi_disk pointed to by disk->private_data still > exists? So far as I can see, the gendisk doesn't take any references to > it. Correct me if I'm wrong, but there doesn't seem to be anything > preventing a disconnect event from arriving after the open() call has got > a valid reference to the gendisk, and succeeding in deallocating the > scsi_disk before this code executes. There's only one reference between > the scsi_disk and the gendisk, and it goes the wrong way: the scsi_disk > owns a reference to the gendisk. > > But let's suppose that works okay, so sdkp is a valid pointer. Then > the code calls scsi_disk_get(), which in turn calls scsi_device_get() for > sdkp->device. How do you know that this doesn't point to deallocated > storage? The only reference to the scsi_device is taken (in a rather > convoluted way) by the gendisk, and it is dropped during del_gendisk() -- > not when the gendisk is released. Hence it is entirely possible for a > disconnect event to have freed the scsi_device when this code executes. > > > There's two potential oopses for you. I don't have a full grasp of the > web of interlocking references (and interlocking code) in the SCSI, > gendisk, and block layers, but it seems likely that at least one of > these might actually happen. > > The object lifetime rules require that in your disconnect() routine, you > must tell all your users that your structure is going away, but you must > not free the structure until your users have notified you that they won't > try to use it any more. When the scsi_disk is on its way out, sd.c tells > the gendisk but doesn't wait for a notification in return. When the > scsi_device is on its way out the SCSI core tells sd.c, but sd.c doesn't > send back its notification at the right time.
As I previously stated there is no notification back from the block layer that users are complete with a structure. Currently the only method to prevent an oops looks like sd_remove would need to use lock_kernel so that scsi could ensure that a user would be through open which means the sd_remove would not take references to zero or the user has not made it far enough through do open that they have received a gendisk structure so that del_gendisk will ensure they do not call sd_open. I would like another method, but this looks like it would need to be another shared sync mechanism between scsi layer (an other blk interfaces) and block layer or a lookup method similar to kobj_lookup in scsi open routines so that a object can be unmapped atomically. -andmike -- Michael Anderson [EMAIL PROTECTED] ------------------------------------------------------- This SF.Net email is sponsored by: IBM Linux Tutorials Free Linux tutorial presented by Daniel Robbins, President and CEO of GenToo technologies. Learn everything from fundamentals to system administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click _______________________________________________ [EMAIL PROTECTED] To unsubscribe, use the last form field at: https://lists.sourceforge.net/lists/listinfo/linux-usb-devel