Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Tue, 2007-10-30 at 21:25 -0700, Greg KH wrote: On Mon, Oct 29, 2007 at 08:13:17PM +0100, Kay Sievers wrote: On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote: On Mon, 29 Oct 2007, James Bottomley wrote: I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Let's get at this another way. The patch below probably should be applied in any case. It's essentially a reversion of this patch from 2003: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc which was written to compensate for problems in the SCSI stack. The idea was to avoid releasing a kobject before all its children were released. However as far as I can see, in general we should be able to release a kobject as soon as all its children are removed and its refcount drops to 0, without waiting for the children to be released. To put it another way, once a child is removed from visibility it should not try (or need) to access its parent at all. If it has to then it should take an explicit reference to the parent. Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. It indeed fixes my problem. And it sounds you are right, the fix from 2003 is probably just a paper-over for a missing explicit reference that time. I'm all for giving the reversion a try, and add explicit parent get/put if needed somewhere. If, for some reason, that will not happen, I'll need to do something like this in the block patch, which will then be a fix for the paper-over solution for an unknown bug. :) --- a/fs/partitions/check.c +++ b/fs/partitions/check.c ... + device_del(disk-dev); + + /* +* disconnect from parent device, so the parent can clean up +* without waiting for us to clean up +* +* the driver core took this reference while we added ourself as +* a child of the parent device +*/ + parent = disk-dev.kobj.parent; Save off the parent before calling device_del() otherwise it could be a stale pointer, right? I still hold a ref, the one I drop a few lines later. It should be safe. + disk-dev.kobj.parent = NULL; + disk-dev.parent = NULL; + kobject_put(parent); No, that's just wrong, if this is needed, something else really is broken, and I don't think it's the driver core... It's just a kobject_orphan() function. We need to break the circiular reference somewhere, one of the objects in the circle has to clean up, to allow the others to clean up. The objects are properly deleted, but all final references are only dropped on cleanup. The kobject_orphan() here is just what Alan's patch is doing for the whole core. With the current logic, if you have any parent referencing a child, the core will deadlock, and never reach any cleanup funtion, right? That's the loop I need to break here. Thanks, Kay - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, Oct 31, 2007 at 11:46:30AM +0100, Kay Sievers wrote: On Tue, 2007-10-30 at 21:25 -0700, Greg KH wrote: On Mon, Oct 29, 2007 at 08:13:17PM +0100, Kay Sievers wrote: On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote: On Mon, 29 Oct 2007, James Bottomley wrote: I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Let's get at this another way. The patch below probably should be applied in any case. It's essentially a reversion of this patch from 2003: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc which was written to compensate for problems in the SCSI stack. The idea was to avoid releasing a kobject before all its children were released. However as far as I can see, in general we should be able to release a kobject as soon as all its children are removed and its refcount drops to 0, without waiting for the children to be released. To put it another way, once a child is removed from visibility it should not try (or need) to access its parent at all. If it has to then it should take an explicit reference to the parent. Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. It indeed fixes my problem. And it sounds you are right, the fix from 2003 is probably just a paper-over for a missing explicit reference that time. I'm all for giving the reversion a try, and add explicit parent get/put if needed somewhere. If, for some reason, that will not happen, I'll need to do something like this in the block patch, which will then be a fix for the paper-over solution for an unknown bug. :) --- a/fs/partitions/check.c +++ b/fs/partitions/check.c ... + device_del(disk-dev); + + /* +* disconnect from parent device, so the parent can clean up +* without waiting for us to clean up +* +* the driver core took this reference while we added ourself as +* a child of the parent device +*/ + parent = disk-dev.kobj.parent; Save off the parent before calling device_del() otherwise it could be a stale pointer, right? I still hold a ref, the one I drop a few lines later. It should be safe. Ok, but it's still wrong that this is needed :) + disk-dev.kobj.parent = NULL; + disk-dev.parent = NULL; + kobject_put(parent); No, that's just wrong, if this is needed, something else really is broken, and I don't think it's the driver core... It's just a kobject_orphan() function. We need to break the circiular reference somewhere, one of the objects in the circle has to clean up, to allow the others to clean up. The objects are properly deleted, but all final references are only dropped on cleanup. The kobject_orphan() here is just what Alan's patch is doing for the whole core. With the current logic, if you have any parent referencing a child, the core will deadlock, and never reach any cleanup funtion, right? That's the loop I need to break here. Hm, I seem to have missed the part in this thread where someone said that it was valid to have a parent reference a child device. That's just wrong and needs to be fixed. Is that in the scsi layer somewhere? The block layer? It sure isn't in the driver core... thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote: Hm, I seem to have missed the part in this thread where someone said that it was valid to have a parent reference a child device. That's just wrong and needs to be fixed. Is that in the scsi layer somewhere? The block layer? It sure isn't in the driver core... This is the piece I'm still not clear on. It's something to do with the gendisk. I'd have to look in block, but I believe the queue takes a ref to the gendisk. The scsi_device has a ref to the queue and the scsi_disk (in sd) has a ref to both the scsi_device and the gendisk. That means, until sd is unbound and the scsi_disk released, there's an implied unbreakable reference chain. at least, I think that's what the problem is. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, 2007-10-31 at 10:15 -0500, James Bottomley wrote: On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote: Hm, I seem to have missed the part in this thread where someone said that it was valid to have a parent reference a child device. That's just wrong and needs to be fixed. Is that in the scsi layer somewhere? The block layer? It sure isn't in the driver core... This is the piece I'm still not clear on. It's something to do with the gendisk. I'd have to look in block, but I believe the queue takes a ref to the gendisk. Yes, the queue is a child of the disk. The scsi_device has a ref to the queue Yeah, while the queue is a grandchild of the scsi_device with the unified sysfs layout. and the scsi_disk (in sd) has a ref to both the scsi_device and the gendisk. That means, until sd is unbound and the scsi_disk released, there's an implied unbreakable reference chain. at least, I think that's what the problem is. Yes, sounds right. We need to break that deleted-but-wait-for-cleanup at least at one of the devices involved. Thanks, Kay - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, 2007-10-31 at 16:40 +0100, Kay Sievers wrote: On Wed, 2007-10-31 at 10:15 -0500, James Bottomley wrote: On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote: Hm, I seem to have missed the part in this thread where someone said that it was valid to have a parent reference a child device. That's just wrong and needs to be fixed. Is that in the scsi layer somewhere? The block layer? It sure isn't in the driver core... This is the piece I'm still not clear on. It's something to do with the gendisk. I'd have to look in block, but I believe the queue takes a ref to the gendisk. Yes, the queue is a child of the disk. Right, so this goes gendisk-queue (- meaning parent of, or takes reference to) The scsi_device has a ref to the queue Yeah, while the queue is a grandchild of the scsi_device with the unified sysfs layout. No, the scsi_device is a direct parent of the queue, so we have scsi_device-queue and the scsi_disk (in sd) has a ref to both the scsi_device and the gendisk. That means, until sd is unbound and the scsi_disk released, there's an implied unbreakable reference chain. at least, I think that's what the problem is. Yes, sounds right. We need to break that deleted-but-wait-for-cleanup at least at one of the devices involved. But it's broken when the driver is unbound. Diagrammatically it's: scsi_disk - scsi_device - queue - gendisk - It's not circular, it's released when scsi_disk is released. It can become circular if there's some hidden dependency between any of the components ... but I don't think there is. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Tue, 30 Oct 2007, Greg KH wrote: Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. Hm, will this really solve the problem properly? I kept that above patch because of the very real problem where a device in the middle of the device tree can go away without the lower devices being gone. When you say go away, do you mean be deleted (deallocated)? In general that should not pose a problem. Once a child has been unregistered it should not try to access its parent's structure any more. Hence, if the parent structure gets deallocated before the child structure, nothing bad will happen. Or when you say go away, do you mean be removed (unregistered)? If a parent is removed while it still has children registered, that's a genuine bug. (In fact, driver_del() should check for this and issue a WARN_ON if it ever happens.) The driver doing the remove needs to be fixed. The example at the time was usb-serial devices. The physical USB device goes away, but the tty device that userspace has open still sticks around until userspace closes the char device. Now you are using goes away to mean is unplugged from the computer! :-) There's nothing wrong with the physical device being unplugged, so long as the logical usb_device structure remains allocated. And since the usb_device is the parent of the tty device (it is the parent, isn't it?), it _will_ remain pinned while the tty device is registered. So the only problem that could arise would be if the tty device and the usb_device are both unregistered while userspace continues to hold the char device file open. There are two possible ways to handle this: 1. Ideally, the usb-serial driver would realize that the USB device was disconnected (by checking a private flag) and would therefore avoid trying to access the usb_device structure. 2. If that's impractical, the usb-serial driver could take an explicit reference to the usb_device structure each time the char device file is opened, and drop the reference when the file is closed. Either approach would work. With your patch, will things break for this usage model? Not if the usb-serial drivers are written properly. This kobject cleanup path is so fickle, I hate having to touch it :( Is usb-serial the worst case? Then if it turns out to be compatible with this change, all other problems will be equally fixable. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, 31 Oct 2007, James Bottomley wrote: On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote: Hm, I seem to have missed the part in this thread where someone said that it was valid to have a parent reference a child device. That's just wrong and needs to be fixed. Is that in the scsi layer somewhere? The block layer? It sure isn't in the driver core... This is the piece I'm still not clear on. It's something to do with the gendisk. I'd have to look in block, but I believe the queue takes a ref to the gendisk. The scsi_device has a ref to the queue and the scsi_disk (in sd) has a ref to both the scsi_device and the gendisk. That means, until sd is unbound and the scsi_disk released, there's an implied unbreakable reference chain. at least, I think that's what the problem is. No, you haven't got it right. Parent Child Grandchild -- - -- scsi_device gendisk request_queue The odd part is that the scsi_device holds a reference to the queue. That creates a reference loop: scsi_device holds ref torequest_queue (done explicitly) request_queue holds ref togendisk (implicit, parent-child) gendisk holds ref toscsi_device (implicit, parent-child) The scsi_disk adds confusion to the picture but it doesn't make things any worse than they already are. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, 31 Oct 2007, James Bottomley wrote: Yes, the queue is a child of the disk. Right, so this goes gendisk-queue (- meaning parent of, or takes reference to) No, no! The _child_ takes an implicit reference to the _parent_, not the other way around. The scsi_device has a ref to the queue Yeah, while the queue is a grandchild of the scsi_device with the unified sysfs layout. No, the scsi_device is a direct parent of the queue, so we have scsi_device-queue Wrong -- the gendisk is the direct parent of the queue. The relevant line is in ll_rw_blk.c:blk_register_queue(): q-kobj.parent = kobject_get(disk-dev.kobj); Yes, sounds right. We need to break that deleted-but-wait-for-cleanup at least at one of the devices involved. But it's broken when the driver is unbound. Diagrammatically it's: scsi_disk - scsi_device - queue - gendisk - It's not circular, it's released when scsi_disk is released. It can become circular if there's some hidden dependency between any of the components ... but I don't think there is. Forget about the scsi_disk. It isn't part of the problem. Just concentrate on the scsi_device, the gendisk, and the queue. We have: scsi_device - gendisk - queue - scsi_device, where A - B means that B holds a reference to A. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Wed, 2007-10-31 at 11:58 -0400, Alan Stern wrote: On Wed, 31 Oct 2007, James Bottomley wrote: On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote: Hm, I seem to have missed the part in this thread where someone said that it was valid to have a parent reference a child device. That's just wrong and needs to be fixed. Is that in the scsi layer somewhere? The block layer? It sure isn't in the driver core... This is the piece I'm still not clear on. It's something to do with the gendisk. I'd have to look in block, but I believe the queue takes a ref to the gendisk. The scsi_device has a ref to the queue and the scsi_disk (in sd) has a ref to both the scsi_device and the gendisk. That means, until sd is unbound and the scsi_disk released, there's an implied unbreakable reference chain. at least, I think that's what the problem is. No, you haven't got it right. Parent Child Grandchild -- - -- scsi_device gendisk request_queue This is what I think doesn't happen. The scsi_device should never be parented to the gendisk. The odd part is that the scsi_device holds a reference to the queue. That creates a reference loop: scsi_device holds ref torequest_queue (done explicitly) request_queue holds ref togendisk (implicit, parent-child) agreed. gendisk holds ref toscsi_device (implicit, parent-child) Where exactly does this last part happen? the scsi_device is a mid layer object; the mid-layer doesn't know about gendisks. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, Oct 29, 2007 at 02:47:59PM -0400, Alan Stern wrote: On Mon, 29 Oct 2007, James Bottomley wrote: I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Let's get at this another way. The patch below probably should be applied in any case. It's essentially a reversion of this patch from 2003: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc which was written to compensate for problems in the SCSI stack. The idea was to avoid releasing a kobject before all its children were released. However as far as I can see, in general we should be able to release a kobject as soon as all its children are removed and its refcount drops to 0, without waiting for the children to be released. To put it another way, once a child is removed from visibility it should not try (or need) to access its parent at all. If it has to then it should take an explicit reference to the parent. Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. Hm, will this really solve the problem properly? I kept that above patch because of the very real problem where a device in the middle of the device tree can go away without the lower devices being gone. The example at the time was usb-serial devices. The physical USB device goes away, but the tty device that userspace has open still sticks around until userspace closes the char device. With your patch, will things break for this usage model? This kobject cleanup path is so fickle, I hate having to touch it :( thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, Oct 29, 2007 at 08:13:17PM +0100, Kay Sievers wrote: On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote: On Mon, 29 Oct 2007, James Bottomley wrote: I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Let's get at this another way. The patch below probably should be applied in any case. It's essentially a reversion of this patch from 2003: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc which was written to compensate for problems in the SCSI stack. The idea was to avoid releasing a kobject before all its children were released. However as far as I can see, in general we should be able to release a kobject as soon as all its children are removed and its refcount drops to 0, without waiting for the children to be released. To put it another way, once a child is removed from visibility it should not try (or need) to access its parent at all. If it has to then it should take an explicit reference to the parent. Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. It indeed fixes my problem. And it sounds you are right, the fix from 2003 is probably just a paper-over for a missing explicit reference that time. I'm all for giving the reversion a try, and add explicit parent get/put if needed somewhere. If, for some reason, that will not happen, I'll need to do something like this in the block patch, which will then be a fix for the paper-over solution for an unknown bug. :) --- a/fs/partitions/check.c +++ b/fs/partitions/check.c ... + device_del(disk-dev); + + /* +* disconnect from parent device, so the parent can clean up +* without waiting for us to clean up +* +* the driver core took this reference while we added ourself as +* a child of the parent device +*/ + parent = disk-dev.kobj.parent; Save off the parent before calling device_del() otherwise it could be a stale pointer, right? + disk-dev.kobj.parent = NULL; + disk-dev.parent = NULL; + kobject_put(parent); No, that's just wrong, if this is needed, something else really is broken, and I don't think it's the driver core... thanks, greg k-h - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
James: With regard to the discussion below, is there any objection to the attached patch? It moves the call to scsi_release_queue() from scsi_device_dev_release_usercontext() to __scsi_remove_device(). In fact, it might turn out that with this change, the extra _usercontext routine isn't needed at all. As far as I can see, the only reason for not adopting this patch would be if a scsi_device's queue structure was needed even after the scsi_device was unregistered. Does this ever happen? If it does, it would be indicative of a nasty reference-loop problem (scsi_device needs reference to request_queue, request_queue holds reference to gendisk, gendisk holds reference to scsi_device). Alan Stern -- Forwarded message -- Date: Tue, 23 Oct 2007 13:27:49 +0200 From: Kay Sievers [EMAIL PROTECTED] To: Alan Stern [EMAIL PROTECTED] Cc: Greg KH [EMAIL PROTECTED], Kernel development list [EMAIL PROTECTED], Hannes Reinecke [EMAIL PROTECTED] Subject: Re: BUG in: Driver core: convert block from raw kobjects to core devices On Tue, 2007-10-23 at 00:14 -0400, Alan Stern wrote: On Tue, 23 Oct 2007, Kay Sievers wrote: There must be something going wrong with the block patch in conjunction with the crazy SCSI release logic. I don't know -- there's nothing obviously wrong with the block patch except the extra put_device. But you're right that the SCSI logic is crazy. The scsi_device is the parent of the gendisk, which is the parent of the request_queue. But the scsi_device holds a reference to the request_queue, which is dropped in the scsi_device's release routine! That's the thing. We see a circular reference when we use the SCSI LUN as the parent. The sysfs relationship is: scsi_device - genhd - queue, while the queue holds a ref to to the blockdev (parent), the blockdev to the scsi_device (parent) and the scsi_devices to the queue (SCSI). All waiting for their release functions to be called, to release the other refs. The scsi_device needs to drop the queue reference while the device is removed, not when it's data is released. Hannes came up with the attached patch, which seems to work fine here. That doesn't make much sense to me, and it is complicated by the fact that deleting a kobject doesn't drop the kobject's reference to its parent -- only releasing the kobject does. Right, that makes things very complicated. In fact, for my development work I normally use a patch which makes kobjects behave better: They do drop the reference to their parent when they are deleted. This actually is nothing more than a reversion of a patch added several years ago to try and cover up some of the weaknesses of the SCSI stack! Now that the SCSI stack is in better shape, maybe my patch should be included in the mainstream kernel. The patch is below; see what you think. Sounds good to me, to disconnect a dead object from its parent when it's deleted. We only need to protect for use after free but not lock the parent, I guess. We should give it a try. Thanks, Kay diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c index daed37d..577bbf3 100644 --- a/drivers/scsi/scsi_sysfs.c +++ b/drivers/scsi/scsi_sysfs.c @@ -280,15 +280,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work) list_del(sdev-starved_entry); spin_unlock_irqrestore(sdev-host-host_lock, flags); - if (sdev-request_queue) { - sdev-request_queue-queuedata = NULL; - /* user context needed to free queue */ - scsi_free_queue(sdev-request_queue); - /* temporary expedient, try to catch use of queue lock - * after free of sdev */ - sdev-request_queue = NULL; - } - scsi_target_reap(scsi_target(sdev)); kfree(sdev-inquiry); @@ -799,6 +790,16 @@ void __scsi_remove_device(struct scsi_device *sdev) if (sdev-host-hostt-slave_destroy) sdev-host-hostt-slave_destroy(sdev); transport_destroy_device(dev); + + if (sdev-request_queue) { + sdev-request_queue-queuedata = NULL; + /* user context needed to free queue */ + scsi_free_queue(sdev-request_queue); + /* temporary expedient, try to catch use of queue lock + * after free of sdev */ + sdev-request_queue = NULL; + } + put_device(dev); }
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, 2007-10-29 at 11:18 -0400, Alan Stern wrote: With regard to the discussion below, is there any objection to the attached patch? It moves the call to scsi_release_queue() from scsi_device_dev_release_usercontext() to __scsi_remove_device(). In fact, it might turn out that with this change, the extra _usercontext routine isn't needed at all. There's a flaw in the discussion. As far as I can see, the only reason for not adopting this patch would be if a scsi_device's queue structure was needed even after the scsi_device was unregistered. Does this ever happen? If it does, it would be indicative of a nasty reference-loop problem (scsi_device needs reference to request_queue, request_queue holds reference to gendisk, gendisk holds reference to scsi_device). I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Alan Stern -- Forwarded message -- Date: Tue, 23 Oct 2007 13:27:49 +0200 From: Kay Sievers [EMAIL PROTECTED] To: Alan Stern [EMAIL PROTECTED] Cc: Greg KH [EMAIL PROTECTED], Kernel development list [EMAIL PROTECTED], Hannes Reinecke [EMAIL PROTECTED] Subject: Re: BUG in: Driver core: convert block from raw kobjects to core devices On Tue, 2007-10-23 at 00:14 -0400, Alan Stern wrote: On Tue, 23 Oct 2007, Kay Sievers wrote: There must be something going wrong with the block patch in conjunction with the crazy SCSI release logic. I don't know -- there's nothing obviously wrong with the block patch except the extra put_device. But you're right that the SCSI logic is crazy. The scsi_device is the parent of the gendisk, which is the parent of the request_queue. But the scsi_device holds a reference to the request_queue, which is dropped in the scsi_device's release routine! That's the thing. We see a circular reference when we use the SCSI LUN as the parent. The sysfs relationship is: scsi_device - genhd - queue, while the That's not true. The relationship is scsi_device - queue However, genhd is conditioned on the ULD attachment (we can have queues with no gendisk), so for instance, for sd it's scsi_disk - genhd - queue - scsi_device However, when we get to the device destruction part, the first thing that should happen is that the driver should unbind, so we release the driver specific structure and the genhd (and it's ref to the queue and the scsi_device). Then we can let the mid-layer remove functions release the queue. queue holds a ref to to the blockdev (parent), the blockdev to the scsi_device (parent) and the scsi_devices to the queue (SCSI). All waiting for their release functions to be called, to release the other refs. This isn't how it's supposed to be ... the refs are supposed to be the other way around (blockdev/genhd has a ref to the queue, but scsi_device has a separate ref to the queue; the genhd should be releaseable even if the device hasn't given up the queue). I probably need more thread context to make a more informed statement, though. The scsi_device needs to drop the queue reference while the device is removed, not when it's data is released. Hannes came up with the attached patch, which seems to work fine here. That doesn't make much sense to me, and it is complicated by the fact that deleting a kobject doesn't drop the kobject's reference to its parent -- only releasing the kobject does. Right, that makes things very complicated. In fact, for my development work I normally use a patch which makes kobjects behave better: They do drop the reference to their parent when they are deleted. This actually is nothing more than a reversion of a patch added several years ago to try and cover up some of the weaknesses of the SCSI stack! Now that the SCSI stack is in better shape, maybe my patch should be included in the mainstream kernel. The patch is below; see what you think. Sounds good to me, to disconnect a dead object from its parent when it's deleted. We only need to protect for use after free but not lock the parent, I guess. We should give it a try. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, 2007-10-29 at 12:38 -0400, Alan Stern wrote: No, that's not what happens. The genhd's parent is the scsi_device and the queue's parent is the genhd. You can see this immediately from the directory layout in sysfs. Thus: How has this happened? The scsi_device doesn't know anything about the genhd; how can it hold a reference to it? If you grep the code, you'll see that only the ULD's (that's sd, sr, st, osst et al) know what a gendisk is. James - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, 29 Oct 2007, James Bottomley wrote: On Mon, 2007-10-29 at 11:18 -0400, Alan Stern wrote: With regard to the discussion below, is there any objection to the attached patch? It moves the call to scsi_release_queue() from scsi_device_dev_release_usercontext() to __scsi_remove_device(). In fact, it might turn out that with this change, the extra _usercontext routine isn't needed at all. There's a flaw in the discussion. As far as I can see, the only reason for not adopting this patch would be if a scsi_device's queue structure was needed even after the scsi_device was unregistered. Does this ever happen? If it does, it would be indicative of a nasty reference-loop problem (scsi_device needs reference to request_queue, request_queue holds reference to gendisk, gendisk holds reference to scsi_device). I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. Me too. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? That's basically what I was asking you! :-) What happens if you try to clean up a block queue when there are still outstanding requests? On Tue, 2007-10-23 at 00:14 -0400, Alan Stern wrote: On Tue, 23 Oct 2007, Kay Sievers wrote: There must be something going wrong with the block patch in conjunction with the crazy SCSI release logic. I don't know -- there's nothing obviously wrong with the block patch except the extra put_device. But you're right that the SCSI logic is crazy. The scsi_device is the parent of the gendisk, which is the parent of the request_queue. But the scsi_device holds a reference to the request_queue, which is dropped in the scsi_device's release routine! That's the thing. We see a circular reference when we use the SCSI LUN as the parent. The sysfs relationship is: scsi_device - genhd - queue, while the That's not true. The relationship is scsi_device - queue I think Kay wrote his arrows backwards. Or maybe he just made a simple mistake. However, genhd is conditioned on the ULD attachment (we can have queues with no gendisk), so for instance, for sd it's scsi_disk - genhd - queue - scsi_device However, when we get to the device destruction part, the first thing that should happen is that the driver should unbind, so we release the driver specific structure and the genhd (and it's ref to the queue and the scsi_device). Then we can let the mid-layer remove functions release the queue. That's what should happen. And in fact the scsi_disk structure doesn't cause any difficulties; we're only worried about the scsi_device structure. queue holds a ref to to the blockdev (parent), the blockdev to the scsi_device (parent) and the scsi_devices to the queue (SCSI). All waiting for their release functions to be called, to release the other refs. This isn't how it's supposed to be ... the refs are supposed to be the other way around (blockdev/genhd has a ref to the queue, but scsi_device has a separate ref to the queue; the genhd should be releaseable even if the device hasn't given up the queue). No, that's not what happens. The genhd's parent is the scsi_device and the queue's parent is the genhd. You can see this immediately from the directory layout in sysfs. Thus: the scsi_device holds a reference to the queue, which holds a reference to the genhd, which holds a reference to the scsi_device. In fact it's not the struct device's reference to the parent which causes the problem -- it's the embedded kobject's reference to the kobject embedded in the parent device. This reference doesn't get dropped when the device (and its embedded kobject) are removed; it persists until they are released. But with the reference loop they never do get released. I probably need more thread context to make a more informed statement, though. If you want to wade through the email messages, the thread starts here: http://marc.info/?t=11927357356r=1w=2 Here's a brief summary. Kay wrote a patch modifying the block-device core, the Driver core: convert block from raw kobjects to core devices patch in $Subject. The patch itself is located here: http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver/block-device.patch although this version has been updated; in its original form the patch included an extra call to put_device(disk_dev) right at the end of del_gendisk() in fs/partitions/check.c. In my testing I found the extra put_device() caused problems and I complained to Kay about it. He explained that on his system the device structures would
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, 29 Oct 2007, James Bottomley wrote: On Mon, 2007-10-29 at 12:38 -0400, Alan Stern wrote: No, that's not what happens. The genhd's parent is the scsi_device and the queue's parent is the genhd. You can see this immediately from the directory layout in sysfs. Thus: How has this happened? The scsi_device doesn't know anything about the genhd; how can it hold a reference to it? No, no -- it's the other way around. The scsi_device is the parent of the genhd, so the genhd holds a reference to the scsi_device, not vice versa. Alan Stern - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, 29 Oct 2007, James Bottomley wrote: I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Let's get at this another way. The patch below probably should be applied in any case. It's essentially a reversion of this patch from 2003: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc which was written to compensate for problems in the SCSI stack. The idea was to avoid releasing a kobject before all its children were released. However as far as I can see, in general we should be able to release a kobject as soon as all its children are removed and its refcount drops to 0, without waiting for the children to be released. To put it another way, once a child is removed from visibility it should not try (or need) to access its parent at all. If it has to then it should take an explicit reference to the parent. Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. Alan Stern Index: usb-2.6/lib/kobject.c === --- usb-2.6.orig/lib/kobject.c +++ usb-2.6/lib/kobject.c @@ -206,12 +206,16 @@ void kobject_init(struct kobject * kobj) static void unlink(struct kobject * kobj) { + struct kobject *parent = kobj-parent; + if (kobj-kset) { spin_lock(kobj-kset-list_lock); list_del_init(kobj-entry); spin_unlock(kobj-kset-list_lock); } + kobj-parent = NULL; kobject_put(kobj); + kobject_put(parent); } /** @@ -255,7 +259,6 @@ int kobject_add(struct kobject * kobj) if (error) { /* unlink does the kobject_put() for us */ unlink(kobj); - kobject_put(parent); /* be noisy on error issues */ if (error == -EEXIST) @@ -498,7 +501,6 @@ void kobject_cleanup(struct kobject * ko { struct kobj_type * t = get_ktype(kobj); struct kset * s = kobj-kset; - struct kobject * parent = kobj-parent; const char *name = kobj-k_name; pr_debug(kobject %s: cleaning up\n,kobject_name(kobj)); @@ -515,7 +517,6 @@ void kobject_cleanup(struct kobject * ko } if (s) kset_put(s); - kobject_put(parent); } static void kobject_release(struct kref *kref) - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)
On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote: On Mon, 29 Oct 2007, James Bottomley wrote: I'm not sure if we can do this patch. If you kill a device, you see there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a printk message I see quite often when I unplug devices while they're operating. If you NULL out and free the request queue immediately after setting SDEV_DEL, without allowing the devices and filesystems to finish, are you sure we're not going to get a null deref on sdev-request_queue? Let's get at this another way. The patch below probably should be applied in any case. It's essentially a reversion of this patch from 2003: http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc which was written to compensate for problems in the SCSI stack. The idea was to avoid releasing a kobject before all its children were released. However as far as I can see, in general we should be able to release a kobject as soon as all its children are removed and its refcount drops to 0, without waiting for the children to be released. To put it another way, once a child is removed from visibility it should not try (or need) to access its parent at all. If it has to then it should take an explicit reference to the parent. Since the SCSI stack is now in much better shape, there doesn't seem to be any reason to keep the old code. Do you agree that the patch below is worth merging? I submitted it to Greg some time ago, but he didn't want to accept it without some assurance that it is now safe. It would fix the problem Kay and saw with the circular references, because the references would all get dropped when the scsi_device is removed instead of waiting for a release that will never come. It indeed fixes my problem. And it sounds you are right, the fix from 2003 is probably just a paper-over for a missing explicit reference that time. I'm all for giving the reversion a try, and add explicit parent get/put if needed somewhere. If, for some reason, that will not happen, I'll need to do something like this in the block patch, which will then be a fix for the paper-over solution for an unknown bug. :) --- a/fs/partitions/check.c +++ b/fs/partitions/check.c ... + device_del(disk-dev); + + /* +* disconnect from parent device, so the parent can clean up +* without waiting for us to clean up +* +* the driver core took this reference while we added ourself as +* a child of the parent device +*/ + parent = disk-dev.kobj.parent; + disk-dev.kobj.parent = NULL; + disk-dev.parent = NULL; + kobject_put(parent); } Thanks, Kay - To unsubscribe from this list: send the line unsubscribe linux-scsi in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html