Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread Kay Sievers
On Tue, 2007-10-30 at 21:25 -0700, Greg KH wrote:
 On Mon, Oct 29, 2007 at 08:13:17PM +0100, Kay Sievers wrote:
  On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote:
   On Mon, 29 Oct 2007, James Bottomley wrote:
   
I'm not sure if we can do this patch.  If you kill a device, you see
there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
printk message I see quite often when I unplug devices while they're
operating.

If you NULL out and free the request queue immediately after setting
SDEV_DEL, without allowing the devices and filesystems to finish, are
you sure we're not going to get a null deref on sdev-request_queue?
   
   Let's get at this another way.  The patch below probably should be 
   applied in any case.  It's essentially a reversion of this patch from 
   2003:
   
   http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc
   
   which was written to compensate for problems in the SCSI stack.  The
   idea was to avoid releasing a kobject before all its children were
   released.  However as far as I can see, in general we should be able to
   release a kobject as soon as all its children are removed and its
   refcount drops to 0, without waiting for the children to be released.  
   To put it another way, once a child is removed from visibility it
   should not try (or need) to access its parent at all.  If it has to
   then it should take an explicit reference to the parent.
   
   Since the SCSI stack is now in much better shape, there doesn't seem to 
   be any reason to keep the old code.  Do you agree that the patch below 
   is worth merging?  I submitted it to Greg some time ago, but he didn't 
   want to accept it without some assurance that it is now safe.
   
   It would fix the problem Kay and saw with the circular references, 
   because the references would all get dropped when the scsi_device is 
   removed instead of waiting for a release that will never come.
  
  It indeed fixes my problem. And it sounds you are right, the fix from
  2003 is probably just a paper-over for a missing explicit reference that
  time.
  
  I'm all for giving the reversion a try, and add explicit parent get/put
  if needed somewhere. If, for some reason, that will not happen, I'll
  need to do something like this in the block patch, which will then be a
  fix for the paper-over solution for an unknown bug. :)
  
  
  --- a/fs/partitions/check.c
  +++ b/fs/partitions/check.c
  ...
  +   device_del(disk-dev);
  +
  +   /*
  +* disconnect from parent device, so the parent can clean up
  +* without waiting for us to clean up
  +*
  +* the driver core took this reference while we added ourself as
  +* a child of the parent device
  +*/
  +   parent = disk-dev.kobj.parent;
 
 Save off the parent before calling device_del() otherwise it could be a
 stale pointer, right?

I still hold a ref, the one I drop a few lines later. It should be safe.

  +   disk-dev.kobj.parent = NULL;
  +   disk-dev.parent = NULL;
  +   kobject_put(parent);
 
 No, that's just wrong, if this is needed, something else really is
 broken, and I don't think it's the driver core...

It's just a kobject_orphan() function. We need to break the circiular
reference somewhere, one of the objects in the circle has to clean up,
to allow the others to clean up. The objects are properly deleted, but
all final references are only dropped on cleanup. The kobject_orphan()
here is just what Alan's patch is doing for the whole core.

With the current logic, if you have any parent referencing a child, the
core will deadlock, and never reach any cleanup funtion, right? That's
the loop I need to break here.

Thanks,
Kay

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread Greg KH
On Wed, Oct 31, 2007 at 11:46:30AM +0100, Kay Sievers wrote:
 On Tue, 2007-10-30 at 21:25 -0700, Greg KH wrote:
  On Mon, Oct 29, 2007 at 08:13:17PM +0100, Kay Sievers wrote:
   On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote:
On Mon, 29 Oct 2007, James Bottomley wrote:

 I'm not sure if we can do this patch.  If you kill a device, you see
 there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
 printk message I see quite often when I unplug devices while they're
 operating.
 
 If you NULL out and free the request queue immediately after setting
 SDEV_DEL, without allowing the devices and filesystems to finish, are
 you sure we're not going to get a null deref on sdev-request_queue?

Let's get at this another way.  The patch below probably should be 
applied in any case.  It's essentially a reversion of this patch from 
2003:

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc

which was written to compensate for problems in the SCSI stack.  The
idea was to avoid releasing a kobject before all its children were
released.  However as far as I can see, in general we should be able to
release a kobject as soon as all its children are removed and its
refcount drops to 0, without waiting for the children to be released.  
To put it another way, once a child is removed from visibility it
should not try (or need) to access its parent at all.  If it has to
then it should take an explicit reference to the parent.

Since the SCSI stack is now in much better shape, there doesn't seem to 
be any reason to keep the old code.  Do you agree that the patch below 
is worth merging?  I submitted it to Greg some time ago, but he didn't 
want to accept it without some assurance that it is now safe.

It would fix the problem Kay and saw with the circular references, 
because the references would all get dropped when the scsi_device is 
removed instead of waiting for a release that will never come.
   
   It indeed fixes my problem. And it sounds you are right, the fix from
   2003 is probably just a paper-over for a missing explicit reference that
   time.
   
   I'm all for giving the reversion a try, and add explicit parent get/put
   if needed somewhere. If, for some reason, that will not happen, I'll
   need to do something like this in the block patch, which will then be a
   fix for the paper-over solution for an unknown bug. :)
   
   
   --- a/fs/partitions/check.c
   +++ b/fs/partitions/check.c
   ...
   +   device_del(disk-dev);
   +
   +   /*
   +* disconnect from parent device, so the parent can clean up
   +* without waiting for us to clean up
   +*
   +* the driver core took this reference while we added ourself as
   +* a child of the parent device
   +*/
   +   parent = disk-dev.kobj.parent;
  
  Save off the parent before calling device_del() otherwise it could be a
  stale pointer, right?
 
 I still hold a ref, the one I drop a few lines later. It should be safe.

Ok, but it's still wrong that this is needed :)

   +   disk-dev.kobj.parent = NULL;
   +   disk-dev.parent = NULL;
   +   kobject_put(parent);
  
  No, that's just wrong, if this is needed, something else really is
  broken, and I don't think it's the driver core...
 
 It's just a kobject_orphan() function. We need to break the circiular
 reference somewhere, one of the objects in the circle has to clean up,
 to allow the others to clean up. The objects are properly deleted, but
 all final references are only dropped on cleanup. The kobject_orphan()
 here is just what Alan's patch is doing for the whole core.
 
 With the current logic, if you have any parent referencing a child, the
 core will deadlock, and never reach any cleanup funtion, right? That's
 the loop I need to break here.

Hm, I seem to have missed the part in this thread where someone said
that it was valid to have a parent reference a child device.  That's
just wrong and needs to be fixed.  Is that in the scsi layer somewhere?
The block layer?  It sure isn't in the driver core...

thanks,

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread James Bottomley
On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote:
 Hm, I seem to have missed the part in this thread where someone said
 that it was valid to have a parent reference a child device.  That's
 just wrong and needs to be fixed.  Is that in the scsi layer somewhere?
 The block layer?  It sure isn't in the driver core...

This is the piece I'm still not clear on.  It's something to do with the
gendisk.  I'd have to look in block, but I believe the queue takes a ref
to the gendisk.

The scsi_device has a ref to the queue and the scsi_disk (in sd) has a
ref to both the scsi_device and the gendisk.  That means, until sd is
unbound and the scsi_disk released, there's an implied unbreakable
reference chain.

at least, I think that's what the problem is.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread Kay Sievers
On Wed, 2007-10-31 at 10:15 -0500, James Bottomley wrote:
 On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote:
  Hm, I seem to have missed the part in this thread where someone said
  that it was valid to have a parent reference a child device.  That's
  just wrong and needs to be fixed.  Is that in the scsi layer somewhere?
  The block layer?  It sure isn't in the driver core...
 
 This is the piece I'm still not clear on.  It's something to do with the
 gendisk.  I'd have to look in block, but I believe the queue takes a ref
 to the gendisk.

Yes, the queue is a child of the disk.

 The scsi_device has a ref to the queue

Yeah, while the queue is a grandchild of the scsi_device with the
unified sysfs layout.

 and the scsi_disk (in sd) has a
 ref to both the scsi_device and the gendisk.  That means, until sd is
 unbound and the scsi_disk released, there's an implied unbreakable
 reference chain.
 
 at least, I think that's what the problem is.

Yes, sounds right. We need to break that deleted-but-wait-for-cleanup at
least at one of the devices involved.

Thanks,
Kay

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread James Bottomley
On Wed, 2007-10-31 at 16:40 +0100, Kay Sievers wrote:
 On Wed, 2007-10-31 at 10:15 -0500, James Bottomley wrote:
  On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote:
   Hm, I seem to have missed the part in this thread where someone said
   that it was valid to have a parent reference a child device.  That's
   just wrong and needs to be fixed.  Is that in the scsi layer somewhere?
   The block layer?  It sure isn't in the driver core...
  
  This is the piece I'm still not clear on.  It's something to do with the
  gendisk.  I'd have to look in block, but I believe the queue takes a ref
  to the gendisk.
 
 Yes, the queue is a child of the disk.

Right, so this goes gendisk-queue (- meaning parent of, or takes
reference to)

  The scsi_device has a ref to the queue
 
 Yeah, while the queue is a grandchild of the scsi_device with the
 unified sysfs layout.

No, the scsi_device is a direct parent of the queue, so we have

scsi_device-queue

  and the scsi_disk (in sd) has a
  ref to both the scsi_device and the gendisk.  That means, until sd is
  unbound and the scsi_disk released, there's an implied unbreakable
  reference chain.
  
  at least, I think that's what the problem is.
 
 Yes, sounds right. We need to break that deleted-but-wait-for-cleanup at
 least at one of the devices involved.

But it's broken when the driver is unbound.  Diagrammatically it's:

scsi_disk - scsi_device - queue
  - gendisk -

It's not circular, it's released when scsi_disk is released.  It can
become circular if there's some hidden dependency between any of the
components ... but I don't think there is.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread Alan Stern
On Tue, 30 Oct 2007, Greg KH wrote:

  Since the SCSI stack is now in much better shape, there doesn't seem to 
  be any reason to keep the old code.  Do you agree that the patch below 
  is worth merging?  I submitted it to Greg some time ago, but he didn't 
  want to accept it without some assurance that it is now safe.
  
  It would fix the problem Kay and saw with the circular references, 
  because the references would all get dropped when the scsi_device is 
  removed instead of waiting for a release that will never come.
 
 Hm, will this really solve the problem properly?  I kept that above
 patch because of the very real problem where a device in the middle of
 the device tree can go away without the lower devices being gone.

When you say go away, do you mean be deleted (deallocated)?  In 
general that should not pose a problem.  Once a child has been 
unregistered it should not try to access its parent's structure any 
more.  Hence, if the parent structure gets deallocated before the child 
structure, nothing bad will happen.

Or when you say go away, do you mean be removed (unregistered)?  If 
a parent is removed while it still has children registered, that's a 
genuine bug.  (In fact, driver_del() should check for this and issue a 
WARN_ON if it ever happens.)  The driver doing the remove needs to be 
fixed.

 The example at the time was usb-serial devices.  The physical USB device
 goes away, but the tty device that userspace has open still sticks
 around until userspace closes the char device.

Now you are using goes away to mean is unplugged from the computer!  :-)

There's nothing wrong with the physical device being unplugged, so long
as the logical usb_device structure remains allocated.  And since the
usb_device is the parent of the tty device (it is the parent, isn't
it?), it _will_ remain pinned while the tty device is registered.

So the only problem that could arise would be if the tty device and the
usb_device are both unregistered while userspace continues to hold the
char device file open.  There are two possible ways to handle this:

 1. Ideally, the usb-serial driver would realize that the USB 
device was disconnected (by checking a private flag) and would
therefore avoid trying to access the usb_device structure.

 2. If that's impractical, the usb-serial driver could take an
explicit reference to the usb_device structure each time
the char device file is opened, and drop the reference when
the file is closed.

Either approach would work.

 With your patch, will things break for this usage model?

Not if the usb-serial drivers are written properly.

 This kobject cleanup path is so fickle, I hate having to touch it :(

Is usb-serial the worst case?  Then if it turns out to be compatible 
with this change, all other problems will be equally fixable.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread Alan Stern
On Wed, 31 Oct 2007, James Bottomley wrote:

 On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote:
  Hm, I seem to have missed the part in this thread where someone said
  that it was valid to have a parent reference a child device.  That's
  just wrong and needs to be fixed.  Is that in the scsi layer somewhere?
  The block layer?  It sure isn't in the driver core...
 
 This is the piece I'm still not clear on.  It's something to do with the
 gendisk.  I'd have to look in block, but I believe the queue takes a ref
 to the gendisk.
 
 The scsi_device has a ref to the queue and the scsi_disk (in sd) has a
 ref to both the scsi_device and the gendisk.  That means, until sd is
 unbound and the scsi_disk released, there's an implied unbreakable
 reference chain.
 
 at least, I think that's what the problem is.

No, you haven't got it right.

Parent  Child   Grandchild
--  -   --
scsi_device gendisk request_queue

The odd part is that the scsi_device holds a reference to the queue.  
That creates a reference loop:

scsi_device holds ref torequest_queue (done explicitly)
request_queue   holds ref togendisk (implicit, parent-child)
gendisk holds ref toscsi_device (implicit, parent-child)

The scsi_disk adds confusion to the picture but it doesn't make things
any worse than they already are.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread Alan Stern
On Wed, 31 Oct 2007, James Bottomley wrote:

  Yes, the queue is a child of the disk.
 
 Right, so this goes gendisk-queue (- meaning parent of, or takes
 reference to)

No, no!  The _child_ takes an implicit reference to the _parent_, not 
the other way around.

   The scsi_device has a ref to the queue
  
  Yeah, while the queue is a grandchild of the scsi_device with the
  unified sysfs layout.
 
 No, the scsi_device is a direct parent of the queue, so we have
 
 scsi_device-queue

Wrong -- the gendisk is the direct parent of the queue.  The relevant 
line is in ll_rw_blk.c:blk_register_queue():

q-kobj.parent = kobject_get(disk-dev.kobj);

  Yes, sounds right. We need to break that deleted-but-wait-for-cleanup at
  least at one of the devices involved.
 
 But it's broken when the driver is unbound.  Diagrammatically it's:
 
 scsi_disk - scsi_device - queue
   - gendisk -
 
 It's not circular, it's released when scsi_disk is released.  It can
 become circular if there's some hidden dependency between any of the
 components ... but I don't think there is.

Forget about the scsi_disk.  It isn't part of the problem.  Just 
concentrate on the scsi_device, the gendisk, and the queue.  We have:

scsi_device - gendisk - queue - scsi_device,

where A - B means that B holds a reference to A.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-31 Thread James Bottomley
On Wed, 2007-10-31 at 11:58 -0400, Alan Stern wrote:
 On Wed, 31 Oct 2007, James Bottomley wrote:
 
  On Wed, 2007-10-31 at 07:32 -0700, Greg KH wrote:
   Hm, I seem to have missed the part in this thread where someone said
   that it was valid to have a parent reference a child device.  That's
   just wrong and needs to be fixed.  Is that in the scsi layer somewhere?
   The block layer?  It sure isn't in the driver core...
  
  This is the piece I'm still not clear on.  It's something to do with the
  gendisk.  I'd have to look in block, but I believe the queue takes a ref
  to the gendisk.
  
  The scsi_device has a ref to the queue and the scsi_disk (in sd) has a
  ref to both the scsi_device and the gendisk.  That means, until sd is
  unbound and the scsi_disk released, there's an implied unbreakable
  reference chain.
  
  at least, I think that's what the problem is.
 
 No, you haven't got it right.
 
   Parent  Child   Grandchild
   --  -   --
   scsi_device gendisk request_queue

This is what I think doesn't happen.  The scsi_device should never be
parented to the gendisk.

 
 The odd part is that the scsi_device holds a reference to the queue.  
 That creates a reference loop:
 
   scsi_device holds ref torequest_queue (done explicitly)
   request_queue   holds ref togendisk (implicit, parent-child)

agreed.

   gendisk holds ref toscsi_device (implicit, parent-child)

Where exactly does this last part happen? the scsi_device is a mid layer
object; the mid-layer doesn't know about gendisks.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-30 Thread Greg KH
On Mon, Oct 29, 2007 at 02:47:59PM -0400, Alan Stern wrote:
 On Mon, 29 Oct 2007, James Bottomley wrote:
 
  I'm not sure if we can do this patch.  If you kill a device, you see
  there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
  printk message I see quite often when I unplug devices while they're
  operating.
  
  If you NULL out and free the request queue immediately after setting
  SDEV_DEL, without allowing the devices and filesystems to finish, are
  you sure we're not going to get a null deref on sdev-request_queue?
 
 Let's get at this another way.  The patch below probably should be 
 applied in any case.  It's essentially a reversion of this patch from 
 2003:
 
 http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc
 
 which was written to compensate for problems in the SCSI stack.  The
 idea was to avoid releasing a kobject before all its children were
 released.  However as far as I can see, in general we should be able to
 release a kobject as soon as all its children are removed and its
 refcount drops to 0, without waiting for the children to be released.  
 To put it another way, once a child is removed from visibility it
 should not try (or need) to access its parent at all.  If it has to
 then it should take an explicit reference to the parent.
 
 Since the SCSI stack is now in much better shape, there doesn't seem to 
 be any reason to keep the old code.  Do you agree that the patch below 
 is worth merging?  I submitted it to Greg some time ago, but he didn't 
 want to accept it without some assurance that it is now safe.
 
 It would fix the problem Kay and saw with the circular references, 
 because the references would all get dropped when the scsi_device is 
 removed instead of waiting for a release that will never come.

Hm, will this really solve the problem properly?  I kept that above
patch because of the very real problem where a device in the middle of
the device tree can go away without the lower devices being gone.

The example at the time was usb-serial devices.  The physical USB device
goes away, but the tty device that userspace has open still sticks
around until userspace closes the char device.

With your patch, will things break for this usage model?

This kobject cleanup path is so fickle, I hate having to touch it :(

thanks,

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-30 Thread Greg KH
On Mon, Oct 29, 2007 at 08:13:17PM +0100, Kay Sievers wrote:
 On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote:
  On Mon, 29 Oct 2007, James Bottomley wrote:
  
   I'm not sure if we can do this patch.  If you kill a device, you see
   there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
   printk message I see quite often when I unplug devices while they're
   operating.
   
   If you NULL out and free the request queue immediately after setting
   SDEV_DEL, without allowing the devices and filesystems to finish, are
   you sure we're not going to get a null deref on sdev-request_queue?
  
  Let's get at this another way.  The patch below probably should be 
  applied in any case.  It's essentially a reversion of this patch from 
  2003:
  
  http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc
  
  which was written to compensate for problems in the SCSI stack.  The
  idea was to avoid releasing a kobject before all its children were
  released.  However as far as I can see, in general we should be able to
  release a kobject as soon as all its children are removed and its
  refcount drops to 0, without waiting for the children to be released.  
  To put it another way, once a child is removed from visibility it
  should not try (or need) to access its parent at all.  If it has to
  then it should take an explicit reference to the parent.
  
  Since the SCSI stack is now in much better shape, there doesn't seem to 
  be any reason to keep the old code.  Do you agree that the patch below 
  is worth merging?  I submitted it to Greg some time ago, but he didn't 
  want to accept it without some assurance that it is now safe.
  
  It would fix the problem Kay and saw with the circular references, 
  because the references would all get dropped when the scsi_device is 
  removed instead of waiting for a release that will never come.
 
 It indeed fixes my problem. And it sounds you are right, the fix from
 2003 is probably just a paper-over for a missing explicit reference that
 time.
 
 I'm all for giving the reversion a try, and add explicit parent get/put
 if needed somewhere. If, for some reason, that will not happen, I'll
 need to do something like this in the block patch, which will then be a
 fix for the paper-over solution for an unknown bug. :)
 
 
 --- a/fs/partitions/check.c
 +++ b/fs/partitions/check.c
 ...
 +   device_del(disk-dev);
 +
 +   /*
 +* disconnect from parent device, so the parent can clean up
 +* without waiting for us to clean up
 +*
 +* the driver core took this reference while we added ourself as
 +* a child of the parent device
 +*/
 +   parent = disk-dev.kobj.parent;

Save off the parent before calling device_del() otherwise it could be a
stale pointer, right?

 +   disk-dev.kobj.parent = NULL;
 +   disk-dev.parent = NULL;
 +   kobject_put(parent);

No, that's just wrong, if this is needed, something else really is
broken, and I don't think it's the driver core...

thanks,

greg k-h
-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread Alan Stern
James:

With regard to the discussion below, is there any objection to the 
attached patch?  It moves the call to scsi_release_queue() from 
scsi_device_dev_release_usercontext() to __scsi_remove_device().  In 
fact, it might turn out that with this change, the extra _usercontext 
routine isn't needed at all.

As far as I can see, the only reason for not adopting this patch would 
be if a scsi_device's queue structure was needed even after the 
scsi_device was unregistered.  Does this ever happen?  If it does, it 
would be indicative of a nasty reference-loop problem (scsi_device 
needs reference to request_queue, request_queue holds reference to 
gendisk, gendisk holds reference to scsi_device).

Alan Stern


-- Forwarded message --
Date: Tue, 23 Oct 2007 13:27:49 +0200
From: Kay Sievers [EMAIL PROTECTED]
To: Alan Stern [EMAIL PROTECTED]
Cc: Greg KH [EMAIL PROTECTED],
 Kernel development list [EMAIL PROTECTED],
 Hannes Reinecke [EMAIL PROTECTED]
Subject: Re: BUG in: Driver core: convert block from raw kobjects to core
devices

On Tue, 2007-10-23 at 00:14 -0400, Alan Stern wrote:
 On Tue, 23 Oct 2007, Kay Sievers wrote:
 
  There must be something going wrong with the block patch in conjunction
  with the crazy SCSI release logic.
 
 I don't know -- there's nothing obviously wrong with the block patch 
 except the extra put_device.  But you're right that the SCSI logic is 
 crazy.  The scsi_device is the parent of the gendisk, which is the 
 parent of the request_queue.  But the scsi_device holds a reference to 
 the request_queue, which is dropped in the scsi_device's release 
 routine!

That's the thing. We see a circular reference when we use the SCSI LUN
as the parent.
The sysfs relationship is: scsi_device - genhd - queue, while the
queue holds a ref to to the blockdev (parent), the blockdev to the
scsi_device (parent) and the scsi_devices to the queue (SCSI). All
waiting for their release functions to be called, to release the other
refs.

The scsi_device needs to drop the queue reference while the device is
removed, not when it's data is released. Hannes came up with the
attached patch, which seems to work fine here.

 That doesn't make much sense to me, and it is complicated by 
 the fact that deleting a kobject doesn't drop the kobject's reference 
 to its parent -- only releasing the kobject does.

Right, that makes things very complicated.

 In fact, for my development work I normally use a patch which makes 
 kobjects behave better: They do drop the reference to their parent when 
 they are deleted.  This actually is nothing more than a reversion of a 
 patch added several years ago to try and cover up some of the 
 weaknesses of the SCSI stack!  Now that the SCSI stack is in better 
 shape, maybe my patch should be included in the mainstream kernel.  The 
 patch is below; see what you think.

Sounds good to me, to disconnect a dead object from its parent when it's
deleted. We only need to protect for use after free but not lock the
parent, I guess. We should give it a try.

Thanks,
Kay
diff --git a/drivers/scsi/scsi_sysfs.c b/drivers/scsi/scsi_sysfs.c


index daed37d..577bbf3 100644


--- a/drivers/scsi/scsi_sysfs.c


+++ b/drivers/scsi/scsi_sysfs.c


@@ -280,15 +280,6 @@ static void scsi_device_dev_release_usercontext(struct work_struct *work)


 	list_del(sdev-starved_entry);


 	spin_unlock_irqrestore(sdev-host-host_lock, flags);


 


-	if (sdev-request_queue) {


-		sdev-request_queue-queuedata = NULL;


-		/* user context needed to free queue */


-		scsi_free_queue(sdev-request_queue);


-		/* temporary expedient, try to catch use of queue lock


-		 * after free of sdev */


-		sdev-request_queue = NULL;


-	}


-


 	scsi_target_reap(scsi_target(sdev));


 


 	kfree(sdev-inquiry);


@@ -799,6 +790,16 @@ void __scsi_remove_device(struct scsi_device *sdev)


 	if (sdev-host-hostt-slave_destroy)


 		sdev-host-hostt-slave_destroy(sdev);


 	transport_destroy_device(dev);


+


+	if (sdev-request_queue) {


+		sdev-request_queue-queuedata = NULL;


+		/* user context needed to free queue */


+		scsi_free_queue(sdev-request_queue);


+		/* temporary expedient, try to catch use of queue lock


+		 * after free of sdev */


+		sdev-request_queue = NULL;


+	}


+


 	put_device(dev);


 }


 




Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread James Bottomley
On Mon, 2007-10-29 at 11:18 -0400, Alan Stern wrote:
 With regard to the discussion below, is there any objection to the 
 attached patch?  It moves the call to scsi_release_queue() from 
 scsi_device_dev_release_usercontext() to __scsi_remove_device().  In 
 fact, it might turn out that with this change, the extra _usercontext 
 routine isn't needed at all.

There's a flaw in the discussion.

 As far as I can see, the only reason for not adopting this patch would 
 be if a scsi_device's queue structure was needed even after the 
 scsi_device was unregistered.  Does this ever happen?  If it does, it 
 would be indicative of a nasty reference-loop problem (scsi_device 
 needs reference to request_queue, request_queue holds reference to 
 gendisk, gendisk holds reference to scsi_device).

I'm not sure if we can do this patch.  If you kill a device, you see
there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
printk message I see quite often when I unplug devices while they're
operating.

If you NULL out and free the request queue immediately after setting
SDEV_DEL, without allowing the devices and filesystems to finish, are
you sure we're not going to get a null deref on sdev-request_queue?

 Alan Stern
 
 
 -- Forwarded message --
 Date: Tue, 23 Oct 2007 13:27:49 +0200
 From: Kay Sievers [EMAIL PROTECTED]
 To: Alan Stern [EMAIL PROTECTED]
 Cc: Greg KH [EMAIL PROTECTED],
  Kernel development list [EMAIL PROTECTED],
  Hannes Reinecke [EMAIL PROTECTED]
 Subject: Re: BUG in: Driver core: convert block from raw kobjects to core
 devices
 
 On Tue, 2007-10-23 at 00:14 -0400, Alan Stern wrote:
  On Tue, 23 Oct 2007, Kay Sievers wrote:
  
   There must be something going wrong with the block patch in conjunction
   with the crazy SCSI release logic.
  
  I don't know -- there's nothing obviously wrong with the block patch 
  except the extra put_device.  But you're right that the SCSI logic is 
  crazy.  The scsi_device is the parent of the gendisk, which is the 
  parent of the request_queue.  But the scsi_device holds a reference to 
  the request_queue, which is dropped in the scsi_device's release 
  routine!
 
 That's the thing. We see a circular reference when we use the SCSI LUN
 as the parent.
 The sysfs relationship is: scsi_device - genhd - queue, while the

That's not true.

The relationship is scsi_device - queue

However, genhd is conditioned on the ULD attachment (we can have queues
with no gendisk), so for instance, for sd it's

scsi_disk - genhd - queue
  - scsi_device

However, when we get to the device destruction part, the first thing
that should happen is that the driver should unbind, so we release the
driver specific structure and the genhd (and it's ref to the queue and
the scsi_device).  Then we can let the mid-layer remove functions
release the queue.

 queue holds a ref to to the blockdev (parent), the blockdev to the
 scsi_device (parent) and the scsi_devices to the queue (SCSI). All
 waiting for their release functions to be called, to release the other
 refs.

This isn't how it's supposed to be ... the refs are supposed to be the
other way around (blockdev/genhd has a ref to the queue, but scsi_device
has a separate ref to the queue; the genhd should be releaseable even if
the device hasn't given up the queue).

I probably need more thread context to make a more informed statement,
though.

 The scsi_device needs to drop the queue reference while the device is
 removed, not when it's data is released. Hannes came up with the
 attached patch, which seems to work fine here.
 
  That doesn't make much sense to me, and it is complicated by 
  the fact that deleting a kobject doesn't drop the kobject's reference 
  to its parent -- only releasing the kobject does.
 
 Right, that makes things very complicated.
 
  In fact, for my development work I normally use a patch which makes 
  kobjects behave better: They do drop the reference to their parent when 
  they are deleted.  This actually is nothing more than a reversion of a 
  patch added several years ago to try and cover up some of the 
  weaknesses of the SCSI stack!  Now that the SCSI stack is in better 
  shape, maybe my patch should be included in the mainstream kernel.  The 
  patch is below; see what you think.
 
 Sounds good to me, to disconnect a dead object from its parent when it's
 deleted. We only need to protect for use after free but not lock the
 parent, I guess. We should give it a try.

James


-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread James Bottomley
On Mon, 2007-10-29 at 12:38 -0400, Alan Stern wrote:
 No, that's not what happens.  The genhd's parent is the scsi_device
 and 
 the queue's parent is the genhd.  You can see this immediately from
 the 
 directory layout in sysfs.  Thus:

How has this happened?  The scsi_device doesn't know anything about the
genhd; how can it hold a reference to it?  If you grep the code, you'll
see that only the ULD's (that's sd, sr, st, osst et al) know what a
gendisk is.

James




-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread Alan Stern
On Mon, 29 Oct 2007, James Bottomley wrote:

 On Mon, 2007-10-29 at 11:18 -0400, Alan Stern wrote:
  With regard to the discussion below, is there any objection to the 
  attached patch?  It moves the call to scsi_release_queue() from 
  scsi_device_dev_release_usercontext() to __scsi_remove_device().  In 
  fact, it might turn out that with this change, the extra _usercontext 
  routine isn't needed at all.
 
 There's a flaw in the discussion.
 
  As far as I can see, the only reason for not adopting this patch would 
  be if a scsi_device's queue structure was needed even after the 
  scsi_device was unregistered.  Does this ever happen?  If it does, it 
  would be indicative of a nasty reference-loop problem (scsi_device 
  needs reference to request_queue, request_queue holds reference to 
  gendisk, gendisk holds reference to scsi_device).
 
 I'm not sure if we can do this patch.  If you kill a device, you see
 there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
 printk message I see quite often when I unplug devices while they're
 operating.

Me too.

 If you NULL out and free the request queue immediately after setting
 SDEV_DEL, without allowing the devices and filesystems to finish, are
 you sure we're not going to get a null deref on sdev-request_queue?

That's basically what I was asking you!  :-)  What happens if you try 
to clean up a block queue when there are still outstanding requests?

  On Tue, 2007-10-23 at 00:14 -0400, Alan Stern wrote:
   On Tue, 23 Oct 2007, Kay Sievers wrote:
   
There must be something going wrong with the block patch in conjunction
with the crazy SCSI release logic.
   
   I don't know -- there's nothing obviously wrong with the block patch 
   except the extra put_device.  But you're right that the SCSI logic is 
   crazy.  The scsi_device is the parent of the gendisk, which is the 
   parent of the request_queue.  But the scsi_device holds a reference to 
   the request_queue, which is dropped in the scsi_device's release 
   routine!
  
  That's the thing. We see a circular reference when we use the SCSI LUN
  as the parent.
  The sysfs relationship is: scsi_device - genhd - queue, while the
 
 That's not true.
 
 The relationship is scsi_device - queue

I think Kay wrote his arrows backwards.  Or maybe he just made a simple 
mistake.

 However, genhd is conditioned on the ULD attachment (we can have queues
 with no gendisk), so for instance, for sd it's
 
 scsi_disk - genhd - queue
   - scsi_device
 
 However, when we get to the device destruction part, the first thing
 that should happen is that the driver should unbind, so we release the
 driver specific structure and the genhd (and it's ref to the queue and
 the scsi_device).  Then we can let the mid-layer remove functions
 release the queue.

That's what should happen.  And in fact the scsi_disk structure doesn't
cause any difficulties; we're only worried about the scsi_device
structure.

  queue holds a ref to to the blockdev (parent), the blockdev to the
  scsi_device (parent) and the scsi_devices to the queue (SCSI). All
  waiting for their release functions to be called, to release the other
  refs.
 
 This isn't how it's supposed to be ... the refs are supposed to be the
 other way around (blockdev/genhd has a ref to the queue, but scsi_device
 has a separate ref to the queue; the genhd should be releaseable even if
 the device hasn't given up the queue).

No, that's not what happens.  The genhd's parent is the scsi_device and 
the queue's parent is the genhd.  You can see this immediately from the 
directory layout in sysfs.  Thus:

the scsi_device holds a reference to the queue,
which holds a reference to the genhd,
which holds a reference to the scsi_device.

In fact it's not the struct device's reference to the parent which
causes the problem -- it's the embedded kobject's reference to the
kobject embedded in the parent device.  This reference doesn't get
dropped when the device (and its embedded kobject) are removed; it
persists until they are released.  But with the reference loop they
never do get released.

 I probably need more thread context to make a more informed statement,
 though.

If you want to wade through the email messages, the thread starts here:

http://marc.info/?t=11927357356r=1w=2

Here's a brief summary.  Kay wrote a patch modifying the block-device 
core, the Driver core: convert block from raw kobjects to core 
devices patch in $Subject.  The patch itself is located here:

http://www.kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/gregkh-01-driver/block-device.patch

although this version has been updated; in its original form the patch 
included an extra call to put_device(disk_dev) right at the end of 
del_gendisk() in fs/partitions/check.c.

In my testing I found the extra put_device() caused problems and I
complained to Kay about it.  He explained that on his system the device
structures would 

Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread Alan Stern
On Mon, 29 Oct 2007, James Bottomley wrote:

 On Mon, 2007-10-29 at 12:38 -0400, Alan Stern wrote:
  No, that's not what happens.  The genhd's parent is the scsi_device
  and 
  the queue's parent is the genhd.  You can see this immediately from
  the 
  directory layout in sysfs.  Thus:
 
 How has this happened?  The scsi_device doesn't know anything about the
 genhd; how can it hold a reference to it?

No, no -- it's the other way around.  The scsi_device is the parent of 
the genhd, so the genhd holds a reference to the scsi_device, not vice 
versa.

Alan Stern

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread Alan Stern
On Mon, 29 Oct 2007, James Bottomley wrote:

 I'm not sure if we can do this patch.  If you kill a device, you see
 there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
 printk message I see quite often when I unplug devices while they're
 operating.
 
 If you NULL out and free the request queue immediately after setting
 SDEV_DEL, without allowing the devices and filesystems to finish, are
 you sure we're not going to get a null deref on sdev-request_queue?

Let's get at this another way.  The patch below probably should be 
applied in any case.  It's essentially a reversion of this patch from 
2003:

http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc

which was written to compensate for problems in the SCSI stack.  The
idea was to avoid releasing a kobject before all its children were
released.  However as far as I can see, in general we should be able to
release a kobject as soon as all its children are removed and its
refcount drops to 0, without waiting for the children to be released.  
To put it another way, once a child is removed from visibility it
should not try (or need) to access its parent at all.  If it has to
then it should take an explicit reference to the parent.

Since the SCSI stack is now in much better shape, there doesn't seem to 
be any reason to keep the old code.  Do you agree that the patch below 
is worth merging?  I submitted it to Greg some time ago, but he didn't 
want to accept it without some assurance that it is now safe.

It would fix the problem Kay and saw with the circular references, 
because the references would all get dropped when the scsi_device is 
removed instead of waiting for a release that will never come.

Alan Stern



Index: usb-2.6/lib/kobject.c
===
--- usb-2.6.orig/lib/kobject.c
+++ usb-2.6/lib/kobject.c
@@ -206,12 +206,16 @@ void kobject_init(struct kobject * kobj)
 
 static void unlink(struct kobject * kobj)
 {
+   struct kobject *parent = kobj-parent;
+
if (kobj-kset) {
spin_lock(kobj-kset-list_lock);
list_del_init(kobj-entry);
spin_unlock(kobj-kset-list_lock);
}
+   kobj-parent = NULL;
kobject_put(kobj);
+   kobject_put(parent);
 }
 
 /**
@@ -255,7 +259,6 @@ int kobject_add(struct kobject * kobj)
if (error) {
/* unlink does the kobject_put() for us */
unlink(kobj);
-   kobject_put(parent);
 
/* be noisy on error issues */
if (error == -EEXIST)
@@ -498,7 +501,6 @@ void kobject_cleanup(struct kobject * ko
 {
struct kobj_type * t = get_ktype(kobj);
struct kset * s = kobj-kset;
-   struct kobject * parent = kobj-parent;
const char *name = kobj-k_name;
 
pr_debug(kobject %s: cleaning up\n,kobject_name(kobj));
@@ -515,7 +517,6 @@ void kobject_cleanup(struct kobject * ko
}
if (s)
kset_put(s);
-   kobject_put(parent);
 }
 
 static void kobject_release(struct kref *kref)

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: BUG in: Driver core: convert block from raw kobjects to core devices (fwd)

2007-10-29 Thread Kay Sievers
On Mon, 2007-10-29 at 14:47 -0400, Alan Stern wrote:
 On Mon, 29 Oct 2007, James Bottomley wrote:
 
  I'm not sure if we can do this patch.  If you kill a device, you see
  there's a processing state in scsi_prep_fn() for SDEV_DEL, which has a
  printk message I see quite often when I unplug devices while they're
  operating.
  
  If you NULL out and free the request queue immediately after setting
  SDEV_DEL, without allowing the devices and filesystems to finish, are
  you sure we're not going to get a null deref on sdev-request_queue?
 
 Let's get at this another way.  The patch below probably should be 
 applied in any case.  It's essentially a reversion of this patch from 
 2003:
 
 http://git.kernel.org/?p=linux/kernel/git/tglx/history.git;a=commit;h=10921a8f1305b8ec97794941db78b825db5839bc
 
 which was written to compensate for problems in the SCSI stack.  The
 idea was to avoid releasing a kobject before all its children were
 released.  However as far as I can see, in general we should be able to
 release a kobject as soon as all its children are removed and its
 refcount drops to 0, without waiting for the children to be released.  
 To put it another way, once a child is removed from visibility it
 should not try (or need) to access its parent at all.  If it has to
 then it should take an explicit reference to the parent.
 
 Since the SCSI stack is now in much better shape, there doesn't seem to 
 be any reason to keep the old code.  Do you agree that the patch below 
 is worth merging?  I submitted it to Greg some time ago, but he didn't 
 want to accept it without some assurance that it is now safe.
 
 It would fix the problem Kay and saw with the circular references, 
 because the references would all get dropped when the scsi_device is 
 removed instead of waiting for a release that will never come.

It indeed fixes my problem. And it sounds you are right, the fix from
2003 is probably just a paper-over for a missing explicit reference that
time.

I'm all for giving the reversion a try, and add explicit parent get/put
if needed somewhere. If, for some reason, that will not happen, I'll
need to do something like this in the block patch, which will then be a
fix for the paper-over solution for an unknown bug. :)


--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
...
+   device_del(disk-dev);
+
+   /*
+* disconnect from parent device, so the parent can clean up
+* without waiting for us to clean up
+*
+* the driver core took this reference while we added ourself as
+* a child of the parent device
+*/
+   parent = disk-dev.kobj.parent;
+   disk-dev.kobj.parent = NULL;
+   disk-dev.parent = NULL;
+   kobject_put(parent);
 }

Thanks,
Kay

-
To unsubscribe from this list: send the line unsubscribe linux-scsi in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html