Re: svn commit: r237328 - in head/sys/cam: . scsi

2012-06-21 Thread Bjoern A. Zeeb

On 20. Jun 2012, at 17:08 , Kenneth D. Merry wrote:

> Author: ken
> Date: Wed Jun 20 17:08:00 2012
> New Revision: 237328
> URL: http://svn.freebsd.org/changeset/base/237328
> 
> Log:
>  Fix several reference counting and object lifetime issues between
>  the pass(4) and enc(4) drivers and devfs.

Just as a hint for the archives:  enc(4) is completely unrelated to this
commit as it's the "encapsulation device"  used along with IPsec and has
nothing to do with scsi_enc.

-- 
Bjoern A. Zeeb You have to have visions!
   It does not matter how good you are. It matters what good you do!

___
svn-src-head@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/svn-src-head
To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"


svn commit: r237328 - in head/sys/cam: . scsi

2012-06-20 Thread Kenneth D. Merry
Author: ken
Date: Wed Jun 20 17:08:00 2012
New Revision: 237328
URL: http://svn.freebsd.org/changeset/base/237328

Log:
  Fix several reference counting and object lifetime issues between
  the pass(4) and enc(4) drivers and devfs.
  
  The pass(4) driver uses the destroy_dev_sched() routine to
  schedule its device node for destruction in a separate thread
  context.  It does this because the passcleanup() routine can get
  called indirectly from the passclose() routine, and that would
  cause a deadlock if the close routine tried to destroy its own
  device node.
  
  In any case, once a particular passthrough driver number, e.g.
  pass3, is destroyed, CAM considers that unit number (3 in this
  case) available for reuse.
  
  The problem is that devfs may not be done cleaning up the previous
  instance of pass3, and will panic if isn't done cleaning up the
  previous instance.
  
  The solution is to get a callback from devfs when the device node
  is removed, and make sure we hold a reference to the peripheral
  until that happens.
  
  Testing exposed some other cases where we have reference counting
  issues, and those were also fixed in the pass(4) driver.
  
  cam_periph.c: In camperiphfree(), reorder some of the operations.
  
The peripheral destructor needs to be called before
the peripheral is removed from the peripheral is
removed from the list.  This is because once we
remove the peripheral from the list, and drop the
topology lock, the peripheral number may be reused.
But if the destructor hasn't been called yet, there
may still be resources hanging around (like devfs
nodes) that haven't been fully cleaned up.
  
  cam_xpt.c:Add an argument to xpt_remove_periph() to indicate
whether the topology lock is already held.
  
  scsi_enc.c:   Acquire an extra reference to the peripheral during
registration, and release it once we get a callback
from devfs indicating that the device node is gone.
  
Call destroy_dev_sched_cb() in enc_oninvalidate()
instead of calling destroy_dev() in the cleanup
routine.
  
  scsi_pass.c:  Add reference counting to handle peripheral and
devfs object lifetime issues.
  
Add a reference to the peripheral and the devfs
node in the peripheral registration.
  
Don't attempt to add a physical path alias if the
peripheral has been marked invalid.
  
Release the devfs reference once the initial
physical path alias taskqueue run has completed.
  
Schedule devfs node destruction in the
passoninvalidate(), and release our peripheral
reference in a new routine, passdevgonecb() once
the devfs node is gone.  This allows the peripheral
to fully go away, and the peripheral destructor,
passcleanup(), will get called.
  
  MFC after:3 days
  Sponsored by: Spectra Logic

Modified:
  head/sys/cam/cam_periph.c
  head/sys/cam/cam_xpt.c
  head/sys/cam/cam_xpt_periph.h
  head/sys/cam/scsi/scsi_enc.c
  head/sys/cam/scsi/scsi_pass.c

Modified: head/sys/cam/cam_periph.c
==
--- head/sys/cam/cam_periph.c   Wed Jun 20 16:51:14 2012(r237327)
+++ head/sys/cam/cam_periph.c   Wed Jun 20 17:08:00 2012(r237328)
@@ -258,7 +258,7 @@ failure:
break;
case 3:
CAM_DEBUG(periph->path, CAM_DEBUG_INFO, ("Periph destroyed\n"));
-   xpt_remove_periph(periph);
+   xpt_remove_periph(periph, /*topology_lock_held*/ 0);
/* FALLTHROUGH */
case 2:
xpt_lock_buses();
@@ -610,13 +610,38 @@ camperiphfree(struct cam_periph *periph)
return;
}
 
-   TAILQ_REMOVE(&(*p_drv)->units, periph, unit_links);
-   (*p_drv)->generation++;
+   /*
+* The peripheral destructor semantics dictate calling with only the
+* SIM mutex held.  Since it might sleep, it should not be called
+* with the topology lock held.
+*/
xpt_unlock_buses();
 
+   /*
+* We need to call the peripheral destructor prior to removing the
+* peripheral from the list.  Otherwise, we risk running into a
+* scenario where the peripheral unit number may get reused
+* (because it has been removed from the list), but some resources
+* used by the peripheral are still hanging around.  In particular,
+* the devfs nodes used by some peripherals like the pass(4) driver
+* aren't fully cleaned up until the destructor is run.  If the
+* unit number is reused before the devfs instance is fully gone,
+* devfs