Re: minor GEOM disk API change coming

2012-06-23 Thread Kenneth D. Merry
On Fri, Jun 22, 2012 at 19:50:01 +0300, Alexander Motin wrote:
 Hi.
 
 I understand problem you are going to fix and I think your patch should 
 do it. What I don't very like is addition of new GEOM method. Now GEOM 
 doesn't need it because all internal open/close operations and provider 
 destructions there protected by the topology SX lock. Unluckily that 
 lock doesn't cover g_wither_provider(), called by disk_gone() while 
 holding CAM SIM lock. If not that SIM lock, it would be enough to just 
 grab and drop GEOM topology lock to ensure that no new open() calls will 
 follow. Indirect way to do it could be to post GEOM event that would 
 drop the reference as soon as it will be handled and can obtain the 
 topology lock. Unluckily it uses malloc() for event storage and also can 
 be unreliable if called from under the SIM mutex lock. So it seems many 
 things would be much easier if it was possible to drop SIM lock inside 
 periph invalidate method, but now it is unsafe
 
 That is not an objection, just some thoughts about.

Yeah, there are things in CAM (and GEOM) that need to be cleaned up.  I
wouldn't have added a GEOM method if there were a reasonable way around it,
but as you pointed out, there isn't right now.

I committed the patch, and plan to merge it to stable/9.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: minor GEOM disk API change coming

2012-06-22 Thread Alexander Motin

Hi.

I understand problem you are going to fix and I think your patch should 
do it. What I don't very like is addition of new GEOM method. Now GEOM 
doesn't need it because all internal open/close operations and provider 
destructions there protected by the topology SX lock. Unluckily that 
lock doesn't cover g_wither_provider(), called by disk_gone() while 
holding CAM SIM lock. If not that SIM lock, it would be enough to just 
grab and drop GEOM topology lock to ensure that no new open() calls will 
follow. Indirect way to do it could be to post GEOM event that would 
drop the reference as soon as it will be handled and can obtain the 
topology lock. Unluckily it uses malloc() for event storage and also can 
be unreliable if called from under the SIM mutex lock. So it seems many 
things would be much easier if it was possible to drop SIM lock inside 
periph invalidate method, but now it is unsafe


That is not an objection, just some thoughts about.

--
Alexander Motin
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: minor GEOM disk API change coming

2012-06-21 Thread Andrey V. Elsukov
On 21.06.2012 08:29, Kenneth D. Merry wrote:
   Fix a bug which causes a panic in daopen(). The panic is caused by
   a da(4) instance going away while GEOM is still probing it.
   
   In this case, the GEOM disk class instance has been created by
   disk_create(), and the taste of the disk is queued in the GEOM
   event queue.
   
   While that event is queued, the da(4) instance goes away.  When the
   open call comes into the da(4) driver, it dereferences the freed
   (but non-NULL) peripheral pointer provided by GEOM, which results
   in a panic.

I think this situation is very specific for the GEOM_DISK class, and
this callback will be less useful for other classes.
Does g_cancel_event() cannot help you prevent tasting?

-- 
WBR, Andrey V. Elsukov



signature.asc
Description: OpenPGP digital signature


Re: minor GEOM disk API change coming

2012-06-21 Thread Kenneth D. Merry
On Thu, Jun 21, 2012 at 19:53:03 +0400, Andrey V. Elsukov wrote:
 On 21.06.2012 08:29, Kenneth D. Merry wrote:
  Fix a bug which causes a panic in daopen(). The panic is caused by
  a da(4) instance going away while GEOM is still probing it.
  
  In this case, the GEOM disk class instance has been created by
  disk_create(), and the taste of the disk is queued in the GEOM
  event queue.
  
  While that event is queued, the da(4) instance goes away.  When the
  open call comes into the da(4) driver, it dereferences the freed
  (but non-NULL) peripheral pointer provided by GEOM, which results
  in a panic.
 
 I think this situation is very specific for the GEOM_DISK class, and
 this callback will be less useful for other classes.
 Does g_cancel_event() cannot help you prevent tasting?

Calling g_cancel_event(), for instance from disk_gone(), would not
completely close the race condition.  It can't cancel an event that is
already in progress, and it is possible for the peripheral to go away while
the event is marked in progress but before the taste gets far enough into
daopen() to acquire a reference to the peripheral.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: minor GEOM disk API change coming

2012-06-21 Thread Andrey V. Elsukov
On 21.06.2012 20:48, Kenneth D. Merry wrote:
 In this case, the GEOM disk class instance has been created by
 disk_create(), and the taste of the disk is queued in the GEOM
 event queue.
 
 While that event is queued, the da(4) instance goes away.  When the
 open call comes into the da(4) driver, it dereferences the freed
 (but non-NULL) peripheral pointer provided by GEOM, which results
 in a panic.

 I think this situation is very specific for the GEOM_DISK class, and
 this callback will be less useful for other classes.
 Does g_cancel_event() cannot help you prevent tasting?
 
 Calling g_cancel_event(), for instance from disk_gone(), would not
 completely close the race condition.  It can't cancel an event that is
 already in progress, and it is possible for the peripheral to go away while
 the event is marked in progress but before the taste gets far enough into
 daopen() to acquire a reference to the peripheral.

If i understand correctly your patch, you acquires a reference to the
periph and release it when g_destroy_provider finished. What if you will
queue some custom event from the disk_gone() that will call
cddiskgonecb()? Does it close the race? This event will be executed
after the taste completes.

-- 
WBR, Andrey V. Elsukov



signature.asc
Description: OpenPGP digital signature


Re: minor GEOM disk API change coming

2012-06-21 Thread Kenneth D. Merry
On Thu, Jun 21, 2012 at 23:58:10 +0400, Andrey V. Elsukov wrote:
 On 21.06.2012 20:48, Kenneth D. Merry wrote:
In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.
 
  I think this situation is very specific for the GEOM_DISK class, and
  this callback will be less useful for other classes.
  Does g_cancel_event() cannot help you prevent tasting?
  
  Calling g_cancel_event(), for instance from disk_gone(), would not
  completely close the race condition.  It can't cancel an event that is
  already in progress, and it is possible for the peripheral to go away while
  the event is marked in progress but before the taste gets far enough into
  daopen() to acquire a reference to the peripheral.
 
 If i understand correctly your patch, you acquires a reference to the
 periph and release it when g_destroy_provider finished. What if you will
 queue some custom event from the disk_gone() that will call
 cddiskgonecb()? Does it close the race? This event will be executed
 after the taste completes.

That still would not close the race.  It would still be possible for
another context to come along and open the device at any point.

Ken
-- 
Kenneth Merry
k...@freebsd.org
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


minor GEOM disk API change coming

2012-06-20 Thread Kenneth D. Merry
Hi folks,

I have attached some patches that fix an object lifetime issue between CAM
and GEOM.

Fixing the bug required adding a callback to the GEOM disk code, and adding
a callback that a GEOM class can register to get notified when a provider
is destroyed.

The probable commit message is below.

If I don't hear any objections, I will commit it on Friday, June 22nd.


Fix a bug which causes a panic in daopen(). The panic is caused by
a da(4) instance going away while GEOM is still probing it.

In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.

While that event is queued, the da(4) instance goes away.  When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.

The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up.  This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.

scsi_cd.c,
scsi_da.c:  In the register routine for the cd(4) and da(4)
routines, acquire a reference to the CAM peripheral
instance just before we call disk_create().

Use the new GEOM disk d_gone() callback to register
a callback (dadiskgonecb()/cddiskgonecb()) that
decrements the peripheral reference count once GEOM
has finished cleaning up its resources.

In the cd(4) driver, clean up open and close
behavior slightly.  GEOM makes sure we only get one
open() and one close call, so there is no need to
set an open flag and decrement the reference count
if we are not the first open.

In the cd(4) driver, use cam_periph_release_locked()
in a couple of error scenarios to avoid extra mutex
calls.

geom.h: Add a new, optional, providergone callback that
is called when a provider is about to be deleted.

geom_disk.h:Add a new d_gone() callback to the GEOM disk
interface.

Bump the DISK_VERSION to version 2.  This probably
should have been done after a couple of previous
changes, especially the addition of the d_getattr()
callback.

geom_disk.c:Add a providergone callback for the disk class,
g_disk_providergone(), that calls the user's
d_gone() callback if it exists.

Bump the DISK_VERSION to 2.

geom_subr.c:In g_destroy_provider(), call the providergone
callback if it has been provided.

In g_new_geomf(), propagate the class's
providergone callback to the new geom instance.

disk.9: Update the disk(9) man page to include information
on the new d_gone() callback, as well as the
previously added d_getattr() callback, d_descr
field, and HBA PCI ID fields.


Ken
-- 
Kenneth Merry
k...@freebsd.org
 //depot/users/kenm/FreeBSD-test/share/man/man9/disk.9#1 - 
/usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9 
*** /tmp/tmp.81866.21   Wed Jun 20 22:19:20 2012
--- /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9Wed Jun 
20 21:30:45 2012
***
*** 145,150 
--- 145,160 
  .Xr dumpon 8 ,
  this function is invoked from a very restricted system state after a
  kernel panic to record a copy of the system RAM to the disk.
+ .It Vt disk_getattr_t * Va d_getattr
+ Optional: if this method is provided, it gives the disk driver the
+ opportunity to override the default GEOM response to BIO_GETATTR requests.
+ This function should return -1 if the attribute is not handled, 0 if the
+ attribute is handled, or an errno to be passed to g_io_deliver().
+ .It Vt disk_gone_t * Va d_gone
+ Optional: if this method is provided, it will be called after disk_gone()
+ is called, once GEOM has finished its cleanup process.
+ Once this callback is called, it is safe for the disk driver to free all of
+ its resources, as it will not be receiving further calls from GEOM.
  .El
  .Ss Mandatory Media Properties
  The following fields identify the size