Re: minor GEOM disk API change coming
On Fri, Jun 22, 2012 at 19:50:01 +0300, Alexander Motin wrote: Hi. I understand problem you are going to fix and I think your patch should do it. What I don't very like is addition of new GEOM method. Now GEOM doesn't need it because all internal open/close operations and provider destructions there protected by the topology SX lock. Unluckily that lock doesn't cover g_wither_provider(), called by disk_gone() while holding CAM SIM lock. If not that SIM lock, it would be enough to just grab and drop GEOM topology lock to ensure that no new open() calls will follow. Indirect way to do it could be to post GEOM event that would drop the reference as soon as it will be handled and can obtain the topology lock. Unluckily it uses malloc() for event storage and also can be unreliable if called from under the SIM mutex lock. So it seems many things would be much easier if it was possible to drop SIM lock inside periph invalidate method, but now it is unsafe That is not an objection, just some thoughts about. Yeah, there are things in CAM (and GEOM) that need to be cleaned up. I wouldn't have added a GEOM method if there were a reasonable way around it, but as you pointed out, there isn't right now. I committed the patch, and plan to merge it to stable/9. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: minor GEOM disk API change coming
Hi. I understand problem you are going to fix and I think your patch should do it. What I don't very like is addition of new GEOM method. Now GEOM doesn't need it because all internal open/close operations and provider destructions there protected by the topology SX lock. Unluckily that lock doesn't cover g_wither_provider(), called by disk_gone() while holding CAM SIM lock. If not that SIM lock, it would be enough to just grab and drop GEOM topology lock to ensure that no new open() calls will follow. Indirect way to do it could be to post GEOM event that would drop the reference as soon as it will be handled and can obtain the topology lock. Unluckily it uses malloc() for event storage and also can be unreliable if called from under the SIM mutex lock. So it seems many things would be much easier if it was possible to drop SIM lock inside periph invalidate method, but now it is unsafe That is not an objection, just some thoughts about. -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: minor GEOM disk API change coming
On 21.06.2012 08:29, Kenneth D. Merry wrote: Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it. In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. I think this situation is very specific for the GEOM_DISK class, and this callback will be less useful for other classes. Does g_cancel_event() cannot help you prevent tasting? -- WBR, Andrey V. Elsukov signature.asc Description: OpenPGP digital signature
Re: minor GEOM disk API change coming
On Thu, Jun 21, 2012 at 19:53:03 +0400, Andrey V. Elsukov wrote: On 21.06.2012 08:29, Kenneth D. Merry wrote: Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it. In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. I think this situation is very specific for the GEOM_DISK class, and this callback will be less useful for other classes. Does g_cancel_event() cannot help you prevent tasting? Calling g_cancel_event(), for instance from disk_gone(), would not completely close the race condition. It can't cancel an event that is already in progress, and it is possible for the peripheral to go away while the event is marked in progress but before the taste gets far enough into daopen() to acquire a reference to the peripheral. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
Re: minor GEOM disk API change coming
On 21.06.2012 20:48, Kenneth D. Merry wrote: In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. I think this situation is very specific for the GEOM_DISK class, and this callback will be less useful for other classes. Does g_cancel_event() cannot help you prevent tasting? Calling g_cancel_event(), for instance from disk_gone(), would not completely close the race condition. It can't cancel an event that is already in progress, and it is possible for the peripheral to go away while the event is marked in progress but before the taste gets far enough into daopen() to acquire a reference to the peripheral. If i understand correctly your patch, you acquires a reference to the periph and release it when g_destroy_provider finished. What if you will queue some custom event from the disk_gone() that will call cddiskgonecb()? Does it close the race? This event will be executed after the taste completes. -- WBR, Andrey V. Elsukov signature.asc Description: OpenPGP digital signature
Re: minor GEOM disk API change coming
On Thu, Jun 21, 2012 at 23:58:10 +0400, Andrey V. Elsukov wrote: On 21.06.2012 20:48, Kenneth D. Merry wrote: In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. I think this situation is very specific for the GEOM_DISK class, and this callback will be less useful for other classes. Does g_cancel_event() cannot help you prevent tasting? Calling g_cancel_event(), for instance from disk_gone(), would not completely close the race condition. It can't cancel an event that is already in progress, and it is possible for the peripheral to go away while the event is marked in progress but before the taste gets far enough into daopen() to acquire a reference to the peripheral. If i understand correctly your patch, you acquires a reference to the periph and release it when g_destroy_provider finished. What if you will queue some custom event from the disk_gone() that will call cddiskgonecb()? Does it close the race? This event will be executed after the taste completes. That still would not close the race. It would still be possible for another context to come along and open the device at any point. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org
minor GEOM disk API change coming
Hi folks, I have attached some patches that fix an object lifetime issue between CAM and GEOM. Fixing the bug required adding a callback to the GEOM disk code, and adding a callback that a GEOM class can register to get notified when a provider is destroyed. The probable commit message is below. If I don't hear any objections, I will commit it on Friday, June 22nd. Fix a bug which causes a panic in daopen(). The panic is caused by a da(4) instance going away while GEOM is still probing it. In this case, the GEOM disk class instance has been created by disk_create(), and the taste of the disk is queued in the GEOM event queue. While that event is queued, the da(4) instance goes away. When the open call comes into the da(4) driver, it dereferences the freed (but non-NULL) peripheral pointer provided by GEOM, which results in a panic. The solution is to add a callback to the GEOM disk code that is called when all of its resources are cleaned up. This is implemented inside GEOM by adding an optional callback that is called when all consumers have detached from a provider, and the provider is about to be deleted. scsi_cd.c, scsi_da.c: In the register routine for the cd(4) and da(4) routines, acquire a reference to the CAM peripheral instance just before we call disk_create(). Use the new GEOM disk d_gone() callback to register a callback (dadiskgonecb()/cddiskgonecb()) that decrements the peripheral reference count once GEOM has finished cleaning up its resources. In the cd(4) driver, clean up open and close behavior slightly. GEOM makes sure we only get one open() and one close call, so there is no need to set an open flag and decrement the reference count if we are not the first open. In the cd(4) driver, use cam_periph_release_locked() in a couple of error scenarios to avoid extra mutex calls. geom.h: Add a new, optional, providergone callback that is called when a provider is about to be deleted. geom_disk.h:Add a new d_gone() callback to the GEOM disk interface. Bump the DISK_VERSION to version 2. This probably should have been done after a couple of previous changes, especially the addition of the d_getattr() callback. geom_disk.c:Add a providergone callback for the disk class, g_disk_providergone(), that calls the user's d_gone() callback if it exists. Bump the DISK_VERSION to 2. geom_subr.c:In g_destroy_provider(), call the providergone callback if it has been provided. In g_new_geomf(), propagate the class's providergone callback to the new geom instance. disk.9: Update the disk(9) man page to include information on the new d_gone() callback, as well as the previously added d_getattr() callback, d_descr field, and HBA PCI ID fields. Ken -- Kenneth Merry k...@freebsd.org //depot/users/kenm/FreeBSD-test/share/man/man9/disk.9#1 - /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9 *** /tmp/tmp.81866.21 Wed Jun 20 22:19:20 2012 --- /usr/home/kenm/perforce4/kenm/FreeBSD-test/share/man/man9/disk.9Wed Jun 20 21:30:45 2012 *** *** 145,150 --- 145,160 .Xr dumpon 8 , this function is invoked from a very restricted system state after a kernel panic to record a copy of the system RAM to the disk. + .It Vt disk_getattr_t * Va d_getattr + Optional: if this method is provided, it gives the disk driver the + opportunity to override the default GEOM response to BIO_GETATTR requests. + This function should return -1 if the attribute is not handled, 0 if the + attribute is handled, or an errno to be passed to g_io_deliver(). + .It Vt disk_gone_t * Va d_gone + Optional: if this method is provided, it will be called after disk_gone() + is called, once GEOM has finished its cleanup process. + Once this callback is called, it is safe for the disk driver to free all of + its resources, as it will not be receiving further calls from GEOM. .El .Ss Mandatory Media Properties The following fields identify the size