Re: minor GEOM disk API change coming
On Fri, Jun 22, 2012 at 19:50:01 +0300, Alexander Motin wrote: > Hi. > > I understand problem you are going to fix and I think your patch should > do it. What I don't very like is addition of new GEOM method. Now GEOM > doesn't need it because all internal open/close operations and provider > destructions there protected by the topology SX lock. Unluckily that > lock doesn't cover g_wither_provider(), called by disk_gone() while > holding CAM SIM lock. If not that SIM lock, it would be enough to just > grab and drop GEOM topology lock to ensure that no new open() calls will > follow. Indirect way to do it could be to post GEOM event that would > drop the reference as soon as it will be handled and can obtain the > topology lock. Unluckily it uses malloc() for event storage and also can > be unreliable if called from under the SIM mutex lock. So it seems many > things would be much easier if it was possible to drop SIM lock inside > periph invalidate method, but now it is unsafe > > That is not an objection, just some thoughts about. Yeah, there are things in CAM (and GEOM) that need to be cleaned up. I wouldn't have added a GEOM method if there were a reasonable way around it, but as you pointed out, there isn't right now. I committed the patch, and plan to merge it to stable/9. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: minor GEOM disk API change coming
Hi. I understand problem you are going to fix and I think your patch should do it. What I don't very like is addition of new GEOM method. Now GEOM doesn't need it because all internal open/close operations and provider destructions there protected by the topology SX lock. Unluckily that lock doesn't cover g_wither_provider(), called by disk_gone() while holding CAM SIM lock. If not that SIM lock, it would be enough to just grab and drop GEOM topology lock to ensure that no new open() calls will follow. Indirect way to do it could be to post GEOM event that would drop the reference as soon as it will be handled and can obtain the topology lock. Unluckily it uses malloc() for event storage and also can be unreliable if called from under the SIM mutex lock. So it seems many things would be much easier if it was possible to drop SIM lock inside periph invalidate method, but now it is unsafe That is not an objection, just some thoughts about. -- Alexander Motin ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: minor GEOM disk API change coming
On Thu, Jun 21, 2012 at 23:58:10 +0400, Andrey V. Elsukov wrote: > On 21.06.2012 20:48, Kenneth D. Merry wrote: > >>> In this case, the GEOM disk class instance has been created by > >>> disk_create(), and the taste of the disk is queued in the GEOM > >>> event queue. > >>> > >>> While that event is queued, the da(4) instance goes away. When the > >>> open call comes into the da(4) driver, it dereferences the freed > >>> (but non-NULL) peripheral pointer provided by GEOM, which results > >>> in a panic. > >> > >> I think this situation is very specific for the GEOM_DISK class, and > >> this callback will be less useful for other classes. > >> Does g_cancel_event() cannot help you prevent tasting? > > > > Calling g_cancel_event(), for instance from disk_gone(), would not > > completely close the race condition. It can't cancel an event that is > > already in progress, and it is possible for the peripheral to go away while > > the event is marked in progress but before the taste gets far enough into > > daopen() to acquire a reference to the peripheral. > > If i understand correctly your patch, you acquires a reference to the > periph and release it when g_destroy_provider finished. What if you will > queue some custom event from the disk_gone() that will call > cddiskgonecb()? Does it close the race? This event will be executed > after the taste completes. That still would not close the race. It would still be possible for another context to come along and open the device at any point. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: minor GEOM disk API change coming
On 21.06.2012 20:48, Kenneth D. Merry wrote: >>> In this case, the GEOM disk class instance has been created by >>> disk_create(), and the taste of the disk is queued in the GEOM >>> event queue. >>> >>> While that event is queued, the da(4) instance goes away. When the >>> open call comes into the da(4) driver, it dereferences the freed >>> (but non-NULL) peripheral pointer provided by GEOM, which results >>> in a panic. >> >> I think this situation is very specific for the GEOM_DISK class, and >> this callback will be less useful for other classes. >> Does g_cancel_event() cannot help you prevent tasting? > > Calling g_cancel_event(), for instance from disk_gone(), would not > completely close the race condition. It can't cancel an event that is > already in progress, and it is possible for the peripheral to go away while > the event is marked in progress but before the taste gets far enough into > daopen() to acquire a reference to the peripheral. If i understand correctly your patch, you acquires a reference to the periph and release it when g_destroy_provider finished. What if you will queue some custom event from the disk_gone() that will call cddiskgonecb()? Does it close the race? This event will be executed after the taste completes. -- WBR, Andrey V. Elsukov signature.asc Description: OpenPGP digital signature
Re: minor GEOM disk API change coming
On Thu, Jun 21, 2012 at 19:53:03 +0400, Andrey V. Elsukov wrote: > On 21.06.2012 08:29, Kenneth D. Merry wrote: > > Fix a bug which causes a panic in daopen(). The panic is caused by > > a da(4) instance going away while GEOM is still probing it. > > > > In this case, the GEOM disk class instance has been created by > > disk_create(), and the taste of the disk is queued in the GEOM > > event queue. > > > > While that event is queued, the da(4) instance goes away. When the > > open call comes into the da(4) driver, it dereferences the freed > > (but non-NULL) peripheral pointer provided by GEOM, which results > > in a panic. > > I think this situation is very specific for the GEOM_DISK class, and > this callback will be less useful for other classes. > Does g_cancel_event() cannot help you prevent tasting? Calling g_cancel_event(), for instance from disk_gone(), would not completely close the race condition. It can't cancel an event that is already in progress, and it is possible for the peripheral to go away while the event is marked in progress but before the taste gets far enough into daopen() to acquire a reference to the peripheral. Ken -- Kenneth Merry k...@freebsd.org ___ freebsd-current@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-current To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
Re: minor GEOM disk API change coming
On 21.06.2012 08:29, Kenneth D. Merry wrote: > Fix a bug which causes a panic in daopen(). The panic is caused by > a da(4) instance going away while GEOM is still probing it. > > In this case, the GEOM disk class instance has been created by > disk_create(), and the taste of the disk is queued in the GEOM > event queue. > > While that event is queued, the da(4) instance goes away. When the > open call comes into the da(4) driver, it dereferences the freed > (but non-NULL) peripheral pointer provided by GEOM, which results > in a panic. I think this situation is very specific for the GEOM_DISK class, and this callback will be less useful for other classes. Does g_cancel_event() cannot help you prevent tasting? -- WBR, Andrey V. Elsukov signature.asc Description: OpenPGP digital signature