On Mon, Mar 29, 2021 at 3:44 PM Jason Gunthorpe <j...@nvidia.com> wrote:
>
> On Mon, Mar 29, 2021 at 02:03:37PM -0700, Dan Williams wrote:
>
> > Ugh, exactly why I was motivated to attempt to preclude this with new
> > core infrastructure that attempted to fix this centrally [1]. Remove
> > the possibility of "others" getting this wrong. However after my
> > initial idea bounced off Greg then I ended up shipping this bug in the
> > local rewrite. I think the debugfs api gets this right in terms of
> > centralizing the reference count management, and I want to see
> > something similar for common driver ioctl patterns.
>
> There is a lot of variety here, I'm not sure how much valuable common
> code there will be that could be lifted into the core.. srcu,
> refcount, rwsem, percpu_ref, etc are all common implementations with
> various properties.
>
> The easiest implementation is to just block driver destruction with a
> refcount & completion pattern
>
> The hardest is to allow the underlying HW driver to be removed from
> the fops while the file remains open.
>
> Usually whatever scheme is used has to flow into some in-kernel API as
> well, so isolating it in cdev may not be entirely helpful.
>
> The easiest helper API would be to add an 'unregistration lock' to
> the struct device that blocks unregistration. A refcount & completion
> for instance. I've seen that open coded enough times.

I do agree there is too much variety to widely unify. At the same time,
it is a common enough pattern for devices that allow removal before
final close. Especially for devices that support hot-removal,
disconnecting is a better pattern than blocking unregistration. Just
the small matter of time to see this through...