Hi Noralf, On Thursday, 31 August 2017 22:22:03 EEST Noralf Trønnes wrote: > Den 31.08.2017 14.59, skrev Laurent Pinchart: > > On Wednesday, 30 August 2017 20:18:49 EEST Daniel Vetter wrote: > >> On Wed, Aug 30, 2017 at 6:31 PM, Noralf Trønnes wrote: > >>> Den 28.08.2017 23.56, skrev Daniel Vetter: > >>>> On Mon, Aug 28, 2017 at 07:17:48PM +0200, Noralf Trønnes wrote: > >>>>> Support device unplugging to make tinydrm suitable for USB devices. > >>>>> > >>>>> Cc: David Lechner <da...@lechnology.com> > >>>>> Signed-off-by: Noralf Trønnes <nor...@tronnes.org> > >>>>> --- > >>>>> > >>>>> drivers/gpu/drm/tinydrm/core/tinydrm-core.c | 69 ++++++++++++++++++--- > >>>>> drivers/gpu/drm/tinydrm/mi0283qt.c | 4 ++ > >>>>> drivers/gpu/drm/tinydrm/mipi-dbi.c | 5 ++- > >>>>> drivers/gpu/drm/tinydrm/repaper.c | 9 +++- > >>>>> drivers/gpu/drm/tinydrm/st7586.c | 9 +++- > >>>>> include/drm/tinydrm/tinydrm.h | 5 +++ > >>>>> 6 files changed, 87 insertions(+), 14 deletions(-) > >>>>> > >>>>> diff --git a/drivers/gpu/drm/tinydrm/core/tinydrm-core.c > >>>>> b/drivers/gpu/drm/tinydrm/core/tinydrm-core.c > >>>>> index f11f4cd..3ccbcc5 100644 > >>>>> --- a/drivers/gpu/drm/tinydrm/core/tinydrm-core.c > >>>>> +++ b/drivers/gpu/drm/tinydrm/core/tinydrm-core.c > >>>>> @@ -32,6 +32,29 @@ > >>>>> * The driver allocates &tinydrm_device, initializes it using > >>>>> * devm_tinydrm_init(), sets up the pipeline using > >>>>> tinydrm_display_pipe_init() > >>>>> * and registers the DRM device using devm_tinydrm_register(). > >>>>> + * > >>>>> + * Device unplug > >>>>> + * ------------- > >>>>> + * > >>>>> + * tinydrm supports device unplugging when there's still open DRM or > >>>>> fbdev file > >>>>> + * handles. > >>>>> + * > >>>>> + * There are 2 ways for driver-device unbinding to happen: > >>>>> + * > >>>>> + * - The driver module is unloaded causing the driver to be > >>>>> unregistered. > >>>>> + * This can't happen as long as there's open file handles because a > >>>>> reference > >>>>> + * is taken on the module. > >>>> > >>>> Aside: you can do that, but then it works like a hw unplug, through the > >>>> unbind property in sysfs. > >>> > >>> The module is still pinned isn't it? > >> > >> Yup, but only the code is pinned, not any of the datastructures like > >> drm_device. > >> > >>> I can add sysfs unbind as a third way of triggering unbind: > >>> * - The sysfs driver _unbind_ file can be used to unbind the driver > >>> form the > >>> * device. This can happen any time. > >>> * > > > > Sounds good to me. > > > >>>>> + * > >>>>> + * - The device is removed (USB, Device Tree overlay). > >>>>> + * This can happen at any time. > >>>>> + * > >>>>> + * The driver needs to protect device resources from access after the > >>>>> device is > >>>>> + * gone. This is done checking drm_dev_is_unplugged(), typically in > >>>>> + * &drm_framebuffer_funcs.dirty, > >>>>> &drm_simple_display_pipe_funcs.enable and > >>>>> + * \.disable. Resources that doesn't face userspace and is only used > > > > s/doesn't/don't/ > > s/and is/and are/ > > > >>>>> with the > >>>>> + * device can be setup using devm\_ functions, but &tinydrm_device > >>>>> must be > >>>>> + * allocated using plain kzalloc() since it's lifetime can exceed > >>>>> that of the > > > > s/it's/its/ > > > >>>>> + * device. tinydrm_release() will free the structure. > >>>> > >>>> So here's a bit a dragon: There's no prevention of is_unplugged racing > >>>> against a drm_dev_unplug(). There's been various attempts to fixing > >>>> this, but they're all somewhat ugly. > >>>> > >>>> Either way, that's a bug in the drm core :-) > >>>> > >>>>> */
[snip] > >>>>> @@ -178,8 +222,8 @@ static void devm_tinydrm_release(void *data) > >>>>> * @driver: DRM driver > >>>>> * > >>>>> * This function initializes @tdev, the underlying DRM device and > >>>>> it's > > > > While at it, s/it's/its/ > > > >>>>> - * mode_config. Resources will be automatically freed on driver > >>>>> detach (devres) > >>>>> - * using drm_mode_config_cleanup() and drm_dev_unref(). > >>>>> + * mode_config. drm_dev_unref() is called on driver detach (devres) > >>>>> and when > >>>>> + * all refs are dropped, tinydrm_release() is called. > >>>>> * > >>>>> * Returns: > >>>>> * Zero on success, negative error code on failure. > >>>>> @@ -226,14 +270,17 @@ static int tinydrm_register(struct > >>>>> tinydrm_device *tdev) > >>>>> > >>>>> static void tinydrm_unregister(struct tinydrm_device *tdev) > >>>>> { > >>>>> - struct drm_fbdev_cma *fbdev_cma = tdev->fbdev_cma; > >>>>> - > >>>>> drm_atomic_helper_shutdown(&tdev->drm); > >>>>> > >>>>> - /* don't restore fbdev in lastclose, keep pipeline disabled */ > >>>>> - tdev->fbdev_cma = NULL; > >>>>> - drm_dev_unregister(&tdev->drm); > >>>>> - if (fbdev_cma) > >>>>> - drm_fbdev_cma_fini(fbdev_cma); > >>>>> + > >>>>> + /* Get a ref that will be put in tinydrm_fini() */ > >>>>> + drm_dev_ref(&tdev->drm); > >>>> > >>>> Why do we need that private ref? Grabbing references in unregister code > >>>> looks like a recipe for leaks ... > >>> > >>> Yeah, it's better to take the second ref in devm_tinydrm_register(). > >>> > >>> The reason I need 2 refs is because tinydrm is set up with 2 devm_ > >>> functions. This way I can leave all error path cleanup in the hands of > >>> devres. > >>> > >>> static int driver_probe(...) > >>> { > >>> > >>> ret = devm_tinydrm_init(...); > >>> if (ret) > >>> > >>> return ret; > >>> > >>> ret = tinydrm_display_pipe_init(...); > >>> if (ret) > >>> > >>> return ret; > >>> > >>> drm_mode_config_reset(...); > >>> > >>> return devm_tinydrm_register(...); > >>> > >>> } > >> > >> I don't think this works. What's worse, devm has a high chance of > >> freeing stuff at the wrong time since it's tied to the > >> drm_device->dev, and not to drm_device. > >> > >> I'm not sure what a better solution is, but when discussing all this > >> with Laurent we agreed that in general, devm_ needs to be considered > >> harmful. Looks simple, very easy to create broken code that doesn't > >> clean up properly on unplug. As much as leaks are bad, freeing before > >> all users are gone is worse. devm_ "fixes" the former and heavily > >> encourages the latter. > > > > I totally agree. A crash at unbind time is worse than a memory leak given > > that the bind/unbind cycles are unfrequent. Of course we need to aim for > > fixing both problems :-) > > I'm not sure I see how the use of devm_ causes any other problems than > the fact that the parent device has been deleted. In the case of usb, > after usb_driver.disconnect it's not allowed to access the usb_device. > This means that the driver has to protect against this. The same applies > to the devm_ resources, after the device is gone, they're not available > anymore. It ofc is easier to protect one resource than several, but in > principal it's the same, isn't it? Once the .remove() (or the equivalent for your bus of choice, .disconnect() for USB) method returns the driver isn't allowed to access the device anymore. This means no USB URB submission, no MMIO access, no interrupts, ... From that point of view hardware resource management with devm_* makes sense, for instance to unmap the MMIO registers right after .remove() returns (there's one possible issue with interrupts though, any IRQ requested through devm_request_irq() will only be disabled at the IRQ controller level after .remove() returns, possibly after clocks and power domains get disabled that might result in the device generating spurious interrupts - that's a side issue). Memory allocated with devm_k*alloc() is different, in that some of the structures are reference-counted (or at least should be) and can be released long after .remove() returns. This is the case of any structure needed to implement a device node .release() file operation (a.k.a. close(), not to be confused with the device driver .release()). The drm_release() implementation starts with struct drm_file *file_priv = filp->private_data; struct drm_minor *minor = file_priv->minor; struct drm_device *dev = minor->dev; and then dereferences dev heavily. It calls some of the drm_driver's operations, which might access the driver's private data (drm_device::dev_private or the driver structure that embeds drm_device). We thus can't allocate those structures using devm_k*alloc(). > For me the beauty of devres is that it removes the need for error paths > that are almost never exercised and the probe code is simpler and easy > to read. I agree with this, and I believe we could implement a managed memory allocator similar to devm_* that wouldn't be tied to struct device, and would thus allow control of the objects lifetime. The devres_* fields of struct device could be moved to a separate structure that then gets embedded in struct device, and the devres_* API would take a pointer to that structure instead of to a struct device. There are a few details to hammer out, but I think it should be feasible. > >> In your case I think devm_tinydrm_register is correct (but a bit > >> strange). > > > > Yes, I don't think an explicit tinydrm_unregister() call in the driver's > > .remove() handler would be so difficult. > > > >> devm_tinydrm_init otoh looks like all the code it has should be moved > >> into the drm_driver->release callback. You can't destroy the dirty_lock > >> while a thread might be holding it right at that moment. > > > > Agreed too. > > > > Furthermore the devm_kzalloc() in tinydrm_display_pipe_init() seems > > suspicious to me. It should be easy to replace it with kzalloc(), the > > mode can be freed in tinydrm_connector_destroy(). > > It's safe since it's protected by connector detect(), but I agree with you. > > > (And on a side note the direct mode copy without calling > > drm_mode_duplicate() in that function is also suspicious, although it > > might not be an issue in practice) > > > >>>>> + > >>>>> + drm_fbdev_cma_dev_unplug(tdev->fbdev_cma); > >>>>> + drm_dev_unplug(&tdev->drm); > >>>>> + > >>>>> + /* Make sure framebuffer flushing is done */ > >>>>> + mutex_lock(&tdev->dirty_lock); > >>>>> + mutex_unlock(&tdev->dirty_lock); > >>>> > >>>> Is this really needed? Or, doesn't it just paper over a driver bug you > >>>> have already anyway, since native kms userspace can directly call > >>>> fb->funcs->dirty too, and you already protect against that. > >>>> > >>>> This definitely looks like the fbdev helper is leaking implementation > >>>> details to callers where it shouldn't do that. > >>> > >>> Flushing can happen while drm_dev_unplug() is called, and when we leave > >>> this function the device facing resources controlled by devres will be > >>> removed. Thus I have to make sure any such flushing is done before > >>> leaving so the next flush is stopped by the drm_dev_is_unplugged() > >>> check. I don't see any other way of ensuring that. > >>> > >>> I see now that I should move the call to drm_atomic_helper_shutdown() > >>> after drm_dev_unplug() to properly protect the pipe .enable/.disable > >>> callbacks. > >> > >> Hm, calling _shutdown when the hw is gone already won't end well. > >> Fundamentally this race exists for all use-cases, and I'm somewhat > >> leaning towards plugging it in the core. > >> > >> The general solution probably involves something that smells a lot > >> like srcu, i.e. at every possible entry point into a drm driver > >> (ioctl, fbdev, dma-buf sharing, everything really) we take that > >> super-cheap read-side look, and drop it when we leave. > > > > That's similar to what we plan to do in V4L2. The idea is to set a device > > removed flag at the beginning of the .remove() handler and wait for all > > pending operations to complete. The core will reject any new operation > > when the flag is set. To wait for completion, every entry point would > > increase a use count, and decrease it on exit. When the use count is > > decreased to 0 waiters will be woken up. This should solve the unplug/user > > race. > > Ah, such a simple solution, easy to understand and difficult to get wrong! > And it's even nestable, no danger of deadlocking. > > Maybe I can use it with tinydrm: It would be better to implement that in the DRM core to reject all ioctls early, as some of them could call drm_driver operations that can't return an error (such as .disable_vblank() for instance). > * @dev_use: Tracks use of functions acessing the parent device. > * If it is zero, the device is gone. See ... > struct tinydrm_device { > atomic_t dev_use; > }; > > /** > * tinydrm_dev_enter - Enter device accessing function > * @tdev: tinydrm device > * > * This function protects against using a device and it's resources after > * it's removed. Should be called at the beginning of the function. > * > * Returns: > * False if the device is still present, true if it is gone. > */ > static inline bool tinydrm_dev_enter(struct tinydrm_device *tdev) > { > return !atomic_inc_not_zero(&tdev->dev_use); > } > > static inline void tinydrm_dev_exit(struct tinydrm_device *tdev) > { > atomic_dec(&tdev->dev_use); > } > > static inline bool tinydrm_is_unplugged(struct tinydrm_device *tdev) > { > bool ret = !atomic_read(&tdev->dev_use); > smp_rmb(); > return ret; > } > > > static int tinydrm_init(...) > { > /* initialize */ > > /* Set device is present */ > atomic_set(&tdev->dev_use, 1); > } > > static void tinydrm_unregister(...) > { > /* Set device gone */ > atomic_dec(&tdev->dev_use); > > /* Wait for all device facing functions to finish */ > while (!tinydrm_is_unplugged(tdev)) { > cond_resched(); > } > > /* proceed with unregistering */ > } > > static int mipi_dbi_fb_dirty(...) > { > if (tinydrm_dev_enter(tdev)) > return -ENODEV; Nitpicking, isn't the right error code -ENXIO ? > /* flush framebuffer */ > > tinydrm_dev_exit(tdev); > } > > >> Then on unplug we call synchroize_srcu, which essentially does what > >> your lock grab&drop trick does. Just much less overhead on the read > >> side (we can sprinkle this over everything without worrying about > >> wasting cpu time), and a bit clearer in the intention on the _unplug > >> side. > > > > Looks like srcu already implements what we need. I'm not sure we need an > > RCU- based solution though, it might be too complex. > > > >> But yeah, another thing that'll be a pile of work to fix properly. I'd > >> leave it at a FIXME comment in drm_dev_unplug() until we get to it. > >> And for now not worry about all the possible races. One thing at a > >> time and all that. > >> > >>>>> } > >>>>> > >>>>> static void devm_tinydrm_register_release(void *data) > >>>>> diff --git a/drivers/gpu/drm/tinydrm/mi0283qt.c > >>>>> b/drivers/gpu/drm/tinydrm/mi0283qt.c > >>>>> index 2465489..84ab8d1 100644 > >>>>> --- a/drivers/gpu/drm/tinydrm/mi0283qt.c > >>>>> +++ b/drivers/gpu/drm/tinydrm/mi0283qt.c > >>>>> @@ -31,6 +31,9 @@ static void mi0283qt_enable(struct [snip] > >>>>> @@ -133,6 +136,7 @@ static struct drm_driver mi0283qt_driver = { > >>>>> DRIVER_ATOMIC, > >>>>> .fops = &mi0283qt_fops, > >>>>> TINYDRM_GEM_DRIVER_OPS, > >>>>> + .release = tinydrm_release, > > > > Should this be included in the TINYDRM_GEM_DRIVER_OPS macro ? > > I did that first, then I saw the name GEM OPS, and put it outside :-) You could rename the macro to TINYDRM_DRIVER_OPS :-) The only drawback is that drivers wouldn't be able to provide their own .release() operation, it might become a problem later. > >>>>> .lastclose = tinydrm_lastclose, > >>>>> .debugfs_init = mipi_dbi_debugfs_init, > >>>>> .name = "mi0283qt", > > > > [snip] -- Regards, Laurent Pinchart _______________________________________________ dri-devel mailing list dri-devel@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/dri-devel