On Tue, Nov 03, 2020 at 12:25:22PM +0100, Maxime Ripard wrote:
> Hi!
> 
> On Tue, Nov 03, 2020 at 11:55:08AM +0100, Daniel Vetter wrote:
> > On Tue, Nov 03, 2020 at 11:38:32AM +0100, Maxime Ripard wrote:
> > > On Tue, Nov 03, 2020 at 11:10:27AM +0100, Thomas Zimmermann wrote:
> > > > Hi
> > > > 
> > > > Am 03.11.20 um 10:52 schrieb Maxime Ripard:
> > > > > On Tue, Nov 03, 2020 at 10:10:41AM +0800, Tian Tao wrote:
> > > > >> Add new api devm_drm_irq_install() to register interrupts,
> > > > >> no need to call drm_irq_uninstall() when the drm module is removed.
> > > > >>
> > > > >> v2:
> > > > >> fixed the wrong parameter.
> > > > >>
> > > > >> Signed-off-by: Tian Tao <tiant...@hisilicon.com>
> > > > >> ---
> > > > >>  drivers/gpu/drm/drm_drv.c | 23 +++++++++++++++++++++++
> > > > >>  include/drm/drm_drv.h     |  3 ++-
> > > > >>  2 files changed, 25 insertions(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/drivers/gpu/drm/drm_drv.c b/drivers/gpu/drm/drm_drv.c
> > > > >> index cd162d4..0fe5243 100644
> > > > >> --- a/drivers/gpu/drm/drm_drv.c
> > > > >> +++ b/drivers/gpu/drm/drm_drv.c
> > > > >> @@ -39,6 +39,7 @@
> > > > >>  #include <drm/drm_color_mgmt.h>
> > > > >>  #include <drm/drm_drv.h>
> > > > >>  #include <drm/drm_file.h>
> > > > >> +#include <drm/drm_irq.h>
> > > > >>  #include <drm/drm_managed.h>
> > > > >>  #include <drm/drm_mode_object.h>
> > > > >>  #include <drm/drm_print.h>
> > > > >> @@ -678,6 +679,28 @@ static int devm_drm_dev_init(struct device 
> > > > >> *parent,
> > > > >>      return ret;
> > > > >>  }
> > > > >>  
> > > > >> +static void devm_drm_dev_irq_uninstall(void *data)
> > > > >> +{
> > > > >> +    drm_irq_uninstall(data);
> > > > >> +}
> > > > >> +
> > > > >> +int devm_drm_irq_install(struct device *parent,
> > > > >> +                     struct drm_device *dev, int irq)
> > > > >> +{
> > > > >> +    int ret;
> > > > >> +
> > > > >> +    ret = drm_irq_install(dev, irq);
> > > > >> +    if (ret)
> > > > >> +            return ret;
> > > > >> +
> > > > >> +    ret = devm_add_action(parent, devm_drm_dev_irq_uninstall, dev);
> > > > >> +    if (ret)
> > > > >> +            devm_drm_dev_irq_uninstall(dev);
> > > > >> +
> > > > >> +    return ret;
> > > > >> +}
> > > > >> +EXPORT_SYMBOL(devm_drm_irq_install);
> > > > >> +
> > > > > 
> > > > > Shouldn't we tie the IRQ to the drm device (so with drmm_add_action)
> > > > > instead of tying it to the underlying device?
> > > > 
> > > > If the HW device goes away, there won't be any more interrupts. So it's
> > > > similar to devm_ functions for I/O memory. Why would you use the drmm_
> > > > interface?
> > > 
> > > drm_irq_install/uninstall do more that just calling request_irq and
> > > free_irq though, they will also run (among other things) the irq-related
> > > hooks in the drm driver (irq_preinstall, irq_postinstall irq_uninstall)
> > > and wake anything waiting for a vblank to occur, so we need the DRM
> > > device and driver to still be around when we run drm_irq_uninstall.
> > > That's why (I assume) you have to pass the drm_device as an argument and
> > > not simply the device.
> > 
> > drm_device is guaranteed to outlive devm_, plus the hooks are meant to
> > shut down hw. hw isn't around anymore when we do drmm_ cleanup, at least
> > not in full generality.
> 
> drm_dev_put is either called through devm or in remove / unbind, and the
> drm_device takes a reference on its parent device, so how can the
> drm_device outlive its parent device?

Oh there's more than just that going on. struct device has 2 lifetime
things:
- devres resources: These are release on a) on driver unbind b) driver
  bind failure. Which means if you hotunplug, then devres is gone
- the kmalloced piece of memory containing the struct device, refcounted
  with kref. Totally independent.

So hw resources like irq should be managed with devres. Memory allocations
(to prevent use-after-free) should be refcounted by a kref somewhere. In
the case of struct device that's done by the driver core. In the case of
struct drm_device and all the stuff hanging off that, it's done by drmm_
(ideally at least, since in practice all drivers except i915 get it wrong
without drmm_).

Managing memory allocations with devres is almost always a bug.

So when you unbind/hotunplug a device, the following happens:
- driver unbind gets called and finishes
- devres cleans up hw resources
- as one of the last devres action the drm_dev_put gets called
- (if no userspace is around or anything else that holds a drm_device
  reference) drmm_ cleans up drm_device resources
- as the last cleanup drmm_ calls put_device()
- the actual struct device gets released

> > > This probably works in most case since you would allocate the drm_device
> > > with devm_drm_dev_alloc, and then run drm_irq_install, so in the undoing
> > > phase you would have first drm_irq_uninstall to run, and everything is
> > > fine.
> > > 
> > > However, if one doesn't use devm_drm_dev_alloc but would use
> > > devm_drm_irq_install, you would have first remove being called that
> > > would free the drm device, and then drm_irq_uninstall which will use a
> > > free'd pointer.
> > 
> > Don't do that, it's broken :-)
> 
> Well, yeah it's usually a pretty bad situation, but if we can fix it for
> free it doesn't hurt?

See my comment somewhere, if the devm_drm_irq_install also holds a
drm_dev_get reference, then no matter which wrong way you set stuff up,
the right thing should happen.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

Reply via email to