On Fri, Dec 12, 2025 at 12:10:37PM +0100, Luca Ceresoli wrote:
> Hi Maxime,
> 
> On Thu Dec 11, 2025 at 6:47 PM CET, Luca Ceresoli wrote:
> > Hi Maxime,
> >
> > On Mon Dec 1, 2025 at 5:51 PM CET, Maxime Ripard wrote:
> >> On Mon, Nov 24, 2025 at 05:25:39PM +0100, Luca Ceresoli wrote:
> >>> Hi Maxime,
> >>>
> >>> On Mon Nov 24, 2025 at 11:39 AM CET, Maxime Ripard wrote:
> >>> > On Wed, Nov 19, 2025 at 02:05:37PM +0100, Luca Ceresoli wrote:
> >>> >> Several drivers (about 20) follow the same pattern:
> >>> >>
> >>> >>  1. get a pointer to a bridge (typically the next bridge in the chain) 
> >>> >> by
> >>> >>     calling of_drm_find_bridge()
> >>> >>  2. store the returned pointer in the private driver data, keep it 
> >>> >> until
> >>> >>     driver .remove
> >>> >>  3. dereference the pointer at attach time and possibly at other times
> >>> >>
> >>> >> of_drm_find_bridge() is now deprecated because it does not increment 
> >>> >> the
> >>> >> refcount and should be replaced with drm_of_find_bridge() +
> >>> >> drm_bridge_put().
> >>> >>
> >>> >> However some of those drivers have a complex code flow and adding a
> >>> >> drm_bridge_put() call in all the appropriate locations is error-prone,
> >>> >> leads to ugly and more complex code, and can lead to errors over time 
> >>> >> with
> >>> >> code flow changes.
> >>> >>
> >>> >> To handle all those drivers in a straightforward way, add a devm 
> >>> >> variant of
> >>> >> drm_of_find_bridge() that adds a devm action to invoke drm_bridge_put()
> >>> >> when the said driver is removed. This allows all those drivers to put 
> >>> >> the
> >>> >> reference automatically and safely with a one line change:
> >>> >>
> >>> >>   - priv->next_bridge = of_drm_find_bridge(remote_np);
> >>> >>   + priv->next_bridge = devm_drm_of_find_bridge(dev, remote_np);
> >>> >>
> >>> >> Signed-off-by: Luca Ceresoli <[email protected]>
> >>> >>
> >>> >> ---
> >>> >>  drivers/gpu/drm/drm_bridge.c | 30 ++++++++++++++++++++++++++++++
> >>> >>  include/drm/drm_bridge.h     |  5 +++++
> >>> >>  2 files changed, 35 insertions(+)
> >>> >>
> >>> >> diff --git a/drivers/gpu/drm/drm_bridge.c 
> >>> >> b/drivers/gpu/drm/drm_bridge.c
> >>> >> index 09ad825f9cb8..c7baafbe5695 100644
> >>> >> --- a/drivers/gpu/drm/drm_bridge.c
> >>> >> +++ b/drivers/gpu/drm/drm_bridge.c
> >>> >> @@ -1446,6 +1446,36 @@ struct drm_bridge *drm_of_find_bridge(struct 
> >>> >> device_node *np)
> >>> >>  }
> >>> >>  EXPORT_SYMBOL(drm_of_find_bridge);
> >>> >>
> >>> >> +/**
> >>> >> + * devm_drm_of_find_bridge - find the bridge corresponding to the 
> >>> >> device
> >>> >> + *                         node in the global bridge list and add a 
> >>> >> devm
> >>> >> + *                         action to put it
> >>> >> + *
> >>> >> + * @dev: device requesting the bridge
> >>> >> + * @np: device node
> >>> >> + *
> >>> >> + * On success the returned bridge refcount is incremented, and a devm
> >>> >> + * action is added to call drm_bridge_put() when @dev is removed. So 
> >>> >> the
> >>> >> + * caller does not have to put the returned bridge explicitly.
> >>> >> + *
> >>> >> + * RETURNS:
> >>> >> + * drm_bridge control struct on success, NULL on failure
> >>> >> + */
> >>> >> +struct drm_bridge *devm_drm_of_find_bridge(struct device *dev, struct 
> >>> >> device_node *np)
> >>> >> +{
> >>> >> +      struct drm_bridge *bridge = drm_of_find_bridge(np);
> >>> >> +
> >>> >> +      if (bridge) {
> >>> >> +              int err = devm_add_action_or_reset(dev, 
> >>> >> drm_bridge_put_void, bridge);
> >>> >> +
> >>> >> +              if (err)
> >>> >> +                      return ERR_PTR(err);
> >>> >> +      }
> >>> >> +
> >>> >> +      return bridge;
> >>> >> +}
> >>> >> +EXPORT_SYMBOL(devm_drm_of_find_bridge);
> >>> >
> >>> > That's inherently unsafe though, because even if the bridge is removed
> >>> > other parts of DRM might still have a reference to it and could call
> >>> > into it.
> >>> >
> >>> > We'd then have dropped our reference to the next bridge, which could
> >>> > have been freed, and it's a use-after-free.
> >>>
> >>> I think you refer to this scenario:
> >>>
> >>>   1. pipeline: encoder --> bridge A --> bridge B --> bridge C
> >>>   2. encoder takes a reference to bridge B
> >>>      using devm_drm_of_find_bridge() or other means
> >>>   3. bridge B takes a next_bridge reference to bridge C
> >>>      using devm_drm_of_find_bridge()
> >>>   4. encoder calls (bridge B)->foo(), which in turns references
> >>>      next_bridge, e.g.:
> >>>
> >>>        b_foo() {
> >>>            bar(b->next_bridge);
> >>>        }
> >>>
> >>> If bridges B and C are removed, bridge C can be freed but B is still
> >>> allocated because the encoder holds a ref. So when step 4 happens, 'b->c'
> >>> would be a use-after-free (or NULL deref if b.remove cleared it, which is
> >>> just as bad).
> >>
> >> Yep.
> >>
> >>> If I got you correctly, then I'm a bit surprised by your comment. This
> >>> series is part of the first chapter of the hotplug work, which does not 
> >>> aim
> >>> at fixing everything but rather at fixing one part: handle dynamic
> >>> _allocation_ lifetime of drm_bridges by adding a refcount and
> >>> drm_bridge_get/put().
> >>>
> >>> Chapter 2 of the work is adding drm_bridge_enter/exit/unplug() [1] and
> >>> other changes in order to avoid code of drivers of removed bridges to
> >>> access fields they shouldn't. So the above example at point 4 would 
> >>> become:
> >>>
> >>>        b_foo() {
> >>>            if (!drm_bridge_enter())
> >>>                return;
> >>>            bar(b->c);
> >>>            drm_bridge_exit();
> >>>        }
> >>>
> >>> And that avoids 'b->c' after bridge B is removed.
> >>>
> >>> Does that answer your remark?
> >>
> >> Not really. I wasn't really questionning your current focus, or the way
> >> you laid out the current agenda or whatever.
> >>
> >> What I am questionning though is whether or not we want to introduce
> >> something we will have to untangle soon, and even more so when we're not
> >> mentioning it anywhere.
> >>
> >>> > It's more complicated than it sounds, because we only have access to the
> >>> > drm_device when the bridge is attached, so later than probe.
> >>> >
> >>> > I wonder if we shouldn't tie the lifetime of that reference to the
> >>> > lifetime of the bridge itself, and we would give up the next_bridge
> >>> > reference only when we're destroyed ourselves.
> >>>
> >>> I'm afraid I'm not following you, sorry. Do you refer to the time between
> >>> the bridge removal (driver .remove) and the last bridge put (when
> >>> deallocation happens)?
> >>>
> >>> In that time frame the struct drm_bridge is still allocated along with any
> >>> next_bridge pointer it may contain, but the following bridge could have
> >>> been deallocated.
> >>>
> >>> What do you mean by "give up the next_bridge"?
> >>
> >> What I was trying to say was that if we want to fix the problem you
> >> illustrated about, we need to give up the reference at __drm_bridge_free
> >> time. So each bridge having a reference to a bridge would need to do so
> >> in its destroy hook.
> >>
> >> Since it's quite a common pattern, it would make sense to add a
> >> next_bridge field to drm_bridge itself, so the core can do it
> >> automatically in __drm_bridge_free if that pointer is !NULL.
> >>
> >> But...
> >>
> >>> > Storing a list of all the references we need to drop is going to be
> >>> > intrusive though, so maybe the easiest way to do it would be to create a
> >>> > next_bridge field in drm_bridge, and only drop the reference stored
> >>> > there?
> >>> >
> >>> > And possibly tie the whole thing together using a helper?
> >>> >
> >>> > Anyway, I'm not sure it should be a prerequisite to this series. I we do
> >>> > want to go the devm_drm_of_find_bridge route however, we should at least
> >>> > document that it's unsafe, and add a TODO entry to clean up the mess
> >>> > later on.
> >>
> >> ... I *really* don't consider it something you need to work on right now.
> >>
> >>> Do you mean the drm variant is unsafe while the original
> >>> (drm_of_find_bridge() in this series, might be renamed) is not? I
> >>> don't see how that can happen. If the driver for bridge B were to use
> >>> drm_of_find_bridge(), that driver would be responsible to
> >>> drm_bridge_put(b->next_bridge) in its .remove() function or earlier.
> >>> So the next_bridge pointing to bridge C would equally become subject
> >>> to use-after-free.
> >>
> >> No, I was saying that both are equally unsafe. But we're adding a new,
> >> broken, helper, and we don't mention anywhere that it is. So what I was
> >> saying is mostly do we really want to introduce some more broken code
> >> when we know it is. And if we do, we should be really clear about it.
> >>
> >>> devm does not make it worse, on the opposite it postpones the
> >>> drm_bridge_put(next_bridge) as late as possible: just after
> >>> b.remove().
> >>
> >> Which doesn't really change anything, does it? I'd expect the window
> >> between the remove and final drm_bridge_put to be much wider than the
> >> execution time of remove itself.
> >>
> >>> One final, high-level thought about the various 'next_bridge' pointers 
> >>> that
> >>> many bridge drivers have. Most of them do:
> >>>
> >>>  0. have a 'struct drm_bridge next_bridge *' in their private struct
> >>>  1. take the next_bridge reference during probe or another startup phase
> >>>  2. store it in their private driver struct
> >>>  3. use it to call drm_bridge_attach
> >>>  4. (pending) put the reference to it in their .remove or earlier
> >>>
> >>> I'm wondering whether we could let the DRM bridge core do it all, by
> >>> removing items 0, 1, 2 and 4, and change 3 as:
> >>>
> >>> -     drm_bridge_attach(encoder, me->next_bridge, &me->bridge, flags);
> >>> +  drm_of_bridge_attach(encoder, &me->bridge, dev->of_node, 1, -1, flags);
> >>>
> >>> where dev->of_node and the following integers are the same flags passed to
> >>> devm_drm_of_get_bridge() and the like, i.e. the endpoint info needed to
> >>> walk the DT graph and reach the next bridge.
> >>>
> >>> This would allow the core to take care of all locking and lifetime of the
> >>> next bridge, and most (all?) bridges would never access any pointers to 
> >>> the
> >>> next bridge. The idea is to let the core do the right thing in a single
> >>> place instead of trying to make all drivers do the right thing (and
> >>> touching dozen files when needing to touch the logic).
> >>>
> >>> That is more a long-term ideal than something I'd do right now, but having
> >>> opinions would be very interesting.
> >>
> >> That was pretty much my point, yeah.
> >>
> >> Maxime
> >
> > Let me recap this discussion, because there are various aspects and I need
> > to clarify by view on it.
> >
> > First: the problem you discuss is about drm_of_find_bridge() introduced in
> > patch 1. The devm variant is just equally affected.
> >
> > You proposed adding a next_bridge field in struct drm_bridge so there is an
> > automated, common call to drm_bridge_put() (and setting it to NULL). It
> > would remove some burden on individual drivers of course, but I don't think
> > it would solve the problem. In the same scenario we are discussing
> > (i.e. encoder --> bridge A --> bridge B --> bridge C, then B+C get removed)
> > B's next_bridge would be automatically put, but the encoder could still
> > call B->foo(), which could still do B->next_bridge.
> 
> Ah, I realied I'm wrong here. Your proposal is to put the reference at
> __drm_bridge_free time, not a release time. So yes, it would work. At least
> for the simple cases, where there's only the next_bridge pointer stored.

Yes, that's exactly what I was trying to say :)

> > Additionally, as a matter of fact there are currently drivers storing
> > bridge pointers. The next_bridge is the most common case. Code using
> > drm_bridge_connector_init() for example can store up to eight of them, but
> > individual drivers are the hardest to hunt for.
> >
> > I can see these (potential) tools to handle this (not mutually exclusive):
> >
> >  1. remove drm_bridge pointers pointing to other bridges
> >  2. check whether a bridge (say B) still exists before any dereference
> >     to B->another_bridge: that's drm_bridge_enter/exit()
> >  3. let owners of bridge pointers be notified when a bridge is unplugged,
> >     so they can actively put their reference and clear their pointer
> >
> > For item 1, I think the drm_of_bridge_attach() idea quoted above would
> > work, at least for the simple cases where bridge drivers use the
> > next_bridge only for attach. A next_bridge pointer in struct drm_bridge is
> > not even needed in that case, the pointer would be computed from OF when
> > needed and not stored. I can do an experiment and send a first series, do
> > you think it would be useful?
> 
> I had a look and, while the implementation should be simple, only a few
> drivers could benefit right now. The majority fall into one of these
> categories:
> 
>  * drivers using drm_of_find_panel_or_bridge() or *_of_get_bridge()
>    (maybe 60-80% of all drivers, those will have to wait for the panel
>    improvements)
>  * drivers using the next_bridge pointer for more than just attach
>  * drivers doing more complicated stuff
> 
> I think your "put next_bridge in __drm_bridge_free" idea would fit well the
> 2nd category and perhaps also the 1st one. For the 3rd category we'd need
> something different, e.g. a per-driver .destroy callback.

Yep, that's fine. We should optimize for the common case, with an escape
hatch. That's exactly what we are talking about here.

> So, while your idea would work, it would avoid use-after-free but not
> prevent calls into a bridge code after the bridge is removed, which is, in
> the best case, useless. I still think we should aim at avoiding the
> dereferences to even happen, so my 3 ideas above still look to me important
> to evaluate.

It's useless indeed, but something that can and will happen, so we have
to take care into account still. We should avoid any unsafe behaviour,
but it's the best we can do.    

Maxime

Attachment: signature.asc
Description: PGP signature

Reply via email to