edge in glink_subdev_stop()

Stephan Gerhold Tue, 21 Apr 2026 06:50:20 -0700

On Thu, Apr 16, 2026 at 10:52:17AM +0800, Jingyi Wang wrote:
> On 4/14/2026 4:27 PM, Stephan Gerhold wrote:
> > On Tue, Apr 14, 2026 at 11:23:50AM +0800, Jingyi Wang wrote:
> > > On 4/10/2026 10:15 PM, Stephan Gerhold wrote:
> > > > On Thu, Apr 09, 2026 at 01:46:22AM -0700, Jingyi Wang wrote:
> > > > > For rproc that doing attach, glink_subdev_start() is called only when
> > > > > attach successfully. If rproc_report_crash() is called in the attach
> > > > > function, rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() 
> > > > > could
> > > > > be called and cause NULL pointer dereference:
> > > > > 
> > > > >    Unable to handle kernel NULL pointer dereference at virtual 
> > > > > address 0000000000000300
> > > > >    Mem abort info:
> > > > >    ...
> > > > >    pc : qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem]
> > > > >    lr : glink_subdev_stop+0x1c/0x30 [qcom_common]
> > > > >    ...
> > > > >    Call trace:
> > > > >     qcom_glink_smem_unregister+0x14/0x48 [qcom_glink_smem] (P)
> > > > >     glink_subdev_stop+0x1c/0x30 [qcom_common]
> > > > >     rproc_stop+0x58/0x17c
> > > > >     rproc_trigger_recovery+0xb0/0x150
> > > > >     rproc_crash_handler_work+0xa4/0xc4
> > > > >     process_scheduled_works+0x18c/0x2d8
> > > > >     worker_thread+0x144/0x280
> > > > >     kthread+0x124/0x138
> > > > >     ret_from_fork+0x10/0x20
> > > > >    Code: a9be7bfd 910003fd a90153f3 aa0003f3 (b9430000)
> > > > >    ---[ end trace 0000000000000000 ]---
> > > > > 
> > > > > Add NULL pointer check in the glink_subdev_stop() to make sure
> > > > > qcom_glink_smem_unregister() will not be called if 
> > > > > glink_subdev_start()
> > > > > is not called.
> > > > > 
> > > > 
> > > > You mention the actual root problem here: Why is glink_subdev_stop()
> > > > called if glink_subdev_start() wasn't called?
> > > > 
> > > > The call to rproc_start_subdevices() in __rproc_attach() makes sure that
> > > > all subdevices are in consistent state when exiting the function (either
> > > > prepared+started or stopped+unprepared). Only if all subdevices were
> > > > started successfully, the rproc->state is changed to RPROC_ATTACHED.
> > > > 
> > > > In your case, attaching the rproc failed so the rproc->state should be
> > > > still RPROC_DETACHED. All subdevices should be stopped+unprepared. We
> > > > shouldn't stop/unprepare any subdevices again in this state, they all
> > > > might crash like glink does here.
> > > > 
> > > > We know that subdevices are already stopped+unprepared in RPROC_DETACHED
> > > > state, so I think you just need to skip rproc_stop_subdevices() and
> > > > rproc_unprepare_subdevices() inside rproc_stop() in this case, see diff
> > > > below.
> > > > 
> > > > @@ -1708,8 +1709,9 @@ static int rproc_stop(struct rproc *rproc, bool 
> > > > crashed)
> > > >         if (!rproc->ops->stop)
> > > >                 return -EINVAL;
> > > > -       /* Stop any subdevices for the remote processor */
> > > > -       rproc_stop_subdevices(rproc, crashed);
> > > > +       /* Stop any subdevices for the remote processor if it was 
> > > > attached */
> > > > +       if (rproc->state != RPROC_DETACHED)
> > > > +               rproc_stop_subdevices(rproc, crashed);
> > > >         /* the installed resource table is no longer accessible */
> > > >         ret = rproc_reset_rsc_table_on_stop(rproc);
> > > > @@ -1726,7 +1728,8 @@ static int rproc_stop(struct rproc *rproc, bool 
> > > > crashed)
> > > >                 return ret;
> > > >         }
> > > > -       rproc_unprepare_subdevices(rproc);
> > > > +       if (rproc->state != RPROC_DETACHED)
> > > > +               rproc_unprepare_subdevices(rproc);
> > > >         rproc->state = RPROC_OFFLINE;
> > > 
> > > In this case, rproc_crash_handler_work()->rproc_trigger_recovery()->
> > > rproc_boot_recovery()->rproc_stop()->glink_subdev_stop() is called,
> > > "rproc->state = RPROC_CRASHED" is set in the rproc_crash_handler_work
> > > before rproc_boot_recovery is called, so checking RPROC_DETACHED can
> > > not work for this case.
> > > 
> > 
> > You're right, I forgot about that. I think we need a more generic
> > solution for this though. rproc_stop_subdevices() should not be called
> > without a prior call to rproc_start_subdevices().
> > 
> > I think there are a couple of options for this:
> > 
> >   - Add a bool "subdevs_started" to struct rproc and manage that
> >     separately from the rproc->state.
> > 
> >   - Track the rproc state before the crash separately (something like
> >     rproc->state_before_crash) and check that in the stop path.
> > 
> >   - Add a new state RPROC_CRASHED_DETACHED to make sure the states are
> >     unique.
> > 
> >   - ...
> > 
> 
> Sure, I think a bool like subdevs_started will be better for maintain?
> 
> > Does the same issue also exist in qcom_pas_stop() of "[PATCH v5 4/5]
> > remoteproc: qcom: pas: Add late attach support for subsystems" [1]?
> > There you check for pas->rproc->state != RPROC_ATTACHED, wouldn't this
> > also fail for the RPROC_CRASHED case?
> > 
> 
> I tested calling rproc_report_crash directly during qcom_pas_attach but
> did not see issue, handover_issued is set only if attach is success
> so "handover = qcom_q6v5_unprepare(&pas->q6v5);" will return false and
> "qcom_pas_handover(&pas->q6v5);" will not be called.
>


Hm, as you mention, if you call rproc_report_crash() during
qcom_pas_attach() then handover_issued does not get set (so it's still
set to false). But qcom_q6v5_unprepare() returns !q6v5->handover_issued
(handover_issued negated), so !false -> true. So I think exactly the
opposite will happen and qcom_pas_handover(&pas->q6v5); will get called?
It should not be called in that case, because this will break the
reference counting for the regulator/clock resources.

In addition, even the disable_irq(q6v5->handover_irq); inside
qcom_q6v5_unprepare() is problematic. enable_irq()/disable_irq() are
also reference-counted, so disable_irq() should not be called without a
prior enable_irq() or you end up having the IRQ permanently disabled.
See e.g. commit 110be46f5afe2 ("remoteproc: qcom: q6v5: Avoid disabling
handover IRQ twice").

Thanks,
Stephan

Re: [PATCH 2/2] remoteproc: qcom: Check glink->edge in glink_subdev_stop()

Reply via email to