On Wed, Sep 25, 2024 at 04:03:51PM +0530, Mukesh Ojha wrote:
> Multiple call to glink_subdev_stop() for the same remoteproc can happen
> if rproc_stop() fails from Process-A that leaves the rproc state to
> RPROC_CRASHED state later a call to recovery_store from user space in
> Process B triggers rproc_trigger_recovery() of the same remoteproc to
> recover it results in NULL pointer dereference issue in
> qcom_glink_smem_unregister().
> 
> Fix it by having a NULL check in glink_subdev_stop().
> 
>       Process-A                                       Process-B
> 
>   fatal error interrupt happens
> 
>   rproc_crash_handler_work()
>     mutex_lock_interruptible(&rproc->lock);
>     ...
> 
>        rproc->state = RPROC_CRASHED;
>     ...
>     mutex_unlock(&rproc->lock);
> 
>     rproc_trigger_recovery()
>      mutex_lock_interruptible(&rproc->lock);
> 
>       adsp_stop()
>       qcom_q6v5_pas 20c00000.remoteproc: failed to shutdown: -22
>       remoteproc remoteproc3: can't stop rproc: -22

I presume that at this point this remoteproc is in some undefined state
and the only way to recover is for the user to reboot the machine?


The check for glink->edge avoids one pitfall following this, but I'd
prefer to see a solution that avoids issues in this scenario in the
remoteproc core - rather than working around side effects of this in
different places.

Regards,
Bjorn

>      mutex_unlock(&rproc->lock);
> 
>                                               echo enabled > 
> /sys/class/remoteproc/remoteprocX/recovery
>                                               recovery_store()
>                                                rproc_trigger_recovery()
>                                                 
> mutex_lock_interruptible(&rproc->lock);
>                                                  rproc_stop()
>                                                   glink_subdev_stop()
>                                                     
> qcom_glink_smem_unregister() ==|
>                                                                               
>        |
>                                                                               
>        V
>                                                     Unable to handle kernel 
> NULL pointer dereference
>                                                                 at virtual 
> address 0000000000000358
> 
> Signed-off-by: Mukesh Ojha <quic_mo...@quicinc.com>
> ---
> - We can do this NULL check in qcom_glink_smem_unregister() as it is
>   exported function however, there is only one user of this. So, doing
>   it with current approach should also be fine.
> 
>  drivers/remoteproc/qcom_common.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/remoteproc/qcom_common.c 
> b/drivers/remoteproc/qcom_common.c
> index 8c8688f99f0a..52d6c9b99fdb 100644
> --- a/drivers/remoteproc/qcom_common.c
> +++ b/drivers/remoteproc/qcom_common.c
> @@ -209,6 +209,9 @@ static void glink_subdev_stop(struct rproc_subdev 
> *subdev, bool crashed)
>  {
>       struct qcom_rproc_glink *glink = to_glink_subdev(subdev);
>  
> +     if (!glink->edge)
> +             return;
> +
>       qcom_glink_smem_unregister(glink->edge);
>       glink->edge = NULL;
>  }
> -- 
> 2.34.1
> 
> 

Reply via email to