On Tue, Jun 09, 2026 at 01:43:17PM +0200, Stephan Gerhold wrote:
> On Tue, Jun 09, 2026 at 03:52:52PM +0530, Mukesh Ojha wrote:
> > If a subdevice fails to stop, it indicates broken communication with the
> > DSP. Continuing to stop further subdevices against an unresponsive
> > remote processor could close rpmsg devices that could remove the memory
> > mapping from HLOS and in case if remote processor touches those memory
> > can result in SMMU fault.
> > 
> > Change rproc_stop_subdevices() to return int and abort on the first
> > failing subdev. Propagate the error through rproc_stop() and
> > __rproc_detach() so callers are aware the teardown did not complete
> > cleanly.
> > 
> > Signed-off-by: Mukesh Ojha <[email protected]>
> 
> But what would callers do about this? If you abort the teardown sequence
> half-way through you now have an inconsistent half-stopped state that
> neither a new call to stop() nor a new call to start() could recover
> from. That doesn't sound much better than the SMMU fault. Or am I
> missing something here?

An SMMU fault results in a device crash, whereas the other case only
leaves the remote processor non-functional. From the Linux side, we do
not know the state of the remote processor when the timeout happens.
Cleaning up the subdevices can result in the debug data being lost for
the hung remote processor.

> 
> I would expect that we should either be able to tolerate the SMMU faults
> with the resets involved in the remoteproc stop/start sequence, or that
> DMA gets cancelled by the remoteproc stop sequence, before the buffers
> are unmapped. Perhaps the order of our stop sequence is just wrong? Can
> we unmap the buffers in the subdev unprepare() callback?


IMO, the sequence of subdevices is fine:

 glink -> sysmon -> ssr     start

 ssr -> sysmon -> glink     stop

This issue happens because the glink subdevice gets cleared. That will
not help, as we are ignoring the timeout.
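
For illustration only, here is a minimal userspace sketch of the
behaviour the patch proposes: stop the subdevices in reverse start
order and abort on the first failure, so that later subdevices (glink
in the ordering above) are left untouched. All names here (struct
subdev, subdev_stop(), stop_subdevices()) are hypothetical stand-ins,
not the actual remoteproc code, and the failure is simulated with a
fixed return value.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a remoteproc subdevice; not the kernel struct. */
struct subdev {
	const char *name;
	int stop_ret;	/* simulated stop() result: 0 or a negative error */
	int stopped;	/* set to 1 once stop() has completed successfully */
};

/* Simulated stop callback: fails if stop_ret is non-zero. */
static int subdev_stop(struct subdev *sd)
{
	if (sd->stop_ret)
		return sd->stop_ret;

	sd->stopped = 1;
	return 0;
}

/*
 * Stop subdevices in the reverse of their start order (the array is
 * assumed to be in start order). Abort on the first failure and return
 * its error, leaving the remaining subdevices as they are.
 */
static int stop_subdevices(struct subdev *subdevs, size_t n)
{
	for (size_t i = n; i-- > 0; ) {
		int ret = subdev_stop(&subdevs[i]);

		if (ret) {
			fprintf(stderr, "%s: stop failed: %d\n",
				subdevs[i].name, ret);
			return ret;
		}
	}

	return 0;
}
```

With an array { glink, sysmon, ssr } where sysmon's stop is made to
fail, ssr is stopped, sysmon returns the error, and glink is never
touched, so its mappings stay in place.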


> Thanks,
> Stephan

-- 
-Mukesh Ojha
