On Thu, Jun 11, 2026 at 11:54:46AM +0200, Stephan Gerhold wrote:
> On Thu, Jun 11, 2026 at 03:18:51PM +0530, Mukesh Ojha wrote:
> > On Tue, Jun 09, 2026 at 01:43:17PM +0200, Stephan Gerhold wrote:
> > > On Tue, Jun 09, 2026 at 03:52:52PM +0530, Mukesh Ojha wrote:
> > > > If a subdevice fails to stop, it indicates broken communication with the
> > > > DSP. Continuing to stop further subdevices against an unresponsive
> > > > remote processor could close rpmsg devices that could remove the memory
> > > > mapping from HLOS and in case if remote processor touches those memory
> > > > can result in SMMU fault.
> > > >
> > > > Change rproc_stop_subdevices() to return int and abort on the first
> > > > failing subdev. Propagate the error through rproc_stop() and
> > > > __rproc_detach() so callers are aware the teardown did not complete
> > > > cleanly.
> > > >
> > > > Signed-off-by: Mukesh Ojha <[email protected]>
> > >
> > > But what would callers do about this? If you abort the teardown sequence
> > > half-way through you now have an inconsistent half-stopped state that
> > > neither a new call to stop() nor a new call to start() could recover
> > > from. That doesn't sound much better than the SMMU fault. Or am I
> > > missing something here?
> >
> > SMMU fault result in device crash while other is non-functional remote
> > processor. From Linux side, we do not know the state of remote processor
> > when the timeout happens..cleaning the subdevices can result in the
> > debug data being lost for hung remote processor.
> >
> Ok, but how do we go from here? Do we expect that the system would have
> some userspace monitoring daemon that would collect the debug data and
> then reboot the device to make the remoteproc work again?

I would expect a crash dump to be collected manually in this state, so
that the exact reason for the remoteproc being stuck can be found,
instead of ignoring the failure and claiming a graceful shutdown.
Whatever we do here, the remote processor may stay dysfunctional until a
reboot, but cleaning up the rpmsg devices will wipe all the debug data
we need. At the very least, if possible, we should tell the rpmsg driver,
via the rproc state, that a shutdown was attempted but was not graceful.

>
> With these changes, I don't see how you would start the remoteproc again
> without fully rebooting the board. Calling start()/stop() on the
> subdevices again would lead to crashes because some of them are in
> started state and some of them are in stopped state and we don't even
> know which one is in which state.
>
> Thanks,
> Stephan

-- 
-Mukesh Ojha
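
To make the state being discussed concrete, here is a minimal userspace
sketch of "abort on the first failing subdev". It is a toy model and not
the kernel code: struct toy_subdev, toy_subdev_stop(),
toy_stop_subdevices() and toy_count_started() are hypothetical names, the
failure code is arbitrary, and stopping in reverse order is an assumption
of the model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a subdevice; NOT the kernel's struct rproc_subdev. */
struct toy_subdev {
	const char *name;
	bool started;
	int stop_result;	/* 0 on success, negative errno-style code on failure */
};

/* Stop one subdevice. On failure it is left in the started state. */
static int toy_subdev_stop(struct toy_subdev *sd)
{
	if (sd->stop_result != 0)
		return sd->stop_result;

	sd->started = false;
	return 0;
}

/*
 * Stop subdevices in reverse order (an assumption of this model) and
 * abort on the first failure, returning that error to the caller.
 * Subdevices before the failing one are never touched.
 */
static int toy_stop_subdevices(struct toy_subdev *subdevs, size_t n)
{
	assert(subdevs != NULL || n == 0);

	for (size_t i = n; i-- > 0; ) {
		int ret = toy_subdev_stop(&subdevs[i]);

		if (ret)
			return ret;
	}

	return 0;
}

/* Count how many subdevices are still marked as started. */
static size_t toy_count_started(const struct toy_subdev *subdevs, size_t n)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++)
		if (subdevs[i].started)
			count++;

	return count;
}
```

With three started subdevices where only the middle one fails to stop,
toy_stop_subdevices() returns that failure, the last subdevice is
stopped, and the first two are still started. That is the mixed state in
question: the debug data behind the untouched subdevices is preserved,
but the caller only gets an error code and cannot tell from it which
subdevices are in which state.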

