On 5/22/2026 8:07 PM, Stephan Gerhold wrote:
> On Tue, May 19, 2026 at 12:24:23AM -0700, Jingyi Wang wrote:
>> Subsystems can be brought out of reset by entities such as bootloaders.
>> As the irq enablement could be later than subsystem bring up, the state
>> of subsystem should be checked by reading SMP2P bits.
>>
>> A new qcom_pas_attach() function is introduced. if a crash state is
>> detected for the subsystem, rproc_report_crash() is called. If the ready
>> state is detected, it will be marked as "attached", otherwise it could
>> be the early boot feature is not supported by other entities. In this
>> case, the state will be marked as RPROC_OFFLINE so that the PAS driver
>> can load the firmware and start the remoteproc.
>>
>> Co-developed-by: Gokul Krishna Krishnakumar
>> <[email protected]>
>> Signed-off-by: Gokul Krishna Krishnakumar
>> <[email protected]>
>> Signed-off-by: Jingyi Wang <[email protected]>
>
> Unfortunately, removing the ping-pong functionality that was present in
> previous patch versions makes the whole mechanism a lot more fragile.
> I'm not entirely sure if this has changed in SMP2P v2 or more recent
> firmware versions, but in my experience the SMP2P "ready" bit does not
> tell you if the remoteproc is actually running. The problem is that the
> "ready" bit is asserted by the remoteproc when the firmware is ready,
> but it is not cleared when you shutdown or forcibly stop the remoteproc.
>
> If this is still the case, you can easily reproduce that with the
> following test:
>
> 1. Start the system as usual and let it attach the remoteproc
> 2. Manually stop the remoteproc in sysfs (echo stop > state)
> 3. modprobe -r qcom_q6v5_pas
> 4. modprobe qcom_q6v5_pas
> 5. If the "ready" bit is still set, the driver will try attaching the
>    remoteproc, but it's actually not running. No recovery will happen.
>
> In this situation, it is very difficult to detect the correct remoteproc
> state without relying on an additional query mechanism like the
> ping-pong feature.

This is a valid use case and concern. We discussed it with Bjorn and
would like to handle this scenario in the separate robustness
improvement series [1]. Stephan, would you agree to let the basic
functionality in this series go in first?

[1] https://lore.kernel.org/all/[email protected]/

>
> You can make it a bit more reliable if you also check the status of the
> "stop-ack" bit. This would tell you if the remoteproc was cleanly
> stopped with the SMP2P "stop" mechanism. However, that will typically
> still not fix the case above since nowadays remoteprocs are typically
> stopped via the QMI qcom_sysmon and the "stop-ack" is not set in that
> case. I believe this might set the separate "shutdown-ack" bit though
> that is described for some SoCs, I never finished testing that.
>
> And even if you check both "stop-ack" and "shutdown-ack", that doesn't
> tell you if the remoteproc was forcibly killed using
> qcom_scm_pas_shutdown() without gracefully stopping it first. The ideal
> solution would be querying the PAS API to tell us if the remoteproc is
> actively running, but the last time I checked I was unfortunately not
> able to find a documented call that would tell us that.

In the ready==1 && error==0 && ping-pong==0 scenario, the kernel
currently cannot tell whether the remoteproc is offline or crashed.
After the module is removed and re-probed (modprobe -r followed by
modprobe), the software has no state of its own, so per my
understanding only the firmware can tell us whether the remoteproc is
active. Let's discuss this scenario and its solution in the other
series I mentioned above.

>
> Thanks,
> Stephan

--
Thx and BRs,
Aiqun(Maria) Yu
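
For reference, the probe-time decision discussed in this thread could be
sketched roughly as below. This is only an illustration of the reasoning,
not code from the patch or from the robustness series. The struct, the
enum, and classify_probe_state() are hypothetical names, "pong" stands
for whatever liveness query (such as the ping-pong feature) ends up
being available, and treating "stop-ack"/"shutdown-ack" as "offline" is
an assumption based on Stephan's description.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical snapshot of the SMP2P bits mentioned in the thread. */
struct smp2p_bits {
	bool ready;        /* set by firmware when up; may stay set after a stop */
	bool error;        /* crash indication */
	bool stop_ack;     /* clean stop via the SMP2P "stop" mechanism */
	bool shutdown_ack; /* possibly set on sysmon shutdown (untested per thread) */
};

enum probe_state {
	PROBE_OFFLINE,   /* load firmware and start the remoteproc */
	PROBE_CRASHED,   /* report a crash */
	PROBE_ATTACH,    /* liveness confirmed, mark as attached */
	PROBE_AMBIGUOUS, /* ready==1 && error==0 && ping-pong==0 */
};

/* Classify the remoteproc state at probe time from bits plus a liveness reply. */
static enum probe_state classify_probe_state(struct smp2p_bits b, bool pong)
{
	if (b.error)
		return PROBE_CRASHED;
	if (!b.ready)
		return PROBE_OFFLINE;
	/* A recorded graceful stop means the stale "ready" bit cannot be trusted. */
	if (b.stop_ack || b.shutdown_ack)
		return PROBE_OFFLINE;
	if (pong)
		return PROBE_ATTACH;
	/* A forced kill leaves no trace in the bits, so this case stays open. */
	return PROBE_AMBIGUOUS;
}

/* Small self-check covering the re-modprobe case from the thread. */
void probe_state_self_check(void)
{
	struct smp2p_bits stale = { .ready = true };

	assert(classify_probe_state(stale, false) == PROBE_AMBIGUOUS);
	assert(classify_probe_state(stale, true) == PROBE_ATTACH);
}
```

The PROBE_AMBIGUOUS result is the case left for the follow-up series:
without a liveness reply, the bits alone cannot separate a running
remoteproc from one that was forcibly stopped.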

