On Thu, Mar 19, 2026 at 01:44:48PM +0800, Jingyi Wang wrote:
> 
> 
> On 3/19/2026 1:23 PM, Dmitry Baryshkov wrote:
> > On Thu, Mar 19, 2026 at 12:36:15PM +0800, Jingyi Wang wrote:
> > > 
> > > 
> > > On 3/13/2026 10:37 AM, Dmitry Baryshkov wrote:
> > > > On Wed, Mar 11, 2026 at 01:39:50AM -0700, Bartosz Golaszewski wrote:
> > > > > On Wed, 11 Mar 2026 03:11:42 +0100, Dmitry Baryshkov
> > > > > <[email protected]> said:
> > > > > > On Tue, Mar 10, 2026 at 06:50:30AM -0700, Bartosz Golaszewski wrote:
> > > > > > > 
> > > > > > > Ideally things like this would be passed to the rproc core in 
> > > > > > > some kind of a
> > > > > > > config structure and only set when registration succeeds. This 
> > > > > > > looks to me
> > > > > > > like papering over the real issue and I think it's still racy as 
> > > > > > > there's no
> > > > > > > true synchronization.
> > > > > > > 
> > > > > > > Wouldn't it be better to take rproc->lock for the entire duration 
> > > > > > > of
> > > > > > > rproc_add()? It's already initialized in rproc_alloc().
> > > > > > 
> > > > > > It would still be racy as rproc_trigger_recovery() is called 
> > > > > > outside of
> > > > > > the lock. Instead the error cleanup path (and BTW, rproc_del() path 
> > > > > > too)
> > > > > > must explicitly call cancel_work_sync() on the crash_handler work 
> > > > > > (and
> > > > > > any other work items that can be scheduled).
> > > > > > 
> > > > > 
> > > > > This looks weird TBH. For example: rproc_crash_handler_work() takes 
> > > > > the lock,
> > > > > but releases it right before calling inspecting 
> > > > > rproc->recovery_disabled and
> > > > > calling rproc_trigger_recovery(). It looks wrong, I think it should 
> > > > > keep the
> > > > > lock and rptoc_trigger_recovery() should enforce it being taken 
> > > > > before the
> > > > > call.
> > > > 
> > > > Yes. Nevertheless the driver should cancel the work too.
> > > > 
> > > 
> > > Hi Dmitry & Bartosz,
> > > 
> > > rproc_crash_handler_work() may call rproc_trigger_recovery() and
> > > rproc_add() may call rproc_boot(), both the function have already
> > > hold the lock. And the lock cannot protect resources like glink_subdev
> > > in the patch.
> > > 
> > > And there is a possible case for cancel_work, rproc_add tear down call
> > > cancel work and wait for the work finished, the reboot run successfully,
> > > and the tear down continued and the resources all released, including 
> > > sysfs
> > > and glink_subdev.
> > > 
> > > Indeed recovery_disabled is kind of hacky.
> > > The root cause for this issue is that for remoteproc with RPROC_OFFLINE
> > > state, the rproc_start will be called asynchronously, but for the 
> > > remoteproc
> > > with RPROC_DETACHED, the attach function is called directly, the failure
> > > in this path will cause the rproc_add() fail and the resource release.
> > > I think the current patch can be dropped, we are thinking about make 
> > > rproc_attach
> > > called asynchronously to avoid this race.
> > 
> > Isn't this patch necessary for SoCCP bringup? If not, why did you
> > include it into the series?
> > 
> yes, will squash to soccp patch in next versoin.

I'm sorry, but that doesn't make sense to me.

The SoCCP patch adds support for attaching SoCCP. This change tries to
address a generic problem shared across all remoteproc drivers (that
does attach?).

I think you should interpret Dmitry's comment as "why is this part of
this series, please fix this problem in a separate series".

Regards,
Bjorn

Reply via email to