On Thu, 2023-09-14 at 11:34 -0400, Vivi, Rodrigo wrote:
> On Sat, Sep 09, 2023 at 08:58:45PM -0700, Alan Previn wrote:
alan:snip
>
> > + /* Change context state to destroyed and get gt-pm */
> > + __intel_gt_pm_get(gt);
> > + set_context_destroyed(ce);
> > + clr_context_registered(ce);
> > +
> > + ret = deregister_context(ce, ce->guc_id.id);
> > + if (ret) {
> > + /* Undo the state change and put gt-pm if that failed */
> > + set_context_registered(ce);
> > + clr_context_destroyed(ce);
> > + intel_gt_pm_put(gt);
>
> This is a might_sleep inside a spin_lock! Are you 100% confident no WARN
> was seen during the tests indicated in the commit msg?
>
> > + }
> > + spin_unlock_irqrestore(&ce->guc_state.lock, flags);
> > +
> > + return 0;
>
> If you are always returning 0, there's no point in s/void/int...
alan: Hi Rodrigo - I actually forgot that I need the error value returned
so that _guc_context_destroy can put the context back into the
guc->submission_state.destroyed_contexts linked list. So I clearly missed
returning the error in the case where deregister_context fails.
>
> > }
> >
> > static void __guc_context_destroy(struct intel_context *ce)
> > @@ -3268,7 +3296,22 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
> > if (!ce)
> > break;
> >
> > - guc_lrc_desc_unpin(ce);
> > + if (guc_lrc_desc_unpin(ce)) {
> > + /*
> > + * This means GuC's CT link severed mid-way which could happen
> > + * in suspend-resume corner cases. In this case, put the
> > + * context back into the destroyed_contexts list which will
> > + * get picked up on the next context deregistration event or
> > + * purged in a GuC sanitization event (reset/unload/wedged/...).
> > + */
> > + spin_lock_irqsave(&guc->submission_state.lock, flags);
> > + list_add_tail(&ce->destroyed_link,
> > + &guc->submission_state.destroyed_contexts);
> > + spin_unlock_irqrestore(&guc->submission_state.lock, flags);
alan:snip