On Wed, 2023-08-02 at 16:35 -0700, Teres Alexis, Alan Previn wrote:
> If we are at the end of suspend or very early in resume
> its possible an async fence signal could lead us to the
> execution of the context destruction worker (after the
> prior worker flush).
>
alan:snip
>
>  static void __guc_context_destroy(struct intel_context *ce)
> @@ -3270,7 +3287,20 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
>  		if (!ce)
>  			break;
>
> -		guc_lrc_desc_unpin(ce);
> +		if (guc_lrc_desc_unpin(ce)) {
> +			/*
> +			 * This means GuC's CT link severed mid-way which only happens
> +			 * in suspend-resume corner cases. In this case, put the
> +			 * context back into the destroyed_contexts list which will
> +			 * get picked up on the next context deregistration event or
> +			 * purged in a GuC sanitization event (reset/unload/wedged/...).
> +			 */
> +			spin_lock_irqsave(&guc->submission_state.lock, flags);
> +			list_add_tail(&ce->destroyed_link,
> +				      &guc->submission_state.destroyed_contexts);
alan: I completely missed the fact that this new code sits inside a
while (!list_empty(&guc->submission_state.destroyed_contexts)) block, so
putting the context back on the list means the list never becomes empty
and the while loop spins forever. Will fix and rerev.
> +			spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +		}
> +
>  	}
>  }
>
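
For anyone following along, the loop problem noted above can be reproduced outside the kernel with a small userspace sketch. Everything in it is hypothetical and for illustration only: `struct ctx`, `struct ctx_queue`, `fake_unpin()` and the two drain helpers are stand-ins, not i915 code, and the `max_iters` cap exists only so the demonstration terminates. The second helper shows one possible way out (requeue, then stop draining); it is not necessarily what the next revision of the patch will do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a destroyed context; not the i915 structure. */
struct ctx {
	int id;
	struct ctx *next;
};

/* Minimal singly linked FIFO standing in for destroyed_contexts. */
struct ctx_queue {
	struct ctx *head;
	struct ctx *tail;
};

static void queue_add_tail(struct ctx_queue *q, struct ctx *c)
{
	c->next = NULL;
	if (q->tail)
		q->tail->next = c;
	else
		q->head = c;
	q->tail = c;
}

static struct ctx *queue_pop_head(struct ctx_queue *q)
{
	struct ctx *c = q->head;

	if (c) {
		q->head = c->next;
		if (!q->head)
			q->tail = NULL;
		c->next = NULL;
	}
	return c;
}

/* Hypothetical unpin: fails (non-zero) whenever the link is down. */
static int fake_unpin(const struct ctx *c, bool link_up)
{
	assert(c != NULL);
	return link_up ? 0 : -1;
}

/*
 * Requeue and keep looping, as in the quoted hunk. Returns the number of
 * loop iterations, capped at max_iters so the demo itself terminates.
 */
static int drain_requeue(struct ctx_queue *q, bool link_up, int max_iters)
{
	int iters = 0;

	while (q->head && iters < max_iters) {
		struct ctx *c = queue_pop_head(q);

		iters++;
		if (fake_unpin(c, link_up))
			queue_add_tail(q, c); /* queue never drains */
	}
	return iters;
}

/* One possible alternative: requeue, then stop on the first failure. */
static int drain_requeue_and_stop(struct ctx_queue *q, bool link_up,
				  int max_iters)
{
	int iters = 0;

	while (q->head && iters < max_iters) {
		struct ctx *c = queue_pop_head(q);

		iters++;
		if (fake_unpin(c, link_up)) {
			queue_add_tail(q, c);
			break; /* leave the rest for a later pass */
		}
	}
	return iters;
}
```

With the link down, `drain_requeue()` only stops because of the iteration cap, while `drain_requeue_and_stop()` exits after the first failed unpin and leaves the remaining entries queued for a later pass.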