On Wed, 2023-08-02 at 16:35 -0700, Teres Alexis, Alan Previn wrote:
> If we are at the end of suspend or very early in resume
> it's possible an async fence signal could lead us to the
> execution of the context destruction worker (after the
> prior worker flush).
> 
alan:snip
>  
>  static void __guc_context_destroy(struct intel_context *ce)
> @@ -3270,7 +3287,20 @@ static void deregister_destroyed_contexts(struct intel_guc *guc)
>               if (!ce)
>                       break;
>  
> -             guc_lrc_desc_unpin(ce);
> +             if (guc_lrc_desc_unpin(ce)) {
> +                     /*
> +                      * This means GuC's CT link severed mid-way which only happens
> +                      * in suspend-resume corner cases. In this case, put the
> +                      * context back into the destroyed_contexts list which will
> +                      * get picked up on the next context deregistration event or
> +                      * purged in a GuC sanitization event (reset/unload/wedged/...).
> +                      */
> +                     spin_lock_irqsave(&guc->submission_state.lock, flags);
> +                     list_add_tail(&ce->destroyed_link,
> +                                   &guc->submission_state.destroyed_contexts);
alan: I completely missed the fact that this new code sits within a
while (!list_empty(&guc->submission_state.destroyed_contexts)) block,
so putting the context back onto that same list will make the loop spin forever.

Will fix and re-rev.

> +                     spin_unlock_irqrestore(&guc->submission_state.lock, flags);
> +             }
> +
>       }
>  }
>  
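A minimal userspace sketch of one way to avoid the infinite loop, not the actual fix: contexts whose unpin fails are parked on a local list and spliced back onto the destroyed list only after the while (!list_empty(...)) loop exits, so each pass over the list is guaranteed to terminate. Everything here is an assumption for illustration: `struct node` is a simplified stand-in for the kernel's `struct list_head`, `struct ctx`, `fake_unpin()` and `drain()` are hypothetical stand-ins for `intel_context`, `guc_lrc_desc_unpin()` and `deregister_destroyed_contexts()`, and the `submission_state.lock` handling is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical userspace stand-in for struct list_head. */
struct node {
	struct node *prev, *next;
};

static void list_init(struct node *h) { h->prev = h->next = h; }
static bool list_empty(const struct node *h) { return h->next == h; }

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = n;
}

static void list_add_tail(struct node *n, struct node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Move all entries of src to the tail of dst, leaving src empty (O(n) here). */
static void list_splice_tail_init(struct node *src, struct node *dst)
{
	while (!list_empty(src)) {
		struct node *n = src->next;

		list_del(n);
		list_add_tail(n, dst);
	}
}

/* Hypothetical context; unpin_fails simulates a severed CT link. */
struct ctx {
	struct node link;
	int id;
	bool unpin_fails;
};

/* Hypothetical stand-in for guc_lrc_desc_unpin(): nonzero means failure. */
static int fake_unpin(struct ctx *c)
{
	return c->unpin_fails ? -1 : 0;
}

/*
 * Drain the destroyed list. Failed entries go to a local list instead of
 * back onto 'destroyed', so the loop below always terminates; they are
 * returned to 'destroyed' only after the loop. Returns the number unpinned.
 */
static int drain(struct node *destroyed)
{
	struct node failed;
	int done = 0;

	list_init(&failed);
	while (!list_empty(destroyed)) {
		struct ctx *c = container_of(destroyed->next, struct ctx, link);

		list_del(&c->link);
		if (fake_unpin(c))
			list_add_tail(&c->link, &failed);
		else
			done++;
	}
	list_splice_tail_init(&failed, destroyed);
	assert(list_empty(&failed));
	return done;
}
```

With three queued contexts where only the second fails to unpin, `drain()` returns 2 and leaves just that one context on the destroyed list for the next deregistration pass or a sanitization purge. In the kernel, `list_splice_tail_init()` from `<linux/list.h>` performs the same splice in constant time.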
