Quoting Chris Wilson (2019-07-17 14:40:26)
> Quoting Tvrtko Ursulin (2019-07-17 14:31:00)
> > 
> > On 16/07/2019 13:49, Chris Wilson wrote:
> > > By stopping the rings, we may trigger an arbitration point resulting in
> > > a premature context-switch (i.e. a completion event before the request
> > > is actually complete). This clears the active context before the reset,
> > > but we must remember to rewind the incomplete context for replay upon
> > > resume.
> > > 
> > > Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> > > ---
> > >   drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++--
> > >   1 file changed, 4 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > index 9b87a2fc186c..7570a9256001 100644
> > > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > > @@ -1419,7 +1419,8 @@ static void process_csb(struct intel_engine_cs *engine)
> > >                        * coherent (visible from the CPU) before the
> > >                        * user interrupt and CSB is processed.
> > >                        */
> > > -                     GEM_BUG_ON(!i915_request_completed(*execlists->active));
> > > +                     GEM_BUG_ON(!i915_request_completed(*execlists->active) &&
> > > +                                !reset_in_progress(execlists));
> > >                       execlists_schedule_out(*execlists->active++);
> > >   
> > >                       GEM_BUG_ON(execlists->active - execlists->inflight >
> > > @@ -2254,7 +2255,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> > >        */
> > >       rq = execlists_active(execlists);
> > >       if (!rq)
> > > -             return;
> > > +             goto unwind;
> > >   
> > >       ce = rq->hw_context;
> > >       GEM_BUG_ON(i915_active_is_idle(&ce->active));
> > > @@ -2331,6 +2332,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
> > >       intel_ring_update_space(ce->ring);
> > >       __execlists_update_reg_state(ce, engine);
> > >   
> > > +unwind:
> > >       /* Push back any incomplete requests for replay after the reset. */
> > >       __unwind_incomplete_requests(engine);
> > >   }
> > > 
> > 
> > Sounds plausible.
> > 
> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> > 
> > Shouldn't there be a Fixes: tag to go with it?
> 
> Yeah, it's rare even by our standards; I think there's a live_hangcheck
> failure about once a month that could be the result of this. However,
> the result would be an unrecoverable GPU hang, as each subsequent reset
> attempt would not see the missing request, so it would remain
> perpetually on the engine->active.list until a set-wedged (i.e. a
> suspend, in the user case).
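
To make the consequence concrete, here is a rough sketch of how
__execlists_reset() flows with the patch above applied; only the control
flow around the new "goto unwind" is shown, the guilt handling and
context-image scrubbing in between are elided, so treat it as an
illustration rather than the verbatim intel_lrc.c code:

static void __execlists_reset_sketch(struct intel_engine_cs *engine,
				     bool stalled)
{
	struct intel_engine_execlists * const execlists = &engine->execlists;
	struct i915_request *rq;
	struct intel_context *ce;

	/* ... CSB processing and ring-stop handling elided ... */

	rq = execlists_active(execlists);
	if (!rq) {
		/*
		 * A premature completion event (from stopping the ring)
		 * has already cleared the active context, but the
		 * incomplete request is still on engine->active.list and
		 * must be rewound for replay.
		 */
		goto unwind;
	}

	ce = rq->hw_context;
	GEM_BUG_ON(i915_active_is_idle(&ce->active));

	/* ... guilt handling (using stalled) and context scrubbing ... */
	intel_ring_update_space(ce->ring);
	__execlists_update_reg_state(ce, engine);

unwind:
	/* Push back any incomplete requests for replay after the reset. */
	__unwind_incomplete_requests(engine);
}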

Heh, the commit responsible was one that was itself trying to work around
the effect of stop_engines() setting RING_HEAD=0 :)

commit 1863e3020ab50bd5f68d85719ba26356cc282643
Author: Chris Wilson <ch...@chris-wilson.co.uk>
Date:   Thu Apr 11 14:05:15 2019 +0100

    drm/i915/execlists: Always reset the context's RING registers

    During reset, we try and stop the active ring. This has the consequence
    that we often clobber the RING registers within the context image. When
    we find an active request, we update the context image to rerun that
    request (if it was guilty, we replace the hanging user payload with
    NOPs). However, we were ignoring an active context if the request had
    completed, with the consequence that the next submission on that context
    would start with RING_HEAD==0 and not the tail of the previous request,
    causing all requests still in the ring to be rerun. Rare, but
    occasionally seen within CI where we would spot that the context seqno
    would reverse and complain that we were retiring an incomplete request.
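
For reference, the RING registers live inside the context image and are
rewritten from the software ring state as part of the reset, roughly as
below; this paraphrases __execlists_update_reg_state() as it looked around
that time, and the CTX_* indices with their +1 offset into the
register/value pairs are from memory, so treat them as approximate:

static void update_ring_regs_sketch(struct intel_context *ce)
{
	struct intel_ring *ring = ce->ring;
	u32 *regs = ce->lrc_reg_state;

	/*
	 * If this step is skipped for a context whose last request had
	 * already completed, the RING_HEAD=0 written while stopping the
	 * engine stays in the image and the next submission replays the
	 * whole ring.
	 */
	regs[CTX_RING_HEAD + 1] = ring->head;
	regs[CTX_RING_TAIL + 1] = ring->tail;
}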

-Chris