Chris Wilson <ch...@chris-wilson.co.uk> writes:

> Before a reset, we set the STOP_RING bit of RING_MI_MODE to freeze the
> engine. However, sometimes we observe that upon restart, the engine
> freezes again with the STOP_RING bit still asserted. By inspection, we
> know that the register itself is cleared by the GPU reset, so that bit
> must be preserved inside the context image and reloaded from there. A
> simple fix (as the RING_MI_MODE is at a fixed offset in a valid context)
> is to clobber the STOP_RING bit inside the image before replaying the
> request.
>
> Signed-off-by: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursu...@intel.com>
> Cc: Michel Thierry <michel.thie...@intel.com>
> Cc: Michał Winiarski <michal.winiar...@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c     | 17 +++++++++++++++--
>  drivers/gpu/drm/i915/intel_lrc_reg.h |  2 ++
>  2 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c 
> b/drivers/gpu/drm/i915/intel_lrc.c
> index 3f90c74038ef..37fe842de639 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1918,6 +1918,20 @@ static void execlists_reset(struct intel_engine_cs 
> *engine,
>  
>       spin_unlock_irqrestore(&engine->timeline.lock, flags);
>  
> +     if (!request)
> +             return;
> +
> +     regs = request->hw_context->lrc_reg_state;
> +
> +     /*
> +      * After a reset, the context may have preserved the STOP bit
> +      * of RING_MI_MODE we used to freeze the active engine before
> +      * the reset. If that bit is restored the ring stops instead
> +      * of being executed.
> +      */
> +     regs[CTX_MI_MODE + 1] |= STOP_RING << 16;
> +     regs[CTX_MI_MODE + 1] &= ~STOP_RING;

Ok, I did go and pondered if this is truely the simplest
way. Forcing the start outside a context would not
be simpler and would be slower.

Nice find and it will has potential to explain our some troubles
of re-enabling engines.

Did you find out this by looking at the error states or what?

Reviewed-by: Mika Kuoppala <mika.kuopp...@linux.intel.com>

> +
>       /*
>        * If the request was innocent, we leave the request in the ELSP
>        * and will try to replay it on restarting. The context image may
> @@ -1929,7 +1943,7 @@ static void execlists_reset(struct intel_engine_cs 
> *engine,
>        * and have to at least restore the RING register in the context
>        * image back to the expected values to skip over the guilty request.
>        */
> -     if (!request || request->fence.error != -EIO)
> +     if (request->fence.error != -EIO)
>               return;
>  
>       /*
> @@ -1940,7 +1954,6 @@ static void execlists_reset(struct intel_engine_cs 
> *engine,
>        * future request will be after userspace has had the opportunity
>        * to recreate its own state.
>        */
> -     regs = request->hw_context->lrc_reg_state;
>       if (engine->pinned_default_state) {
>               memcpy(regs, /* skip restoring the vanilla PPHWSP */
>                      engine->pinned_default_state + LRC_STATE_PN * PAGE_SIZE,
> diff --git a/drivers/gpu/drm/i915/intel_lrc_reg.h 
> b/drivers/gpu/drm/i915/intel_lrc_reg.h
> index 5ef932d810a7..3b155ecbfa92 100644
> --- a/drivers/gpu/drm/i915/intel_lrc_reg.h
> +++ b/drivers/gpu/drm/i915/intel_lrc_reg.h
> @@ -39,6 +39,8 @@
>  #define CTX_R_PWR_CLK_STATE          0x42
>  #define CTX_END                              0x44
>  
> +#define CTX_MI_MODE                  0x54
> +
>  #define CTX_REG(reg_state, pos, reg, val) do { \
>       u32 *reg_state__ = (reg_state); \
>       const u32 pos__ = (pos); \
> -- 
> 2.18.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Reply via email to