On Thu, May 18, 2017 at 05:28:41PM +0300, Mika Kuoppala wrote:
> ELK seems to be very picky about the preconditions for reset.
> Evidence on Eaglelake (8086:2e12 (rev 03)) shows that it does
> not like it if a reset occurs while there is an active ring.
> 
> Ville found out that there is a workaround named
> 'WaMediaResetMainRingCleanup', which suggests that we need to
> clean up the rings before resetting. It is unclear what cleanup
> exactly means, but evidence shows that stopping the ring
> does have an effect on reset reliability. This patch makes
> reset successful on hangs induced by chained batches (the igt ones).
> Note that if the hang is inside a shader, it is possible
> that our attempts to stop the ring achieve nothing.
> 
> v2: zero ctl,head,tail also. bug ref. use driver debugs (Chris)
> 
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100942
> Testcase: igt/gem_busy/*-hang
> Testcase: igt/gem_ringfill/hang-*

Maybe add # elk to these to indicate the problem isn't quite that
widespread!

> Suggested-by: Ville Syrjälä <ville.syrj...@linux.intel.com>
> Cc: Ville Syrjälä <ville.syrj...@linux.intel.com>
> Cc: Chris Wilson <ch...@chris-wilson.co.uk>
> Cc: Tomi Sarvela <tomi.p.sarv...@intel.com>
> Signed-off-by: Mika Kuoppala <mika.kuopp...@intel.com>
> ---
>  drivers/gpu/drm/i915/intel_uncore.c | 35 +++++++++++++++++++++++++++++++++++
>  1 file changed, 35 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 7eaaf2225e1a..43da84be0321 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1427,6 +1427,35 @@ int i915_reg_read_ioctl(struct drm_device *dev,
>       return ret;
>  }
>  
> +static void gen3_stop_rings(struct drm_i915_private *dev_priv)
> +{
> +     struct intel_engine_cs *engine;
> +     enum intel_engine_id id;
> +
> +     for_each_engine(engine, dev_priv, id) {
> +             const u32 base = engine->mmio_base;
> +             const i915_reg_t mode = RING_MI_MODE(base);
> +
> +             I915_WRITE_FW(mode, _MASKED_BIT_ENABLE(STOP_RING));
> +             if (intel_wait_for_register_fw(dev_priv,
> +                                            mode,
> +                                            MODE_IDLE,
> +                                            MODE_IDLE,
> +                                            500))
> +                     DRM_DEBUG_DRIVER("%s: timed out on STOP_RING\n",
> +                                      engine->name);
> +
> +             I915_WRITE_FW(RING_CTL(base), 0);
> +             I915_WRITE_FW(RING_HEAD(base), 0);
> +             I915_WRITE_FW(RING_TAIL(base), 0);
> +
> +             /* Check acts as a post */
> +             if (I915_READ_FW(RING_HEAD(base)) != 0)
> +                     DRM_DEBUG_DRIVER("%s: ring head not parked\n",
> +                                      engine->name);
> +     }
> +}
> +
>  static bool i915_reset_complete(struct pci_dev *pdev)
>  {
>       u8 gdrst;
> @@ -1472,6 +1501,12 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv, unsigned engine_mask)
>       I915_WRITE(VDECCLK_GATE_D, I915_READ(VDECCLK_GATE_D) | VCP_UNIT_CLOCK_GATE_DISABLE);
>       POSTING_READ(VDECCLK_GATE_D);
>  
> +     /* We stop engines, otherwise we might get failed reset and a
> +      * dead gpu (on elk).
> +      */
> +     /* WaMediaResetMainRingCleanup:ctg,elk (supposedly) */

Join this into a single comment block, s/supposedly/presumably/

One small concern: this duplicates some of stop_ring(), but I don't have
a better suggestion (exporting intel_stop_ring or adding a
gen3_engine_stop_ring so far looks more confusing than helpful). Since you
have tested with DRM_ERROR to rule out the fear that this simply times out
for our spinning batches, it looks good to me.
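
For anyone following along, here is a rough standalone sketch of the
stop-then-park sequence the patch performs, run against a mocked register
file rather than real MMIO. The mock_* names, the idle_after knob and the
poll-count timeout are hypothetical and exist only for illustration; the
bit positions and the masked-write layout are my recollection of the i915
headers and should be treated as assumptions, not as the driver's API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed bit positions, for illustration only. */
#define STOP_RING (1u << 8)
#define MODE_IDLE (1u << 9)
/* Masked register: the upper 16 bits select which lower bits a write touches. */
#define MASKED_BIT_ENABLE(b) (((b) << 16) | (b))

/* Hypothetical stand-in for one engine's ring registers. */
struct mock_ring {
	uint32_t mi_mode, ctl, head, tail;
	int idle_after; /* polls before the mock reports idle; < 0 means never */
};

/* Apply a masked write: only bits named in the upper half are changed. */
static void mock_write_masked(uint32_t *reg, uint32_t val)
{
	uint32_t mask = val >> 16;

	*reg = (*reg & ~mask) | (val & mask);
}

/* Read MI_MODE; the mock goes idle some polls after STOP_RING is set. */
static uint32_t mock_read_mode(struct mock_ring *ring)
{
	if ((ring->mi_mode & STOP_RING) && ring->idle_after >= 0) {
		if (ring->idle_after == 0)
			ring->mi_mode |= MODE_IDLE;
		else
			ring->idle_after--;
	}
	return ring->mi_mode;
}

/* Request a stop, poll for idle, then zero ctl/head/tail regardless.
 * Returns true if the ring reported idle within max_polls reads. */
static bool mock_stop_ring(struct mock_ring *ring, int max_polls)
{
	bool idle = false;

	assert(ring != NULL);
	mock_write_masked(&ring->mi_mode, MASKED_BIT_ENABLE(STOP_RING));

	for (int i = 0; i < max_polls; i++) {
		if (mock_read_mode(ring) & MODE_IDLE) {
			idle = true;
			break;
		}
	}
	if (!idle)
		fprintf(stderr, "timed out on STOP_RING\n");

	/* As in the patch, park the ring even if the stop timed out. */
	ring->ctl = 0;
	ring->head = 0;
	ring->tail = 0;
	return idle;
}
```

Calling mock_stop_ring() on a ring with idle_after = -1 models the
hang-inside-a-shader case from the commit message: the wait times out, but
the ring registers are still zeroed before the reset would be attempted.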

Reviewed-by: Chris Wilson <ch...@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre