Quoting Michał Winiarski (2018-03-10 11:07:03)
> [   59.708020] [drm:error_state_write [i915]] Resetting error state
> [   59.708508] [IGT] gem_exec_capture: starting subtest capture-vebox
> [   59.718849] [drm] GPU HANG: ecode 9:0:0xfff7fffe, reason: Manually set wedged engine mask = ffffffffffffffff, action: reset
> [   59.719421] i915 0000:00:02.0: Resetting vecs0 after gpu hang
> [   59.720276] [drm:i915_gem_reset_engine [i915]] resetting vecs0 to restart from tail of request 0x1
> [   59.721008] [drm:i915_reset_device [i915]] resetting chip
> [   59.721226] i915 0000:00:02.0: Resetting chip after gpu hang
> [   59.721575] i915 0000:00:02.0: GPU recovery failed

Full device reset doesn't handle being called from a failed per-engine
reset. Whoops. It doesn't look like there's any reason for the
per-engine reset to have failed either.

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 828f3104488c..44eef355e12c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2985,6 +2985,7 @@ void i915_handle_error(struct drm_i915_private *dev_priv,
         */
        intel_runtime_pm_get(dev_priv);
 
+       engine_mask &= INTEL_INFO(dev_priv)->ring_mask;
        i915_capture_error_state(dev_priv, engine_mask, error_msg);
        i915_clear_error_registers(dev_priv);
 
should fix the immediate problem; but there's no reason afaict for this
to vary between test runs. As to how to properly ignore left-over state
from per-engine reset when doing the full-reset fallback... ugh.
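
To spell out what the one-liner does: the test writes an all-ones wedged
mask (the ffffffffffffffff in the log above), and the AND against the
device's ring mask drops every bit that doesn't name a real engine. A
minimal standalone sketch, not driver code; sketch_clamp_engine_mask()
and the SKETCH_* bit positions are made up for illustration and are not
the i915 definitions:

```c
#include <stdint.h>

/* Hypothetical engine bits, for illustration only. The real driver
 * derives its bits from its own engine IDs. */
#define SKETCH_RCS   (1u << 0)
#define SKETCH_BCS   (1u << 1)
#define SKETCH_VCS   (1u << 2)
#define SKETCH_VECS  (1u << 3)

/* Keep only the requested engines that are present on the device.
 * Bits above the 32-bit ring mask are cleared by the AND as well. */
static inline uint32_t sketch_clamp_engine_mask(uint64_t requested,
						uint32_t ring_mask)
{
	return (uint32_t)(requested & ring_mask);
}
```

With a ring mask of SKETCH_RCS | SKETCH_BCS | SKETCH_VCS | SKETCH_VECS,
an all-ones request comes back as just those four bits, and a request
naming only non-existent engines comes back as 0.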
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
