Re: [Intel-gfx] [PATCH 2/2] drm/i915/execlists: Always reset the context's RING registers

2019-04-10 Thread Chris Wilson
Quoting Mika Kuoppala (2019-04-10 15:40:13)
> Chris Wilson  writes:
> >   /* Rerun the request; its payload has been neutered (if guilty). */
> > - rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
> > - intel_ring_update_space(rq->ring);
> > +out_replay:
> > + ce->ring->head =
> > + rq ? intel_ring_wrap(ce->ring, rq->head) : ce->ring->tail;
> 
> The ce and rq ring should be the same when rq is set. I guess
> you had reasons to keep it as ce, perhaps because it is
> the culprit.

Yes, by this point we know that rq->hw_context == ce, and so rq->ring ==
ce->ring. I decided that execlists_reset() was now all about the active
context, with the request just a hint as to how far along that context we
had completed -- hence trying to use ce as the primary throughout.
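
To make the head selection in the hunk above concrete, here is a minimal
userspace sketch of the out_replay assignment. The struct layouts and the
helper names ring_wrap() and reset_ring_head() are simplified stand-ins
invented for illustration, not the i915 definitions, and the sketch assumes
the ring size is a power of two so that wrapping is a simple mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the i915 structures. */
struct ring {
	uint32_t size;	/* bytes, assumed to be a power of two */
	uint32_t head;	/* offset the hardware resumes reading from */
	uint32_t tail;	/* offset of the end of the last submitted request */
};

struct context {
	struct ring *ring;
};

struct request {
	struct context *hw_context;
	uint32_t head;	/* offset at which this request starts in the ring */
};

/* Wrap an offset into the ring (assumes a power-of-two size). */
static uint32_t ring_wrap(const struct ring *ring, uint32_t pos)
{
	assert(ring->size && (ring->size & (ring->size - 1)) == 0);
	return pos & (ring->size - 1);
}

/*
 * Model of the out_replay step: always rewrite the context's ring head.
 * With an incomplete request, rewind to its start so that it is rerun;
 * with none (rq == NULL), skip to the tail so that nothing is replayed.
 */
static uint32_t reset_ring_head(struct context *ce, const struct request *rq)
{
	/* When rq is set, it must belong to the context being reset. */
	assert(!rq || rq->hw_context == ce);

	ce->ring->head = rq ? ring_wrap(ce->ring, rq->head) : ce->ring->tail;
	return ce->ring->head;
}
```

Calling reset_ring_head(ce, NULL) models the case the patch adds: the
context was active but its request had already completed, so the head is
moved to the tail instead of being left at whatever the reset clobbered it to.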
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

Re: [Intel-gfx] [PATCH 2/2] drm/i915/execlists: Always reset the context's RING registers

2019-04-10 Thread Mika Kuoppala
Chris Wilson  writes:

> During reset, we try and stop the active ring. This has the consequence
> that we often clobber the RING registers within the context image. When
> we find an active request, we update the context image to rerun that
> request (if it was guilty, we replace the hanging user payload with
> NOPs). However, we were ignoring an active context if the request had
> completed, with the consequence that the next submission on that context
> would start with RING_HEAD==0 and not the tail of the previous request,
> causing all requests still in the ring to be rerun. Rare, but
> occasionally seen within CI where we would spot that the context seqno
> would reverse and complain that we were retiring an incomplete request.
>
> <0> [412.390350]   -0   3d.s2 408373352us : __i915_request_submit: rcs0 fence 1e95b:3640 -> current 3638
> <0> [412.390350]   -0   3d.s2 408373353us : __i915_request_submit: rcs0 fence 1e95b:3642 -> current 3638
> <0> [412.390350]   -0   3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3644 -> current 3638
> <0> [412.390350]   -0   3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3646 -> current 3638
> <0> [412.390350]   -0   3d.s2 408373356us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3646 (current 3638), prio=4
> <0> [412.390350] i915_sel-46130 408373374us : __i915_request_commit: rcs0 fence 1e95b:3648
> <0> [412.390350] i915_sel-46130d..1 408373377us : process_csb: rcs0 cs-irq head=2, tail=3
> <0> [412.390350] i915_sel-46130d..1 408373377us : process_csb: rcs0 csb[3]: status=0x0001:0x, active=0x1
> <0> [412.390350] i915_sel-46130d..1 408373378us : __i915_request_submit: rcs0 fence 1e95b:3648 -> current 3638
> <0> [412.390350]   -0   3..s1 408373378us : execlists_submission_tasklet: rcs0 awake?=1, active=5
> <0> [412.390350] i915_sel-46130d..1 408373379us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.2, fence 1e95b:3648 (current 3638), prio=4
> <0> [412.390350] i915_sel-46130 408373381us : i915_reset_engine: rcs0 flags=4
> <0> [412.390350] i915_sel-46130 408373382us : execlists_reset_prepare: rcs0: depth<-0
> <0> [412.390350]   -0   3d.s2 408373390us : process_csb: rcs0 cs-irq head=3, tail=4
> <0> [412.390350]   -0   3d.s2 408373390us : process_csb: rcs0 csb[4]: status=0x8002:0x0002, active=0x1
> <0> [412.390350]   -0   3d.s2 408373390us : process_csb: rcs0 out[0]: ctx=2.2, fence 1e95b:3648 (current 3640), prio=4
> <0> [412.390350] i915_sel-46130 408373401us : intel_engine_stop_cs: rcs0
> <0> [412.390350] i915_sel-46130d..1 408373402us : process_csb: rcs0 cs-irq head=4, tail=4
> <0> [412.390350] i915_sel-46130 408373403us : intel_gpu_reset: engine_mask=1
> <0> [412.390350] i915_sel-46130d..1 408373408us : execlists_cancel_port_requests: rcs0:port0 fence 1e95b:3648, (current 3648)
> <0> [412.390350] i915_sel-46130 408373442us : intel_engine_cancel_stop_cs: rcs0
> <0> [412.390350] i915_sel-46130 408373442us : execlists_reset_finish: rcs0: depth->0
> <0> [412.390350] ksoftirq-26  3..s. 408373442us : execlists_submission_tasklet: rcs0 awake?=1, active=0
> <0> [412.390350] ksoftirq-26  3d.s1 408373443us : process_csb: rcs0 cs-irq head=5, tail=5
> <0> [412.390350] i915_sel-46130 408373475us : i915_request_retire: rcs0 fence 1e95b:3640, current 3648
> <0> [412.390350] i915_sel-46130 408373476us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3640, current 3648
> <0> [412.390350] i915_sel-46130 408373494us : __i915_request_commit: rcs0 fence 1e95b:3650
> <0> [412.390350] i915_sel-46130d..1 408373496us : process_csb: rcs0 cs-irq head=5, tail=5
> <0> [412.390350] i915_sel-46130d..1 408373496us : __i915_request_submit: rcs0 fence 1e95b:3650 -> current 3648
> <0> [412.390350] i915_sel-46130d..1 408373498us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3650 (current 3648), prio=6
> <0> [412.390350] i915_sel-46130 408373500us : i915_request_retire_upto: rcs0 fence 1e95b:3648, current 3648
> <0> [412.390350] i915_sel-46130 408373500us : i915_request_retire: rcs0 fence 1e95b:3642, current 3648
> <0> [412.390350] i915_sel-46130 408373501us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3642, current 3648
> <0> [412.390350] i915_sel-46130 408373514us : i915_request_retire: rcs0 fence 1e95b:3644, current 3648
> <0> [412.390350] i915_sel-46130 408373515us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3644, current 3648
> <0> [412.390350] i915_sel-46130 408373527us : 
> 

[Intel-gfx] [PATCH 2/2] drm/i915/execlists: Always reset the context's RING registers

2019-04-08 Thread Chris Wilson
During reset, we try and stop the active ring. This has the consequence
that we often clobber the RING registers within the context image. When
we find an active request, we update the context image to rerun that
request (if it was guilty, we replace the hanging user payload with
NOPs). However, we were ignoring an active context if the request had
completed, with the consequence that the next submission on that context
would start with RING_HEAD==0 and not the tail of the previous request,
causing all requests still in the ring to be rerun. Rare, but
occasionally seen within CI where we would spot that the context seqno
would reverse and complain that we were retiring an incomplete request.
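
The replay described above can be illustrated with a toy model of a ring
buffer. This is only a sketch under stated assumptions: the helper
requests_executed() and the offsets used with it are invented for
illustration and are not taken from i915 or from the trace below. It counts
how many requests start inside the window [head, tail) that the hardware
would walk, assuming a power-of-two ring size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Count the requests whose start offset lies in [head, tail) on a ring of
 * `size` bytes, taking wraparound into account. These are the requests
 * the hardware would (re)execute when resuming from `head`.
 */
static size_t requests_executed(const uint32_t *starts, size_t count,
				uint32_t size, uint32_t head, uint32_t tail)
{
	uint32_t mask = size - 1;
	uint32_t span;
	size_t i, n = 0;

	/* Masking only works for power-of-two ring sizes. */
	assert(size && (size & mask) == 0);

	span = (tail - head) & mask;	/* bytes between head and tail */
	for (i = 0; i < count; i++) {
		uint32_t dist = (starts[i] - head) & mask;

		if (dist < span)
			n++;
	}
	return n;
}
```

With five requests starting at offsets 0, 64, 128, 192 and 256 and a tail of
320, resuming from head 256 (the tail of the previous request) runs only the
newest request, whereas resuming from a clobbered head of 0 walks all five
again, which is the kind of rerun that makes the context seqno go backwards.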

<0> [412.390350]   -0   3d.s2 408373352us : __i915_request_submit: rcs0 fence 1e95b:3640 -> current 3638
<0> [412.390350]   -0   3d.s2 408373353us : __i915_request_submit: rcs0 fence 1e95b:3642 -> current 3638
<0> [412.390350]   -0   3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3644 -> current 3638
<0> [412.390350]   -0   3d.s2 408373354us : __i915_request_submit: rcs0 fence 1e95b:3646 -> current 3638
<0> [412.390350]   -0   3d.s2 408373356us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3646 (current 3638), prio=4
<0> [412.390350] i915_sel-46130 408373374us : __i915_request_commit: rcs0 fence 1e95b:3648
<0> [412.390350] i915_sel-46130d..1 408373377us : process_csb: rcs0 cs-irq head=2, tail=3
<0> [412.390350] i915_sel-46130d..1 408373377us : process_csb: rcs0 csb[3]: status=0x0001:0x, active=0x1
<0> [412.390350] i915_sel-46130d..1 408373378us : __i915_request_submit: rcs0 fence 1e95b:3648 -> current 3638
<0> [412.390350]   -0   3..s1 408373378us : execlists_submission_tasklet: rcs0 awake?=1, active=5
<0> [412.390350] i915_sel-46130d..1 408373379us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.2, fence 1e95b:3648 (current 3638), prio=4
<0> [412.390350] i915_sel-46130 408373381us : i915_reset_engine: rcs0 flags=4
<0> [412.390350] i915_sel-46130 408373382us : execlists_reset_prepare: rcs0: depth<-0
<0> [412.390350]   -0   3d.s2 408373390us : process_csb: rcs0 cs-irq head=3, tail=4
<0> [412.390350]   -0   3d.s2 408373390us : process_csb: rcs0 csb[4]: status=0x8002:0x0002, active=0x1
<0> [412.390350]   -0   3d.s2 408373390us : process_csb: rcs0 out[0]: ctx=2.2, fence 1e95b:3648 (current 3640), prio=4
<0> [412.390350] i915_sel-46130 408373401us : intel_engine_stop_cs: rcs0
<0> [412.390350] i915_sel-46130d..1 408373402us : process_csb: rcs0 cs-irq head=4, tail=4
<0> [412.390350] i915_sel-46130 408373403us : intel_gpu_reset: engine_mask=1
<0> [412.390350] i915_sel-46130d..1 408373408us : execlists_cancel_port_requests: rcs0:port0 fence 1e95b:3648, (current 3648)
<0> [412.390350] i915_sel-46130 408373442us : intel_engine_cancel_stop_cs: rcs0
<0> [412.390350] i915_sel-46130 408373442us : execlists_reset_finish: rcs0: depth->0
<0> [412.390350] ksoftirq-26  3..s. 408373442us : execlists_submission_tasklet: rcs0 awake?=1, active=0
<0> [412.390350] ksoftirq-26  3d.s1 408373443us : process_csb: rcs0 cs-irq head=5, tail=5
<0> [412.390350] i915_sel-46130 408373475us : i915_request_retire: rcs0 fence 1e95b:3640, current 3648
<0> [412.390350] i915_sel-46130 408373476us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3640, current 3648
<0> [412.390350] i915_sel-46130 408373494us : __i915_request_commit: rcs0 fence 1e95b:3650
<0> [412.390350] i915_sel-46130d..1 408373496us : process_csb: rcs0 cs-irq head=5, tail=5
<0> [412.390350] i915_sel-46130d..1 408373496us : __i915_request_submit: rcs0 fence 1e95b:3650 -> current 3648
<0> [412.390350] i915_sel-46130d..1 408373498us : __execlists_submission_tasklet: rcs0 in[0]:  ctx=2.1, fence 1e95b:3650 (current 3648), prio=6
<0> [412.390350] i915_sel-46130 408373500us : i915_request_retire_upto: rcs0 fence 1e95b:3648, current 3648
<0> [412.390350] i915_sel-46130 408373500us : i915_request_retire: rcs0 fence 1e95b:3642, current 3648
<0> [412.390350] i915_sel-46130 408373501us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3642, current 3648
<0> [412.390350] i915_sel-46130 408373514us : i915_request_retire: rcs0 fence 1e95b:3644, current 3648
<0> [412.390350] i915_sel-46130 408373515us : i915_request_retire: __retire_engine_request(rcs0) fence 1e95b:3644, current 3648
<0> [412.390350] i915_sel-46130 408373527us : i915_request_retire: rcs0 fence 1e95b:3646, current 3640
<0> [412.390350]   -0   3..s1 408373569us : execlists_submission_tasklet: rcs0 awake?=1, active=1
<0> [412.390350]   -0   3d.s2