On 19/02/2018 10:26, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-02-19 09:57:20)

On 19/02/2018 09:27, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-02-19 09:19:47)

Do you have a link to the BSW hang? Is that obviously related to PMU?

It's only occurring in this test, just looks like an issue with the
spinner:

[bsw] 
https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_...@busy-accuracy-2-bcs0.html

...
<0>[  681.022677] perf_pmu-1516    1..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  681.022838] perf_pmu-1516    1..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
<0>[  681.023001] perf_pmu-1516    1..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x00000001:0x00000000, active=0x1
<0>[  681.023168] kworker/-338     1.... 298087910us : reset_common_ring: bcs0 seqno=a
<0>[  681.023321] ksoftirq-17      1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  681.023482] ksoftirq-17      1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[  681.023644] ksoftirq-17      1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
<0>[  681.023811] ksoftirq-17      1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops.
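
As a reading aid for the csb[] lines in these traces, here is a minimal sketch that decodes the first status dword into flag names. The bit positions are my recollection of the GEN8_CTX_STATUS_* definitions in i915 and should be checked against the driver headers; decode_csb_status is a hypothetical helper, not something from the driver or IGT. The second dword (after the colon) is left alone.

```python
# Assumed bit layout of the first CSB status dword (from memory of the
# i915 GEN8_CTX_STATUS_* defines; verify against the driver headers).
CSB_STATUS_BITS = {
    0: "IDLE_ACTIVE",
    1: "PREEMPTED",
    2: "ELEMENT_SWITCH",
    3: "ACTIVE_IDLE",
    4: "COMPLETE",
    15: "LITE_RESTORE",
}


def decode_csb_status(status: int) -> list[str]:
    """Return the names of the known flags set in a CSB status dword."""
    return [
        name
        for bit, name in sorted(CSB_STATUS_BITS.items())
        if status & (1 << bit)
    ]


if __name__ == "__main__":
    # The two status values that appear in the traces.
    for value in (0x00000001, 0x00000018):
        print(f"status=0x{value:08x}: {' | '.join(decode_csb_status(value))}")
```

Under that assumed layout, 0x00000001 reads as idle-to-active and 0x00000018 as active-to-idle plus complete.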

[kbl] 
https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_...@busy-accuracy-2-bcs0.html

...
<0>[  506.745332] perf_pmu-1544    3..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  506.745397]   <idle>-0       2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
<0>[  506.745440]   <idle>-0       2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x00000001:0x00000000, active=0x1
<0>[  506.745498] kworker/-30      3.... 120840583us : reset_common_ring: bcs0 seqno=a
<0>[  506.745547] ksoftirq-29      3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]:  ctx=3.1, seqno=a
<0>[  506.745598] in:imklo-499     2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[  506.745637] in:imklo-499     2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x00000018:0x00000003, active=0x1
<0>[  506.745676] in:imklo-499     2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops here.

I have no idea what's happening here. In both cases I would expect the test
to have exited after the GPU hang (or at least attempted to exit!), since it
would detect that it overran the timeout.

Could it be stuck in gem_sync after the reset? Or somewhere else?

I think it's that we will be throwing the calibration off if it hangs.
If busy_ns = 10s, won't that generate a target idle time of 500s?

Indeed, well spotted. I'll need to add a hang detector of some sort.
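
To make the arithmetic concrete, here is a rough sketch of how a hung calibration batch could inflate the idle target, and what a simple guard might look like. It assumes the idle time is scaled from the measured busy time as busy * (100 - pct) / pct, with pct = 2 taken from the busy-accuracy-2 subtest name; the function names and the max_busy_ns limit are hypothetical and not taken from perf_pmu.

```python
NSEC_PER_SEC = 1_000_000_000


class CalibrationHang(RuntimeError):
    """Raised when the measured busy time looks like a GPU hang."""


def target_idle_ns(busy_ns: int, target_busy_pct: int) -> int:
    """Idle time needed so that busy_ns is target_busy_pct of the period."""
    if not 0 < target_busy_pct <= 100:
        raise ValueError("target_busy_pct must be in (0, 100]")
    return busy_ns * (100 - target_busy_pct) // target_busy_pct


def checked_target_idle_ns(busy_ns: int, target_busy_pct: int,
                           max_busy_ns: int) -> int:
    """Same as target_idle_ns, but refuse implausibly long busy samples.

    max_busy_ns is a hypothetical limit; a real test would have to pick
    it relative to the intended busy period and the hangcheck timeout.
    """
    if busy_ns > max_busy_ns:
        raise CalibrationHang(
            f"busy sample of {busy_ns} ns exceeds limit of {max_busy_ns} ns"
        )
    return target_idle_ns(busy_ns, target_busy_pct)


if __name__ == "__main__":
    # A 10 s "busy" sample (i.e. a hang) at a 2% target gives 490 s of idle.
    print(target_idle_ns(10 * NSEC_PER_SEC, 2) / NSEC_PER_SEC)
```

With these assumptions a 10s busy sample gives 490s of idle, which matches the roughly 500s estimate above, so rejecting the sample before it feeds into the idle target would let the test bail out instead of sleeping.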

In the meantime I am trying to figure out how to wire up GuC to engine stats. The fix that gets the correct state on stats enable by looking at the ports is a problem there, given the different tracking I had in GuC mode.

Regards,

Tvrtko


