Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
Quoting Tvrtko Ursulin (2018-02-19 10:58:25)
>
> On 19/02/2018 10:26, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-19 09:57:20)
> >>
> >> On 19/02/2018 09:27, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
> >>>>
> >>>> Do you have a link to BSW hang? Is that obviously related to PMU?
> >>>
> >>> It's only occurring in this test, just looks like an issue with the
> >>> spinner:
> >>>
> >>> [bsw]
> >>> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_...@busy-accuracy-2-bcs0.html
> >>
> >> ...
> >> <0>[ 681.022677] perf_pmu-15161..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 681.022838] perf_pmu-15161..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
> >> <0>[ 681.023001] perf_pmu-15161..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x0001:0x, active=0x1
> >> <0>[ 681.023168] kworker/-338 1 298087910us : reset_common_ring: bcs0 seqno=a
> >> <0>[ 681.023321] ksoftirq-17 1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 681.023482] ksoftirq-17 1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> >> <0>[ 681.023644] ksoftirq-17 1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
> >> <0>[ 681.023811] ksoftirq-17 1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
> >>
> >> Everything stops.
> >>
> >>> [kbl]
> >>> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_...@busy-accuracy-2-bcs0.html
> >>
> >> ...
> >> <0>[ 506.745332] perf_pmu-15443..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 506.745397] -0 2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
> >> <0>[ 506.745440] -0 2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x0001:0x, active=0x1
> >> <0>[ 506.745498] kworker/-30 3 120840583us : reset_common_ring: bcs0 seqno=a
> >> <0>[ 506.745547] ksoftirq-29 3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> >> <0>[ 506.745598] in:imklo-499 2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> >> <0>[ 506.745637] in:imklo-499 2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
> >> <0>[ 506.745676] in:imklo-499 2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
> >>
> >> Everything stops here.
> >>
> >> I have no idea what's happening here. In both cases I would expect the
> >> test to have exited after the GPU hang (or at least attempt to exit!),
> >> since it would detect it overran the timeout.
> >>
> >> Could it be stuck in gem_sync after the reset? Or somewhere else?
> >
> > I think it's that we will be throwing the calibration off if it hangs.
> > If busy_ns = 10s, won't that generate a target idle time of 500s?
>
> Indeed, well spotted. I'll need to add a hang detector of some sort.

Oh, I think I know why it's hanging. As the buffer will be idle, the
kernel is allowed to move it, and __submit_spin_batch() doesn't tell the
kernel to preserve the original address (so the kernel assumes that the
relocations are relative to the passed-in address and so moves the buffer
to match).

I should have noticed that before given the discussion around
EXEC_OBJECT_PINNED for the spinner. I think there's an easy enough
patch...
-Chris
___
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx
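The diagnosis above suggests that the resubmission should tell the kernel to leave the spinner at the address its batch was written for. A minimal sketch of what that could look like with the i915 execbuf uAPI follows. It assumes the spinner's GPU address from its first submission is known; `exec_object_keep_address()` is a hypothetical name, the include path is an assumption about where the uAPI header is installed, and this is not necessarily the patch Chris had in mind.

```c
#include <assert.h>
#include <stdint.h>
#include <drm/i915_drm.h>

/*
 * Hypothetical helper, not the patch from this thread: mark an execbuf
 * object so that the kernel keeps it at the address the batch already
 * refers to.
 */
static void exec_object_keep_address(struct drm_i915_gem_exec_object2 *obj,
				     uint64_t gpu_addr)
{
	assert(obj);

	/*
	 * With EXEC_OBJECT_PINNED the kernel must bind the object at
	 * obj->offset (or fail the execbuf) instead of picking a new
	 * address for it.
	 */
	obj->offset = gpu_addr;
	obj->flags |= EXEC_OBJECT_PINNED;
}
```

A caller would apply this to the spinner's exec object before every resubmission, passing the offset the kernel reported after the first execbuf.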
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
On 19/02/2018 10:26, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 09:57:20)
>>
>> On 19/02/2018 09:27, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
>>>>
>>>> Do you have a link to BSW hang? Is that obviously related to PMU?
>>>
>>> It's only occurring in this test, just looks like an issue with the
>>> spinner:
>>>
>>> [bsw]
>>> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_...@busy-accuracy-2-bcs0.html
>>
>> ...
>> <0>[ 681.022677] perf_pmu-15161..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
>> <0>[ 681.022838] perf_pmu-15161..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
>> <0>[ 681.023001] perf_pmu-15161..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x0001:0x, active=0x1
>> <0>[ 681.023168] kworker/-338 1 298087910us : reset_common_ring: bcs0 seqno=a
>> <0>[ 681.023321] ksoftirq-17 1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
>> <0>[ 681.023482] ksoftirq-17 1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
>> <0>[ 681.023644] ksoftirq-17 1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
>> <0>[ 681.023811] ksoftirq-17 1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
>>
>> Everything stops.
>>
>>> [kbl]
>>> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_...@busy-accuracy-2-bcs0.html
>>
>> ...
>> <0>[ 506.745332] perf_pmu-15443..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
>> <0>[ 506.745397] -0 2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
>> <0>[ 506.745440] -0 2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x0001:0x, active=0x1
>> <0>[ 506.745498] kworker/-30 3 120840583us : reset_common_ring: bcs0 seqno=a
>> <0>[ 506.745547] ksoftirq-29 3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
>> <0>[ 506.745598] in:imklo-499 2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
>> <0>[ 506.745637] in:imklo-499 2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
>> <0>[ 506.745676] in:imklo-499 2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
>>
>> Everything stops here.
>>
>> I have no idea what's happening here. In both cases I would expect the test
>> to have exited after the GPU hang (or at least attempt to exit!), since it
>> would detect it overran the timeout.
>>
>> Could it be stuck in gem_sync after the reset? Or somewhere else?
>
> I think it's that we will be throwing the calibration off if it hangs.
> If busy_ns = 10s, won't that generate a target idle time of 500s?

Indeed, well spotted. I'll need to add a hang detector of some sort.

In the meantime I am trying to figure out how to wire up GuC to engine
stats. The fix to get correct state on stats enable by looking at ports
is a problem given the different tracking I had in GuC mode.

Regards,

Tvrtko
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
Quoting Tvrtko Ursulin (2018-02-19 09:57:20)
>
> On 19/02/2018 09:27, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
> >>
> >> Do you have a link to BSW hang? Is that obviously related to PMU?
> >
> > It's only occurring in this test, just looks like an issue with the
> > spinner:
> >
> > [bsw]
> > https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_...@busy-accuracy-2-bcs0.html
>
> ...
> <0>[ 681.022677] perf_pmu-15161..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> <0>[ 681.022838] perf_pmu-15161..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
> <0>[ 681.023001] perf_pmu-15161..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x0001:0x, active=0x1
> <0>[ 681.023168] kworker/-338 1 298087910us : reset_common_ring: bcs0 seqno=a
> <0>[ 681.023321] ksoftirq-17 1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> <0>[ 681.023482] ksoftirq-17 1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> <0>[ 681.023644] ksoftirq-17 1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
> <0>[ 681.023811] ksoftirq-17 1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
>
> Everything stops.
>
> > [kbl]
> > https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_...@busy-accuracy-2-bcs0.html
>
> ...
> <0>[ 506.745332] perf_pmu-15443..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> <0>[ 506.745397] -0 2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
> <0>[ 506.745440] -0 2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x0001:0x, active=0x1
> <0>[ 506.745498] kworker/-30 3 120840583us : reset_common_ring: bcs0 seqno=a
> <0>[ 506.745547] ksoftirq-29 3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
> <0>[ 506.745598] in:imklo-499 2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
> <0>[ 506.745637] in:imklo-499 2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
> <0>[ 506.745676] in:imklo-499 2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a
>
> Everything stops here.
>
> I have no idea what's happening here. In both cases I would expect the test
> to have exited after the GPU hang (or at least attempt to exit!), since it
> would detect it overran the timeout.
>
> Could it be stuck in gem_sync after the reset? Or somewhere else?

I think it's that we will be throwing the calibration off if it hangs.
If busy_ns = 10s, won't that generate a target idle time of 500s?
-Chris
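The arithmetic behind the 500s estimate can be sketched as follows. This assumes the test derives the idle part of each PWM cycle from the measured busy part and the target load; the formula and the name `pwm_idle_ns()` are assumptions for illustration, since the calibration code itself is not shown in this thread. For the 2% subtest, a busy time inflated to 10s by a hang gives 490s of idle, i.e. a cycle of roughly 500s.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from tests/perf_pmu.c: given the busy time
 * of one PWM cycle and the target load in percent, return how long the
 * engine has to idle for the cycle to average out at that load.
 */
static uint64_t pwm_idle_ns(uint64_t busy_ns, unsigned int target_busy_pc)
{
	assert(target_busy_pc > 0 && target_busy_pc <= 100);

	/* busy / (busy + idle) = pc / 100  =>  idle = busy * (100 - pc) / pc */
	return busy_ns * (100 - target_busy_pc) / target_busy_pc;
}
```

With a sane busy time of a few milliseconds the same formula yields an idle time well under a second, which is why a single hung spinner is enough to stall the test far beyond its expected runtime.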
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
On 19/02/2018 09:27, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
>>
>> Do you have a link to BSW hang? Is that obviously related to PMU?
>
> It's only occurring in this test, just looks like an issue with the
> spinner:
>
> [bsw]
> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_...@busy-accuracy-2-bcs0.html

...
<0>[ 681.022677] perf_pmu-15161..s1 282520414us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
<0>[ 681.022838] perf_pmu-15161..s1 282520580us : execlists_submission_tasklet: bcs0 cs-irq head=5 [5?], tail=0 [0?]
<0>[ 681.023001] perf_pmu-15161..s1 282520594us : execlists_submission_tasklet: bcs0 csb[0]: status=0x0001:0x, active=0x1
<0>[ 681.023168] kworker/-338 1 298087910us : reset_common_ring: bcs0 seqno=a
<0>[ 681.023321] ksoftirq-17 1..s. 298088483us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
<0>[ 681.023482] ksoftirq-17 1..s. 298088575us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[ 681.023644] ksoftirq-17 1..s. 298088579us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
<0>[ 681.023811] ksoftirq-17 1..s. 298088581us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops.

> [kbl]
> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_...@busy-accuracy-2-bcs0.html

...
<0>[ 506.745332] perf_pmu-15443..s1 107905835us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
<0>[ 506.745397] -0 2..s1 107905980us : execlists_submission_tasklet: bcs0 cs-irq head=2 [1?], tail=3 [3?]
<0>[ 506.745440] -0 2..s1 107905983us : execlists_submission_tasklet: bcs0 csb[3]: status=0x0001:0x, active=0x1
<0>[ 506.745498] kworker/-30 3 120840583us : reset_common_ring: bcs0 seqno=a
<0>[ 506.745547] ksoftirq-29 3..s. 120840688us : execlists_submission_tasklet: bcs0 in[0]: ctx=3.1, seqno=a
<0>[ 506.745598] in:imklo-499 2..s1 120840710us : execlists_submission_tasklet: bcs0 cs-irq head=0 [0], tail=1 [1]
<0>[ 506.745637] in:imklo-499 2..s1 120840712us : execlists_submission_tasklet: bcs0 csb[1]: status=0x0018:0x0003, active=0x1
<0>[ 506.745676] in:imklo-499 2..s1 120840713us : execlists_submission_tasklet: bcs0 out[0]: ctx=3.1, seqno=a

Everything stops here.

I have no idea what's happening here. In both cases I would expect the
test to have exited after the GPU hang (or at least attempt to exit!),
since it would detect it overran the timeout.

Could it be stuck in gem_sync after the reset? Or somewhere else?

Could we add an "echo t > /proc/sysrq-trigger" equivalent when owatch
triggers? Or would it overflow some buffer? It should work in cases like
this one, when it is not a machine hang.

Regards,

Tvrtko
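For reference, a C equivalent of the `echo t > /proc/sysrq-trigger` suggestion could look like the sketch below. `sysrq_trigger()` is a hypothetical helper, not an existing IGT or owatch function. The path is a parameter so that it can be pointed at an ordinary file for testing. On a real system, writing `t` to `/proc/sysrq-trigger` needs root and dumps the state of all tasks to the kernel log.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write one sysrq command character to the given
 * trigger file. Pass "/proc/sysrq-trigger" and 't' for the equivalent of
 * `echo t > /proc/sysrq-trigger`. Returns 0 on success, -1 on failure.
 */
static int sysrq_trigger(const char *path, char cmd)
{
	int fd, ret;

	assert(path);

	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;

	/* The kernel acts on the first byte written. */
	ret = write(fd, &cmd, 1) == 1 ? 0 : -1;
	close(fd);

	return ret;
}
```

Whether the resulting dump fits in the kernel log buffer, the concern raised above, depends on the number of tasks and the configured buffer size, and is not addressed by this sketch.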
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
Quoting Tvrtko Ursulin (2018-02-19 09:19:47)
>
> Do you have a link to BSW hang? Is that obviously related to PMU?

It's only occurring in this test, just looks like an issue with the
spinner:

[bsw]
https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-bsw-n3050/igt@perf_...@busy-accuracy-2-bcs0.html

[kbl]
https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-kbl-7560u/igt@perf_...@busy-accuracy-2-bcs0.html
-Chris
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
On 17/02/2018 11:36, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-02-15 15:34:53)
>> From: Tvrtko Ursulin
>>
>> A subtest to verify that the engine busyness is reported with expected
>> accuracy on platforms where the feature is available.
>>
>> We test three patterns: 2%, 50% and 98% load per engine.
>>
>> v2:
>>  * Use spin batch instead of nop calibration.
>>  * Various tweaks.
>>
>> v3:
>>  * Change loops to be time based.
>>  * Use __igt_spin_batch_new inside timing sensitive loops.
>>  * Fixed PWM sleep handling.
>>
>> v4:
>>  * Use restarting spin batch.
>>  * Calibrate more carefully by looking at the real PWM loop.
>>
>> v5:
>>  * Made standalone.
>>  * Better info messages.
>>  * Tweak sleep compensation.
>>
>> v6:
>>  * Some final tweaks. (Chris Wilson)
>>
>> Signed-off-by: Tvrtko Ursulin
>> Reviewed-by: Chris Wilson
>> ---
>> +
>> +	/* Sampling platforms cannot reach the high accuracy criteria. */
>> +	igt_require(gem_has_execlists(gem_fd));
>
> But we don't handle guc, right?

Correct.

> igt_skip_on(gem_has_guc_submission(gem_fd)) ?

I'll dig up and rebase my old patch which implements busy stats in GuC
mode.

> https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-skl-guc/igt@perf_...@busy-accuracy-2-vecs0.html
>
> Or at least it doesn't work to sufficient accuracy. And bsw hung.

There are some occasional excursions over 15% tolerance even with
execlists on small core. Bummer. Don't want to be playing up the
tolerance game. I'll analyse in more detail and think what to do.

Do you have a link to BSW hang? Is that obviously related to PMU?

Regards,

Tvrtko
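To make the 15% tolerance mentioned above concrete, here is a minimal sketch of a relative tolerance check. How the test actually defines and applies its tolerance is not shown in this thread, so the relative-error definition and the name `within_tolerance()` are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical check: is the measured busyness within a relative
 * tolerance of the expected busyness? All values are fractions, e.g.
 * 0.02 for a 2% load and 0.15 for a 15% tolerance.
 */
static bool within_tolerance(double measured, double expected,
			     double tolerance)
{
	double diff;

	assert(expected > 0.0 && tolerance >= 0.0);

	diff = measured > expected ? measured - expected : expected - measured;

	return diff <= expected * tolerance;
}
```

Under this definition the 2% pattern is the most demanding in absolute terms: a reading of 2.1% passes, while 2.4% is already an excursion, since the allowed absolute error is only 0.3 percentage points.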
Re: [Intel-gfx] [igt-dev] [PATCH i-g-t v6] tests/perf_pmu: Verify engine busyness accuracy
Quoting Tvrtko Ursulin (2018-02-15 15:34:53)
> From: Tvrtko Ursulin
>
> A subtest to verify that the engine busyness is reported with expected
> accuracy on platforms where the feature is available.
>
> We test three patterns: 2%, 50% and 98% load per engine.
>
> v2:
>  * Use spin batch instead of nop calibration.
>  * Various tweaks.
>
> v3:
>  * Change loops to be time based.
>  * Use __igt_spin_batch_new inside timing sensitive loops.
>  * Fixed PWM sleep handling.
>
> v4:
>  * Use restarting spin batch.
>  * Calibrate more carefully by looking at the real PWM loop.
>
> v5:
>  * Made standalone.
>  * Better info messages.
>  * Tweak sleep compensation.
>
> v6:
>  * Some final tweaks. (Chris Wilson)
>
> Signed-off-by: Tvrtko Ursulin
> Reviewed-by: Chris Wilson
> ---
> +
> +	/* Sampling platforms cannot reach the high accuracy criteria. */
> +	igt_require(gem_has_execlists(gem_fd));

But we don't handle guc, right?

igt_skip_on(gem_has_guc_submission(gem_fd)) ?

https://intel-gfx-ci.01.org/tree/drm-tip/kasan_2/fi-skl-guc/igt@perf_...@busy-accuracy-2-vecs0.html

Or at least it doesn't work to sufficient accuracy. And bsw hung.
-Chris