On 04/11/2022 14:58, Umesh Nerlige Ramappa wrote:
On Fri, Nov 04, 2022 at 08:29:38AM +0000, Tvrtko Ursulin wrote:
On 03/11/2022 18:08, Umesh Nerlige Ramappa wrote:
On Thu, Nov 03, 2022 at 12:28:46PM +0000, Tvrtko Ursulin wrote:
On 03/11/2022 00:11, Umesh Nerlige Ramappa wrote:
Engine busyness sampled over a 10ms period is failing, with busyness
ranging from approx. 87% to 115%. The expected range is +/- 5% of the
sample period.
When determining the busyness of an active engine, the GuC-based engine
busyness implementation relies on a 64-bit timestamp register read. The
latency incurred by this register read causes the failure.
On DG1, when the test fails, the observed latencies range from 900us to
1.5ms, i.e. up to ~15% of the 10ms sample period, which is consistent
with the observed error.
Do I read this right - that the latency of a 64-bit timestamp
register read is 0.9-1.5ms? That would be the read in
guc_update_pm_timestamp?
Correct. That is the total time taken by intel_uncore_read64_2x32(),
measured with local_clock().
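For reference, the measurement is roughly the below (sketch only; the
MISC_STATUS0/MISC_STATUS1 pair is what guc_update_pm_timestamp() reads,
and local_clock() comes from <linux/sched/clock.h>):

	u64 t0, t1, gpm_ts;

	t0 = local_clock();
	gpm_ts = intel_uncore_read64_2x32(gt->uncore, MISC_STATUS0,
					  MISC_STATUS1);
	t1 = local_clock();
	/*
	 * t1 - t0 is the wall time (ns) spent in the 64-bit read; this
	 * is the value observed in the 900us - 1.5ms range on DG1.
	 */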
One other thing I missed in the comments is that enable_dc=0 also
resolves the issue, but the display team confirmed there is no relation
to display in this case other than that it somehow introduces latency
in the reg read.
Could it be the DMC wreaking havoc something similar to b68763741aa2
("drm/i915: Restore GT performance in headless mode with DMC loaded")?
__gt_unpark is already doing a
gt->awake = intel_display_power_get(i915, POWER_DOMAIN_GT_IRQ);
I would assume that __gt_unpark was called prior to running the
selftest, though I need to confirm that.
Right, I meant maybe something similar but not necessarily the same.
Similar in the sense that it may be the DMC doing a lot of MMIO
invisible to i915 and thereby introducing latency.
One solution tried was to reduce the latency between the reg read and
the CPU timestamp capture, but such an optimization does not add value
for the user, since the CPU timestamp obtained here is only used for
(1) the selftest and (2) the i915 rps implementation specific to the
execlists scheduler. Also, this solution only reduces the frequency of
failure and does not eliminate it.
Note that this solution is here -
https://patchwork.freedesktop.org/patch/509991/?series=110497&rev=1
but I am not intending to use it, since it just reduces the frequency
of failures while the inherent issue still exists.
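Roughly, that kind of optimization amounts to keeping the GT timestamp
read and the CPU timestamp capture back to back, so the window between
the two samples is as small as possible. A sketch, using the names from
guc_update_pm_timestamp() (the linked patch may do this differently):

	/*
	 * Capture the CPU timestamp immediately after the GT timestamp
	 * read. The latency inside the 2x32 read itself is still there,
	 * which is why this only reduces the failure frequency.
	 */
	gpm_ts = intel_uncore_read64_2x32(gt->uncore, MISC_STATUS0,
					  MISC_STATUS1);
	*now = ktime_get();
	gpm_ts >>= guc->timestamp.shift;
	gt_stamp_lo = lower_32_bits(gpm_ts);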
Right, I'd just go with that as well if it makes a significant
improvement. Or even just refactor intel_uncore_read64_2x32 to be
under one spinlock/fw. I don't see any excuse for it to be less
efficient, given that there's a loop in there.
The patch did reduce the failure rate to once in 200 runs vs. once in 10 runs.
I will refactor the helper in that case.
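Something along these lines, i.e. taking the uncore lock and forcewake
once around the whole upper/lower/upper loop instead of per mmio access
(a sketch of one possible shape, not the final patch):

u64 intel_uncore_read64_2x32(struct intel_uncore *uncore,
			     i915_reg_t lower_reg, i915_reg_t upper_reg)
{
	u32 upper, lower, old_upper, loop = 0;
	enum forcewake_domains fw_domains;
	unsigned long flags;

	/* Work out which forcewake domains cover both halves. */
	fw_domains = intel_uncore_forcewake_for_reg(uncore, lower_reg,
						    FW_REG_READ);
	fw_domains |= intel_uncore_forcewake_for_reg(uncore, upper_reg,
						     FW_REG_READ);

	spin_lock_irqsave(&uncore->lock, flags);
	intel_uncore_forcewake_get__locked(uncore, fw_domains);

	/* Same upper/lower/upper dance as before, with raw _fw reads. */
	upper = intel_uncore_read_fw(uncore, upper_reg);
	do {
		old_upper = upper;
		lower = intel_uncore_read_fw(uncore, lower_reg);
		upper = intel_uncore_read_fw(uncore, upper_reg);
	} while (upper != old_upper && loop++ < 2);

	intel_uncore_forcewake_put__locked(uncore, fw_domains);
	spin_unlock_irqrestore(&uncore->lock, flags);

	return (u64)upper << 32 | lower;
}

That would likely mean moving the helper out of intel_uncore.h, where it
currently lives as a static inline.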
Yeah it makes sense to make it efficient. But feel free to go with the
msleep increase as well to work around the issue fully.
Regards,
Tvrtko