On Thu, Dec 11, 2025 at 11:07:05AM +0530, Anirban, Sk wrote: > Hi, > > On 10-12-2025 02:21 pm, Krzysztof Karas wrote: > > Hi, > > > > On 2025-12-09 at 13:36:15 +0530, Anirban, Sk wrote: > > > Hi, > > > > > > On 09-12-2025 12:46 pm, Krzysztof Karas wrote: > > > > Hi Sk Anirban, > > > > > > > > On 2025-12-09 at 11:26:17 +0530, Sk Anirban wrote: > > > > > Report GPU throttle reasons when RPS tests fail to reach expected > > > > > frequencies or power levels. > > > > > > > > > > Signed-off-by: Sk Anirban <[email protected]> > > > > > --- > > > > > drivers/gpu/drm/i915/gt/selftest_rps.c | 17 +++++++++++++++++ > > > > > 1 file changed, 17 insertions(+) > > > > > > > > > > diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c > > > > > b/drivers/gpu/drm/i915/gt/selftest_rps.c > > > > > index 73bc91c6ea07..01c671e00e61 100644 > > > > > --- a/drivers/gpu/drm/i915/gt/selftest_rps.c > > > > > +++ b/drivers/gpu/drm/i915/gt/selftest_rps.c > > > > > @@ -1138,6 +1138,7 @@ int live_rps_power(void *arg) > > > > > struct intel_engine_cs *engine; > > > > > enum intel_engine_id id; > > > > > struct igt_spinner spin; > > > > > + u32 throttle; > > > > > int err = 0; > > > > > /* > > > > > @@ -1216,6 +1217,13 @@ int live_rps_power(void *arg) > > > > > if (11 * min.power > 10 * max.power) { > > > > > pr_err("%s: did not conserve power when setting > > > > > lower frequency!\n", > > > > > engine->name); > > > > > + > > > > > + throttle = intel_uncore_read(gt->uncore, > > > > > + > > > > > intel_gt_perf_limit_reasons_reg(gt)); > > > > > + > > > > > + pr_warn("%s: GPU throttled with reasons > > > > > 0x%08x\n", > > > > > + engine->name, throttle & > > > > > GT0_PERF_LIMIT_REASONS_MASK); > > > > > + > > > > This feels like spamming the system messages. We are on error > > > > path already and exiting with -EINVAL, so printing an > > > > unconditional warning here is excessive, I think. > > > > > > > > [...] > > > Got it. Based on past experience, most failures occur due to throttling. > > > However, I can switch it to pr_info since we already print pr_err > > > beforehand. > > No, that would bunch up two reasons for potential failure. > > If you experienced problems in condition check: > > > > if (11 * min.power > 10 * max.power) > > > > due to throttling, then throttling detection could use its own > > error path, something like this could work: > > > > if (11 * min.power > 10 * max.power) { > > - pr_err("%s: did not conserve power when setting > > lower frequency!\n", > > - engine->name); > > + if (read_cagf(rps) != rps->max_freq) { > > + throttle = intel_uncore_read(gt->uncore, > > + > > intel_gt_perf_limit_reasons_reg(gt)); > > + pr_err("%s: GPU throttled with reasons > > 0x%08x\n", > > + engine->name, throttle & > > GT0_PERF_LIMIT_REASONS_MASK); > > + } else { > > + pr_err("%s: did not conserve power when setting > > lower frequency!\n", > > + engine->name); > > + } > > + > > err = -EINVAL; > > break; > > } > > > > The main goal would be to differentiate and print only one > > reason for failure, instead of emitting two of them and leaving > > people guessing which one of the two was the real reason the > > function returned with -EINVAL. > > > > I did not test the code above, so it may require some changes. > > Throttle is not considered a failure in this context; it serves only as a > debug indicator or a potential reason for failure. > > The primary objective of the test is to measure power, and any failure will > be related to power metrics. > > Throttling is viewed as a defensive mechanism rather than a failure > condition.
Yes, it's actually the hardware saving itself from permanent damage. But here we're reading throttle reasons after the spinner ends, so it is possible that throttling is no longer happening at this point. The useful reading would be before we end the spinner, but we can always log the result afterwards. Also, update with same logic where applicable (like live_rps_control()). Raag
