Hi,
On 10-12-2025 02:21 pm, Krzysztof Karas wrote:
Hi,
On 2025-12-09 at 13:36:15 +0530, Anirban, Sk wrote:
Hi,
On 09-12-2025 12:46 pm, Krzysztof Karas wrote:
Hi Sk Anirban,
On 2025-12-09 at 11:26:17 +0530, Sk Anirban wrote:
Report GPU throttle reasons when RPS tests fail to reach expected
frequencies or power levels.
Signed-off-by: Sk Anirban <[email protected]>
---
drivers/gpu/drm/i915/gt/selftest_rps.c | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/i915/gt/selftest_rps.c
b/drivers/gpu/drm/i915/gt/selftest_rps.c
index 73bc91c6ea07..01c671e00e61 100644
--- a/drivers/gpu/drm/i915/gt/selftest_rps.c
+++ b/drivers/gpu/drm/i915/gt/selftest_rps.c
@@ -1138,6 +1138,7 @@ int live_rps_power(void *arg)
struct intel_engine_cs *engine;
enum intel_engine_id id;
struct igt_spinner spin;
+ u32 throttle;
int err = 0;
/*
@@ -1216,6 +1217,13 @@ int live_rps_power(void *arg)
if (11 * min.power > 10 * max.power) {
pr_err("%s: did not conserve power when setting lower
frequency!\n",
engine->name);
+
+ throttle = intel_uncore_read(gt->uncore,
+
intel_gt_perf_limit_reasons_reg(gt));
+
+ pr_warn("%s: GPU throttled with reasons 0x%08x\n",
+ engine->name, throttle &
GT0_PERF_LIMIT_REASONS_MASK);
+
This feels like spamming the system messages. We are on error
path already and exiting with -EINVAL, so printing an
unconditional warning here is excessive, I think.
[...]
Got it. Based on past experience, most failures occur due to throttling.
However, I can switch it to pr_info since we already print pr_err
beforehand.
No, that would bunch up two reasons for potential failure.
If you experienced problems in condition check:
if (11 * min.power > 10 * max.power)
due to throttling, then throttling detection could use its own
error path, something like this could work:
if (11 * min.power > 10 * max.power) {
- pr_err("%s: did not conserve power when setting lower
frequency!\n",
- engine->name);
+ if (read_cagf(rps) != rps->max_freq) {
+ throttle = intel_uncore_read(gt->uncore,
+
intel_gt_perf_limit_reasons_reg(gt));
+ pr_err("%s: GPU throttled with reasons
0x%08x\n",
+ engine->name, throttle &
GT0_PERF_LIMIT_REASONS_MASK);
+ } else {
+ pr_err("%s: did not conserve power when setting
lower frequency!\n",
+ engine->name);
+ }
+
err = -EINVAL;
break;
}
The main goal would be to differentiate and print only one
reason for failure, instead of emitting two of them and leaving
people guessing which one of the two was the real reason the
function returned with -EINVAL.
I did not test the code above, so it may require some changes.
Throttle is not considered a failure in this context; it serves only as
a debug indicator or a potential reason for failure.
The primary objective of the test is to measure power, and any failure
will be related to power metrics.
Throttling is viewed as a defensive mechanism rather than a failure
condition.
Thanks,
Anirban