Hi,

[...]

> > > > > @@ -1216,6 +1217,13 @@ int live_rps_power(void *arg)
> > > > >               if (11 * min.power > 10 * max.power) {
> > > > >                       pr_err("%s: did not conserve power when setting 
> > > > > lower frequency!\n",
> > > > >                              engine->name);
> > > > > +
> > > > > +                     throttle = intel_uncore_read(gt->uncore,
> > > > > +                                                  
> > > > > intel_gt_perf_limit_reasons_reg(gt));
> > > > > +
> > > > > +                     pr_warn("%s: GPU throttled with reasons 
> > > > > 0x%08x\n",
> > > > > +                             engine->name, throttle & 
> > > > > GT0_PERF_LIMIT_REASONS_MASK);
> > > > > +
> > > > This feels like spamming the system messages. We are on error
> > > > path already and exiting with -EINVAL, so printing an
> > > > unconditional warning here is excessive, I think.
> > > > 
> > > > [...]
> > > Got it. Based on past experience, most failures occur due to throttling.
> > > However, I can switch it to pr_info since we already print pr_err
> > > beforehand.
> > No, that would bunch up two reasons for potential failure.
> > If you experienced problems in condition check:
> > 
> > if (11 * min.power > 10 * max.power)
> > 
> > due to throttling, then throttling detection could use its own
> > error path, something like this could work:
> > 
> >             if (11 * min.power > 10 * max.power) {
> > -                       pr_err("%s: did not conserve power when setting 
> > lower frequency!\n",
> > -                              engine->name);
> > +                   if (read_cagf(rps) != rps->max_freq) {
> > +                           throttle = intel_uncore_read(gt->uncore,
> > +                                                        
> > intel_gt_perf_limit_reasons_reg(gt));
> > +                           pr_err("%s: GPU throttled with reasons 
> > 0x%08x\n",
> > +                                   engine->name, throttle & 
> > GT0_PERF_LIMIT_REASONS_MASK);
> > +                   } else {
> > +                           pr_err("%s: did not conserve power when setting 
> > lower frequency!\n",
> > +                                  engine->name);
> > +                   }
> > +
> >                     err = -EINVAL;
> >                     break;
> >             }
> > 
> > The main goal would be to differentiate and print only one
> > reason for failure, instead of emitting two of them and leaving
> > people guessing which one of the two was the real reason the
> > function returned with -EINVAL.
> > 
> > I did not test the code above, so it may require some changes.
> 
> Throttle is not considered a failure in this context; it serves only as a
> debug indicator or a potential reason for failure.
This is a good reason not to include it straight away in the logs.

> 
> The primary objective of the test is to measure power, and any failure will
> be related to power metrics.
> 
> Throttling is viewed as a defensive mechanism rather than a failure
> condition.
Then print the log when throttling happened
(i.e., when if (read_cagf(rps) != rps->max_freq) is true):

             if (11 * min.power > 10 * max.power) {
                     pr_err("%s: did not conserve power when setting lower 
frequency!\n",
                            engine->name);
 +                   if (read_cagf(rps) != rps->max_freq) {
 +                           throttle = intel_uncore_read(gt->uncore,
 +                                                        
intel_gt_perf_limit_reasons_reg(gt));
 +                           pr_warn("%s: GPU throttled with reasons 0x%08x\n",
 +                                   engine->name, throttle & 
GT0_PERF_LIMIT_REASONS_MASK);
 +                   }
 +
                     err = -EINVAL;
                     break;
             }
Unless, you are certain that a throttle will always happen in
this error path, that additional "if" is necessary.

If this is a debug log anyway, then choosing pr_info or pr_warn
is not that important, so you may leave it as you see fit.

-- 
Best Regards,
Krzysztof

Reply via email to