On Tue, 07 Oct 2025 16:35:44 -0700, Umesh Nerlige Ramappa wrote:

Hi Umesh,

>
> When tick values are large, the multiplication by NSEC_PER_SEC is larger
> than 64 bits and results in bad conversions.
>
> The issue is seen in PMU busyness counters that look like they have
> wrapped around due to bad conversion. i915 PMU implementation returns
> monotonically increasing counters. If a count is lesser than previous
> one, it will only return the larger value until the smaller value
> catches up. The user will see this as zero delta between two
> measurements even though the engines are busy.
>
> Fix it by using a scaling factor to do the conversion. Add the same fix
> for reverse conversion as well.
>
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/14955
> Signed-off-by: Umesh Nerlige Ramappa <[email protected]>
> ---
> v2:
> - Fix divide by zero for Gen11 (Andi)
> - Update commit message
> ---
>  .../gpu/drm/i915/gt/intel_gt_clock_utils.c    | 19 ++++++++++++++-----
>  drivers/gpu/drm/i915/gt/intel_gt_types.h      |  2 ++
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
> index 88b147fa5cb1..41a0e8622b33 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_clock_utils.c
> @@ -3,6 +3,8 @@
>   * Copyright © 2020 Intel Corporation
>   */
>
> +#include <linux/gcd.h>
> +
>  #include "i915_drv.h"
>  #include "i915_reg.h"
>  #include "intel_gt.h"
> @@ -171,7 +173,12 @@ static u32 read_clock_frequency(struct intel_uncore *uncore)
>
>  void intel_gt_init_clock_frequency(struct intel_gt *gt)
>  {
> +	unsigned long clock_period_scale;
> +
>  	gt->clock_frequency = read_clock_frequency(gt->uncore);
> +	clock_period_scale = gcd(NSEC_PER_SEC, gt->clock_frequency);
> +	gt->clock_nsec_scaled = NSEC_PER_SEC / clock_period_scale;
> +	gt->clock_freq_scaled = gt->clock_frequency / clock_period_scale;
>
>  	/* Icelake appears to use another fixed frequency for CTX_TIMESTAMP */
>  	if (GRAPHICS_VER(gt->i915) == 11)
> @@ -180,11 +187,11 @@ void intel_gt_init_clock_frequency(struct intel_gt *gt)
>  	gt->clock_period_ns = intel_gt_clock_interval_to_ns(gt, 1);
>
>  	GT_TRACE(gt,
> -		 "Using clock frequency: %dkHz, period: %dns, wrap: %lldms\n",
> +		 "Using clock frequency: %dkHz, period: %dns, wrap: %lldms, scale %lu\n",
>  		 gt->clock_frequency / 1000,
>  		 gt->clock_period_ns,
> -		 div_u64(mul_u32_u32(gt->clock_period_ns, S32_MAX),
> -			 USEC_PER_SEC));
> +		 div_u64(mul_u32_u32(gt->clock_period_ns, S32_MAX), USEC_PER_SEC),
> +		 clock_period_scale);
>  }
>
>  #if IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM)
> @@ -205,7 +212,8 @@ static u64 div_u64_roundup(u64 nom, u32 den)
>
>  u64 intel_gt_clock_interval_to_ns(const struct intel_gt *gt, u64 count)
>  {
> -	return div_u64_roundup(count * NSEC_PER_SEC, gt->clock_frequency);
> +	return div_u64_roundup(count * gt->clock_nsec_scaled,
> +			       gt->clock_freq_scaled);
>  }
>
>  u64 intel_gt_pm_interval_to_ns(const struct intel_gt *gt, u64 count)
> @@ -215,7 +223,8 @@ u64 intel_gt_pm_interval_to_ns(const struct intel_gt *gt, u64 count)
>
>  u64 intel_gt_ns_to_clock_interval(const struct intel_gt *gt, u64 ns)
>  {
> -	return div_u64_roundup(gt->clock_frequency * ns, NSEC_PER_SEC);
> +	return div_u64_roundup(gt->clock_freq_scaled * ns,
> +			       gt->clock_nsec_scaled);

Instead of this approach, how about just using the already available
mul_u64_u32_div() (or even mul_u64_u64_div_u64())? That would be
preferable I think (though not sure if the rounding is needed?). There is
also a roundup_u64() available in math64.h as a replacement for
div_u64_roundup().

Thanks.
--
Ashutosh

>  }
>
>  u64 intel_gt_ns_to_pm_interval(const struct intel_gt *gt, u64 ns)
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index bcee084b1f27..a19c568fcdc0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -166,6 +166,8 @@ struct intel_gt {
>
>  	u32 clock_frequency;
>  	u32 clock_period_ns;
> +	u32 clock_freq_scaled;
> +	u32 clock_nsec_scaled;
>
>  	struct intel_llc llc;
>  	struct intel_rc6 rc6;
> --
> 2.43.0
>
