> > > Motivation: perf support enables measuring cache occupancy and memory
> > > bandwidth metrics on hrtimer (high resolution timer) interrupts via eBPF.
> > > Compared with polling from userspace, hrtimer-based reads remove
> > > scheduling jitter and context switch overhead. Further, PMU reads can be
> > > parallel, since the PMU read path need not lock resctrl's rdtgroup_mutex.
> > > Parallelization and reduced jitter enable more accurate snapshots of
> > > cache occupancy and memory bandwidth. [1] has more details on the
> > > motivation and design.
> >
> > This parallel read without rdtgroup_mutex looks worrying.
> >
> > The h/w counters have limited width (24-bits on older Intel CPUs,
> > 32-bits on AMD and Intel >= Icelake). So resctrl takes the raw
> > value and in get_corrected_val() figures the increment since the
> > previous read of the MSR to figure out how much to add to the
> > running per-RMID count of "chunks".
> >
> > That's all inherently full of races. If perf does this at the
> > same time that resctrl does, then things will be corrupted
> > sooner or later.
> >
> > You might fix it with a per-RMID spinlock in "struct arch_mbm_state"?
> > That might be too fine a locking granularity. You'd probably be fine
> > with little contention with a lock in "struct rdt_mon_domain".
Good catch. Thank you Tony! We might be able to get the same protection a per-RMID spinlock in "struct arch_mbm_state" would give, but with only a memory barrier (no spinlock). I'll look further into it; a rough sketch of the fully locked variant is below for reference. -Jonathan
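
Untested sketch only, assuming a per-domain lock along the lines you describe; the struct layouts and the mbm_lock/mbm_delta/update_mbm_chunks names are simplified placeholders, not the actual resctrl definitions:

	#include <linux/spinlock.h>
	#include <linux/types.h>

	struct arch_mbm_state {
		u64 chunks;	/* running count of MBM chunks for this RMID */
		u64 prev_msr;	/* raw counter value from the previous read */
	};

	struct rdt_mon_domain_sketch {
		raw_spinlock_t mbm_lock;	/* serializes counter updates */
		/* ... per-RMID arch_mbm_state arrays, etc. ... */
	};

	/*
	 * Increment since the previous read, with wraparound handled by
	 * modulo arithmetic. width is the raw counter width: 24 bits on
	 * older Intel, 32 bits on AMD and Icelake or later.
	 */
	static u64 mbm_delta(u64 cur_raw, u64 prev_raw, unsigned int width)
	{
		u64 shift = 64 - width;

		return ((cur_raw << shift) - (prev_raw << shift)) >> shift;
	}

	/*
	 * Fold the latest raw MSR value into the running chunk count under
	 * the per-domain lock, so a resctrl read and a perf/hrtimer read
	 * always see a consistent prev_msr/chunks pair.
	 */
	static u64 update_mbm_chunks(struct rdt_mon_domain_sketch *d,
				     struct arch_mbm_state *m,
				     u64 cur_raw, unsigned int width)
	{
		unsigned long flags;
		u64 chunks;

		raw_spin_lock_irqsave(&d->mbm_lock, flags);
		m->chunks += mbm_delta(cur_raw, m->prev_msr, width);
		m->prev_msr = cur_raw;
		chunks = m->chunks;
		raw_spin_unlock_irqrestore(&d->mbm_lock, flags);

		return chunks;
	}

Taking the lock with interrupts disabled would also cover a perf read arriving from hrtimer context on the local CPU; whether a barrier alone can stand in for the lock is what I still need to convince myself of.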
