On Thu, 11 Aug, at 06:41:50PM, Borislav Petkov wrote:
> Drop stable from CC.
>
> On Thu, Aug 11, 2016 at 04:21:42PM +0100, Matt Fleming wrote:
> > While the Intel PMU monitors the LLC when perf enables the
> > HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
> > L1 instruction cache fetches (0x0080) and instruction cache misses
> > (0x0081) on the AMD PMU.
> >
> > This is extremely confusing when monitoring the same workload across
> > Intel and AMD machines, since parameters like,
> >
> >   $ perf stat -e cache-references,cache-misses
> >
> > measure completely different things.
> >
> > Instead, make the AMD PMU measure instruction/data cache fill requests
> > to the L2 and instruction/data cache misses in the L2 when
> > HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled, respectively.
> > That way the events measure unified caches on both platforms.
>
> I guess that's closer.
>
> Even though LLC is not always L2 on AMD (some have L3). Btw,
> what are the exact events for PERF_COUNT_HW_CACHE_REFERENCES and
> PERF_COUNT_HW_CACHE_MISSES called on Intel?

They're referred to as "LLC Reference" and "LLC Misses" in the Intel SDM
Table 18-1 and "Longest latency cache references/misses" in Table 19-1.

> I could try to find better/more fitting event selectors on AMD...

If you've got any other suggestions, I'm all ears. Note that one thing I
wasn't sure about was whether we want to include TLB events hitting the
L2. I left them out of this patch, but it might make sense to add them
so that HW_CACHE_{REFERENCES,MISSES} is actually distinguishable from
LLC-{loads,misses}.

> > Signed-off-by: Matt Fleming <m...@codeblueprint.co.uk>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Ingo Molnar <mi...@kernel.org>
> > Cc: Borislav Petkov <b...@alien8.de>
> > Cc: <sta...@vger.kernel.org>
> > ---
> >  arch/x86/events/amd/core.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/events/amd/core.c b/arch/x86/events/amd/core.c
> > index e07a22bb9308..8fd8bf79f32b 100644
> > --- a/arch/x86/events/amd/core.c
> > +++ b/arch/x86/events/amd/core.c
> > @@ -119,8 +119,8 @@ static const u64 amd_perfmon_event_map[PERF_COUNT_HW_MAX] =
> >  {
> >    [PERF_COUNT_HW_CPU_CYCLES] = 0x0076,
> >    [PERF_COUNT_HW_INSTRUCTIONS] = 0x00c0,
> > -  [PERF_COUNT_HW_CACHE_REFERENCES] = 0x0080,
> > -  [PERF_COUNT_HW_CACHE_MISSES] = 0x0081,
> > +  [PERF_COUNT_HW_CACHE_REFERENCES] = 0x037d,
> > +  [PERF_COUNT_HW_CACHE_MISSES] = 0x037e,
> >    [PERF_COUNT_HW_BRANCH_INSTRUCTIONS] = 0x00c2,
> >    [PERF_COUNT_HW_BRANCH_MISSES] = 0x00c3,
> >    [PERF_COUNT_HW_STALLED_CYCLES_FRONTEND] = 0x00d0, /* "Decoder empty" event */
>
> Btw, there's also amd_event_mapping in arch/x86/kvm/pmu_amd.c which has
> duplicated amd_perfmon_event_map. Would need adjusting too.

Urgh, right. I totally missed that. I'll update.
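
For anyone reading the selectors in the hunk above: a minimal sketch of how a 16-bit raw selector such as 0x037d splits into an event code and a unit mask, assuming the usual x86 perf raw layout (event select in bits 7:0, unit mask in bits 15:8). The helper names sel_event() and sel_umask() are hypothetical and do not exist in the kernel; the extended event-select bits that AMD keeps higher up in the register are ignored here since they are zero for all the values in this patch.

```c
#include <stdint.h>

/*
 * Hypothetical helpers for illustration only; these are not kernel
 * functions. They assume the common x86 raw event layout:
 *   bits  7:0  event select
 *   bits 15:8  unit mask
 */

/* Return the event select (low 8 bits) of a raw selector. */
static inline uint8_t sel_event(uint64_t sel)
{
	return (uint8_t)(sel & 0xff);
}

/* Return the unit mask (bits 15:8) of a raw selector. */
static inline uint8_t sel_umask(uint64_t sel)
{
	return (uint8_t)((sel >> 8) & 0xff);
}
```

Under that assumption, 0x037d is event 0x7d with unit mask 0x03 and 0x037e is event 0x7e with unit mask 0x03, while the old 0x0080/0x0081 values are events 0x80/0x81 with an empty unit mask. Adding TLB events as discussed above would mean a different unit mask on the same two event codes; which bit that is should be checked against the AMD documentation rather than taken from this sketch.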