On Thu, Sep 10, 2020 at 12:41 PM Ian Rogers <irog...@google.com> wrote:
>
> On Wed, Sep 9, 2020 at 10:52 AM Ian Rogers <irog...@google.com> wrote:
> >
> > On Wed, Sep 9, 2020 at 2:53 AM Namhyung Kim <namhy...@kernel.org> wrote:
> > >
> > > Hi Ian,
> > >
> > > On Wed, Sep 9, 2020 at 5:02 PM Ian Rogers <irog...@google.com> wrote:
> > > >
> > > > A metric like DRAM_BW_Use has, on SkylakeX, the events
> > > > uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/. These
> > > > events open 6 events per socket with pmu names of uncore_imc_[0-5].
> > > > The current metric setup code in find_evsel_group assumes one ID
> > > > will map to 1 event to be recorded in metric_events. For events
> > > > with multiple matches, the first event is recorded in metric_events
> > > > (avoiding matching >1 event with the same name) and the
> > > > evlist_used is updated so that duplicate events aren't removed
> > > > when the evlist has unused events removed.
> > > >
> > > > Before this change:
> > > > $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1
> > > >
> > > >  Performance counter stats for 'system wide':
> > > >
> > > >          41.14 MiB  uncore_imc/cas_count_read/
> > > >  1,002,614,251 ns   duration_time
> > > >
> > > >    1.002614251 seconds time elapsed
> > > >
> > > > After this change:
> > > > $ /tmp/perf/perf stat -M DRAM_BW_Use -a -- sleep 1
> > > >
> > > >  Performance counter stats for 'system wide':
> > > >
> > > >         157.47 MiB  uncore_imc/cas_count_read/ #     0.00 DRAM_BW_Use
> > >
> > > Hmm.. I guess the 0.00 result is incorrect, no?
> >
> > Agreed. There are a number of pre-existing bugs in this code. I'll try
> > to look into this one.
>
> There was a bug where the metric_leader wasn't being set correctly and
> so the accumulation of values wasn't working as expected. There's a
> small fix in:
> https://lore.kernel.org/lkml/20200910032632.511566-3-irog...@google.com/T/#u
> that also does the change I mentioned below.
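For illustration, the matching rule described in the quoted commit message could be sketched roughly as below. This is a hypothetical Python sketch of the behavior, not perf's actual C data structures; the function and field names are made up. One metric ID (e.g. cas_count_read) matches one event per uncore PMU instance, the first match is recorded, and all matches are marked used:

```python
# Hypothetical sketch of the find_evsel_group matching rule described above.
# Names and data shapes are illustrative, not perf's actual C structures.

def pick_metric_events(evlist, metric_ids):
    metric_events = {}   # metric ID -> PMU of the first matching event
    evlist_used = set()  # every match, kept so duplicates aren't pruned
    for ev in evlist:
        if ev["name"] in metric_ids:
            # Record only the first event per ID (avoid matching >1 event
            # with the same name)...
            metric_events.setdefault(ev["name"], ev["pmu"])
            # ...but mark every duplicate as used so it survives the
            # removal of unused events from the evlist.
            evlist_used.add((ev["pmu"], ev["name"]))
    return metric_events, evlist_used

# One uncore_imc event opens once per IMC PMU instance on the socket.
evlist = [{"pmu": f"uncore_imc_{i}", "name": "cas_count_read"}
          for i in range(6)]
events, used = pick_metric_events(evlist, {"cas_count_read"})
```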
>
> The metric still remains at 0.0 following this change, as I believe
> there is a bug in the metric. The metric expression is:
>
>   "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 1000000000 ) / duration_time"
>
> ie the counts of uncore_imc/cas_count_read/ and
> uncore_imc/cas_count_write/ are first scaled up by 64, that is, to
> turn a count of cache lines into bytes; the count is then divided by
> 1000000000, or a GB, to supposedly give GB/s. However, the counts are
> read and scaled:
>
> ...
> void perf_stat__update_shadow_stats(struct evsel *counter, u64 count,
> ...
>         count *= counter->scale;
> ...
>
> The scale value from
> /sys/devices/uncore_imc_0/events/cas_count_read.scale is
> 6.103515625e-5, or 64/(1024*1024). This means the result of the
> expression is 64 times too large, but in PB/s, and so too small to
> display. I don't rule out there being other issues, but this appears
> to be a significant one. Printing using intervals also looks
> suspicious, as the count appears to just increase rather than vary up
> and down.
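The double-scaling Ian describes can be checked with a short sketch. The raw cache-line count below is hypothetical; the .scale value is the one quoted from sysfs. Because perf has already multiplied the count by 64/(1024*1024) to get MiB, applying the expression's further 64/1e9 leaves a value that is the intended GB/s figure times the sysfs scale factor, i.e. roughly 64x too large when read as (binary) PB/s:

```python
# Sketch of the double-scaling problem described above. The raw cache-line
# count is made up; the scale value is the one quoted from sysfs.
raw_lines = 2_580_000            # hypothetical count of 64-byte cache lines
scale = 6.103515625e-5           # 64 / (1024 * 1024), from cas_count_read.scale
duration_s = 1.0                 # a 1 second run

scaled = raw_lines * scale       # what perf hands to the metric, in MiB

# The metric expression treats its input as a raw cache-line count:
shown = (64 * scaled / 1e9) / duration_s

# What the expression intended, starting from the raw count, in GB/s:
intended = (64 * raw_lines / 1e9) / duration_s

# The displayed value is the intended one multiplied by the sysfs scale,
# so it is far too small to print and shows up as 0.00.
ratio = shown / intended
```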
You're right!  Maybe we can simply change it to the below, and change
the unit to MiB/s?

  ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / duration_time

Thanks
Namhyung
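Numerically, the proposed expression could look like the sketch below. The counts are hypothetical, they stand in for the post-scale values (perf has already applied the sysfs .scale, so they are in MiB), and duration_time is assumed here to evaluate in seconds:

```python
# Sketch of the proposed MiB/s expression on already-scaled counts.
# All values are hypothetical; duration_time is assumed to be in seconds.
read_mib = 157.47    # uncore_imc@cas_count_read@, post-scale (MiB)
write_mib = 42.53    # uncore_imc@cas_count_write@, post-scale (MiB)
duration_s = 1.0     # duration_time

# Proposed metric: no extra 64x or 1e9 factors, unit is MiB/s directly.
dram_bw_mib_s = (read_mib + write_mib) / duration_s
```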