On 10/2/20 8:16 AM, Liang, Kan wrote: > On 10/2/2020 7:02 AM, Peter Zijlstra wrote: >> On Wed, Sep 30, 2020 at 07:29:35AM -0700, kan.li...@linux.intel.com wrote: >>> From: Kan Liang <kan.li...@linux.intel.com> >>> >>> When a group that has TopDown members is failed to be scheduled, any >>> later TopDown groups will not return valid values. >>> >>> Here is an example. >>> >>> A background perf that occupies all the GP counters and the fixed >>> counter 1. >>> $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles, >>> cycles,cycles}:D" -a >>> >>> A user monitors a TopDown group. It works well, because the fixed >>> counter 3 and the PERF_METRICS are available. >>> $perf stat -x, --topdown -- ./workload >>> retiring,bad speculation,frontend bound,backend bound, >>> 18.0,16.1,40.4,25.5, >>> >>> Then the user tries to monitor a group that has TopDown members. >>> Because of the cycles event, the group is failed to be scheduled. >>> $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound, >>> topdown-fe-bound,topdown-bad-spec,cycles}' >>> -- ./workload >>> <not counted>,,slots,0,0.00,, >>> <not counted>,,topdown-retiring,0,0.00,, >>> <not counted>,,topdown-be-bound,0,0.00,, >>> <not counted>,,topdown-fe-bound,0,0.00,, >>> <not counted>,,topdown-bad-spec,0,0.00,, >>> <not counted>,,cycles,0,0.00,, >>> >>> The user tries to monitor a TopDown group again. It doesn't work anymore. >>> $perf stat -x, --topdown -- ./workload >>> >>> ,,,,, >>> >>> In a txn, cancel_txn() is to truncate the event_list for a canceled >>> group and update the number of events added in this transaction. >>> However, the number of TopDown events added in this transaction is not >>> updated. The kernel will probably fail to add new Topdown events. >>> >>> Check if the canceled group has Topdown events. If so, subtract the >>> TopDown events from n_metric accordingly. >>> >>> Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown >>> metrics") >>> Reported-by: Andi Kleen <a...@linux.intel.com> >>> Reviewed-by: Andi Kleen <a...@linux.intel.com> >>> Signed-off-by: Kan Liang <kan.li...@linux.intel.com> >>> ---
>> >> Urgh, I'd much rather we add n_txn_metric. But also, while looking at >> this, don't we have the same problem with n_pair ? >> >> Something like this perhaps... >> > > Sure. For the perf metric, the below patch fixes the problem. > > Tested-by: Kan Liang <kan.li...@linux.intel.com> Tested-by: Kim Phillips <kim.phill...@amd.com> Excerpt from test script: sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 & # should succeed: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload # should fail: sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a workload # previously failed, now succeeds with this patch: sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload Thanks both, Kim