intel: Fix n_metric for the canceled group

Kim Phillips Fri, 02 Oct 2020 14:11:47 -0700

On 10/2/20 8:16 AM, Liang, Kan wrote:
> On 10/2/2020 7:02 AM, Peter Zijlstra wrote:
>> On Wed, Sep 30, 2020 at 07:29:35AM -0700, kan.li...@linux.intel.com wrote:
>>> From: Kan Liang <kan.li...@linux.intel.com>
>>>
>>> When a group that has TopDown members is failed to be scheduled, any
>>> later TopDown groups will not return valid values.
>>>
>>> Here is an example.
>>>
>>> A background perf that occupies all the GP counters and the fixed
>>> counter 1.
>>>   $perf stat -e "{cycles,cycles,cycles,cycles,cycles,cycles,cycles,
>>>                   cycles,cycles}:D" -a
>>>
>>> A user monitors a TopDown group. It works well, because the fixed
>>> counter 3 and the PERF_METRICS are available.
>>>   $perf stat -x, --topdown -- ./workload
>>>     retiring,bad speculation,frontend bound,backend bound,
>>>     18.0,16.1,40.4,25.5,
>>>
>>> Then the user tries to monitor a group that has TopDown members.
>>> Because of the cycles event, the group is failed to be scheduled.
>>>   $perf stat -x, -e '{slots,topdown-retiring,topdown-be-bound,
>>>                       topdown-fe-bound,topdown-bad-spec,cycles}'
>>>                       -- ./workload
>>>      <not counted>,,slots,0,0.00,,
>>>      <not counted>,,topdown-retiring,0,0.00,,
>>>      <not counted>,,topdown-be-bound,0,0.00,,
>>>      <not counted>,,topdown-fe-bound,0,0.00,,
>>>      <not counted>,,topdown-bad-spec,0,0.00,,
>>>      <not counted>,,cycles,0,0.00,,
>>>
>>> The user tries to monitor a TopDown group again. It doesn't work anymore.
>>>   $perf stat -x, --topdown -- ./workload
>>>
>>>      ,,,,,
>>>
>>> In a txn, cancel_txn() is to truncate the event_list for a canceled
>>> group and update the number of events added in this transaction.
>>> However, the number of TopDown events added in this transaction is not
>>> updated. The kernel will probably fail to add new Topdown events.
>>>
>>> Check if the canceled group has Topdown events. If so, subtract the
>>> TopDown events from n_metric accordingly.
>>>
>>> Fixes: 7b2c05a15d29 ("perf/x86/intel: Generic support for hardware TopDown 
>>> metrics")
>>> Reported-by: Andi Kleen <a...@linux.intel.com>
>>> Reviewed-by: Andi Kleen <a...@linux.intel.com>
>>> Signed-off-by: Kan Liang <kan.li...@linux.intel.com>
>>> ---


>>
>> Urgh, I'd much rather we add n_txn_metric. But also, while looking at
>> this, don't we have the same problem with n_pair ?
>>
>> Something like this perhaps...
>>
> 
> Sure. For the perf metric, the below patch fixes the problem.
> 
> Tested-by: Kan Liang <kan.li...@linux.intel.com>

Tested-by: Kim Phillips <kim.phill...@amd.com>

Excerpt from test script:

sudo perf stat -e "{cycles,cycles,cycles,cycles}:D" -a sleep 10 &

# should succeed:
sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload

# should fail:
sudo perf stat -e "{fp_ret_sse_avx_ops.all,fp_ret_sse_avx_ops.all,cycles}:D" -a 
workload

# previously failed, now succeeds with this patch:
sudo perf stat -e "{fp_ret_sse_avx_ops.all}:D" -a workload

Thanks both,

Kim

Re: [PATCH] perf/x86/intel: Fix n_metric for the canceled group

Reply via email to