Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-17 Thread Tan Xiaojun
On 2019/10/17 9:51, Tan Xiaojun wrote:
> On 2019/10/16 18:12, James Clark wrote:
>> Hi Xiaojun,
>>

>>>> What do you mean when the user specifies "event:pp", if the SPE is 
>>>> available, configure and record the spe data directly via the perf event 
>>>> open syscall?
>>>> (perf.data itself is the same as using -e arm_spe_0//xxx?)
>>>
>>> I mean, for the perf record, if the user does not add ":pp" to these 
>>> events, the original process is taken, and if ":pp" is added, the spe 
>>> process is taken.
>>>
>>
>> Yes we think this is the best way to do it considering that SPE has been 
>> implemented as a separate PMU and it will be very difficult to do it in the 
>> Kernel when the precise_ip attribute is set.
>>
>> I think doing everything in userspace is easiest. This will at least mean 
>> that users of Perf don't have to be aware of the details of SPE to get 
>> precise sample data.
>>
>> So if the user specifies "event:p" when SPE is available, the SPE PMU is 
>> automatically configured and data is recorded. If the user also specifies -e 
>> arm_spe_0//xxx and wants to do some manual configuration, then that could 
>> override the automatic configuration.
>>
>>
>> James
>>
>>
>>
> 
> OK. I got it.
> 
> I found a bug while testing. If I specify a CPU list (with -a or -C) when 
> recording SPE data, some events with "pid:0 tid:0" are logged. This is 
> obviously wrong.
> 
> I want to solve this problem, but I haven't found out what went wrong.
> 
> --
> [root@server121 perf]# perf record -e arm_spe_0/branch_filter=1,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,store_filter=1,min_latency=0/ -a

Sorry, "--all-user" should be added here. Even then, there are still some 
"pid:0" events in spe_dump.out. 
(If kernel events are included, "pid:0" is not a problem.)

As a result, the PC addresses of some SPE samples cannot be resolved to 
symbols, because the wrong pid/tid is taken from these records.

Thanks.
Xiaojun.
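
To make it easier to see how many such records end up in spe_dump.out, here is a minimal Python sketch that scans `perf report -D` output for PERF_RECORD_ITRACE_START lines whose pid and tid are both 0. The regular expression is written against the line format quoted in this message only; the function names are hypothetical and this is not part of the perf tool.

```python
import re

# Matches summary lines of the form quoted in this thread, e.g.
#   0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
# (cpu, timestamp, file offset, record size, then pid/tid).
ITRACE_START_RE = re.compile(
    r"^\s*(?P<cpu>\d+)\s+(?P<time>\d+)\s+(?P<offset>0x[0-9a-fA-F]+)\s+"
    r"\[0x[0-9a-fA-F]+\]:\s+PERF_RECORD_ITRACE_START\s+"
    r"pid:\s*(?P<pid>-?\d+)\s+tid:\s*(?P<tid>-?\d+)"
)


def parse_itrace_start(line):
    """Return a dict for an ITRACE_START summary line, or None otherwise."""
    m = ITRACE_START_RE.match(line)
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "time": int(m.group("time")),
        "offset": int(m.group("offset"), 16),
        "pid": int(m.group("pid")),
        "tid": int(m.group("tid")),
    }


def zero_pid_records(lines):
    """Collect the ITRACE_START records that report pid 0 and tid 0."""
    parsed = (parse_itrace_start(line) for line in lines)
    return [r for r in parsed if r is not None and r["pid"] == 0 and r["tid"] == 0]
```

Typical use would be `zero_pid_records(open("spe_dump.out"))`, which returns one entry per suspicious record together with its CPU and file offset.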

> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 7.925 MB perf.data ]
> [root@server121 perf]# perf report -D > spe_dump.out
> [root@server121 perf]# vim spe_dump.out
> 
> --
> ...
> 0xd0330 [0x30]: event: 12
> .
> . ... raw event: size 48 bytes
> .  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ..0.
> .  0010:  00 00 00 00 00 00 00 00 f8 d9 fe bd f7 08 02 00  
> .  0020:  00 00 00 00 00 00 00 00 4c bc 14 00 00 00 00 00  L...
> 
> 0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
> 
> 0xd0438 [0x30]: event: 12
> .
> . ... raw event: size 48 bytes
> .  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ..0.
> .  0010:  00 00 00 00 00 00 00 00 d8 ef fe bd f7 08 02 00  
> .  0020:  01 00 00 00 00 00 00 00 4d bc 14 00 00 00 00 00  M...
> 
> 1 572810090967000 0xd0438 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
> ...
> --
> 
> Thanks.
> Xiaojun.
> 
> 
> .
> 




Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-16 Thread Tan Xiaojun
On 2019/10/16 18:12, James Clark wrote:
> Hi Xiaojun,
> 
>>>
>>> What do you mean when the user specifies "event:pp", if the SPE is 
>>> available, configure and record the spe data directly via the perf event 
>>> open syscall?
>>> (perf.data itself is the same as using -e arm_spe_0//xxx?)
>>
>> I mean, for the perf record, if the user does not add ":pp" to these events, 
>> the original process is taken, and if ":pp" is added, the spe process is 
>> taken.
>>
> 
> Yes we think this is the best way to do it considering that SPE has been 
> implemented as a separate PMU and it will be very difficult to do it in the 
> Kernel when the precise_ip attribute is set.
> 
> I think doing everything in userspace is easiest. This will at least mean 
> that users of Perf don't have to be aware of the details of SPE to get 
> precise sample data.
> 
> So if the user specifies "event:p" when SPE is available, the SPE PMU is 
> automatically configured and data is recorded. If the user also specifies -e 
> arm_spe_0//xxx and wants to do some manual configuration, then that could 
> override the automatic configuration.
> 
> 
> James
> 
> 
> 

OK. I got it.

I found a bug while testing. If I specify a CPU list (with -a or -C) when 
recording SPE data, some events with "pid:0 tid:0" are logged. This is 
obviously wrong.

I want to solve this problem, but I haven't found out what went wrong.

--
[root@server121 perf]# perf record -e arm_spe_0/branch_filter=1,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,store_filter=1,min_latency=0/ -a
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 7.925 MB perf.data ]
[root@server121 perf]# perf report -D > spe_dump.out
[root@server121 perf]# vim spe_dump.out

--
...
0xd0330 [0x30]: event: 12
.
. ... raw event: size 48 bytes
.  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ..0.
.  0010:  00 00 00 00 00 00 00 00 f8 d9 fe bd f7 08 02 00  
.  0020:  00 00 00 00 00 00 00 00 4c bc 14 00 00 00 00 00  L...

0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0

0xd0438 [0x30]: event: 12
.
. ... raw event: size 48 bytes
.  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ..0.
.  0010:  00 00 00 00 00 00 00 00 d8 ef fe bd f7 08 02 00  
.  0020:  01 00 00 00 00 00 00 00 4d bc 14 00 00 00 00 00  M...

1 572810090967000 0xd0438 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
...
--

Thanks.
Xiaojun.
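
For anyone who wants to cross-check the dump above by hand, here is a small Python sketch that decodes the 48 raw bytes of the first record. The header (type, misc, size) followed by pid and tid is the usual perf record layout for PERF_RECORD_ITRACE_START. The layout of the trailing 32 bytes (pid/tid, time, cpu, identifier) is an assumption inferred from this particular dump: with it, the decoded time and CPU agree with the summary line printed by perf. The function name is hypothetical.

```python
import struct

PERF_RECORD_ITRACE_START = 12

# The 48 bytes of the first raw event in the dump above, row by row.
RAW_EVENT = bytes.fromhex(
    "0c00000000003000" "0000000000000000"   # 0x00: header, pid, tid
    "0000000000000000" "f8d9febdf7080200"   # 0x10: sample pid/tid, time
    "0000000000000000" "4cbc140000000000"   # 0x20: cpu/reserved, identifier
)


def decode_itrace_start(raw):
    """Decode one little-endian ITRACE_START record (assumed trailer layout)."""
    rec_type, misc, size = struct.unpack_from("<IHH", raw, 0)
    pid, tid = struct.unpack_from("<II", raw, 8)
    # Assumed sample_id trailer: pid, tid, time, cpu, reserved, identifier.
    s_pid, s_tid, time, cpu, _reserved, ident = struct.unpack_from("<IIQIIQ", raw, 16)
    return {
        "type": rec_type, "misc": misc, "size": size,
        "pid": pid, "tid": tid,
        "sample_pid": s_pid, "sample_tid": s_tid,
        "time": time, "cpu": cpu, "id": ident,
    }


if __name__ == "__main__":
    print(decode_itrace_start(RAW_EVENT))
```

The pid and tid fields really are zero in the record itself, so the "pid: 0 tid: 0" in the summary line is not a formatting problem in `perf report -D`.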



Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-16 Thread James Clark
Hi Xiaojun,

>>
>> What do you mean when the user specifies "event:pp", if the SPE is 
>> available, configure and record the spe data directly via the perf event 
>> open syscall?
>> (perf.data itself is the same as using -e arm_spe_0//xxx?)
> 
> I mean, for the perf record, if the user does not add ":pp" to these events, 
> the original process is taken, and if ":pp" is added, the spe process is 
> taken.
> 

Yes we think this is the best way to do it considering that SPE has been 
implemented as a separate PMU and it will be very difficult to do it in the 
Kernel when the precise_ip attribute is set.

I think doing everything in userspace is easiest. This will at least mean that 
users of Perf don't have to be aware of the details of SPE to get precise 
sample data.

So if the user specifies "event:p" when SPE is available, the SPE PMU is 
automatically configured and data is recorded. If the user also specifies -e 
arm_spe_0//xxx and wants to do some manual configuration, then that could 
override the automatic configuration.


James
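
The userspace selection described in this message could be sketched roughly as follows. This is only an illustration of the decision, not the perf tool's code: the function names, the path labels and the simplified modifier parsing are all hypothetical, and the only external fact relied on is that the kernel lists PMUs under /sys/bus/event_source/devices.

```python
import os

PMU_SYSFS_DIR = "/sys/bus/event_source/devices"  # where the kernel lists PMUs


def precise_level(event_spec):
    """Count 'p' modifiers in a spec such as 'cycles:pp'.

    Simplified on purpose: it does not handle tracepoint names like
    'subsystem:event', where the text after ':' is not a modifier.
    """
    if "/" in event_spec:
        # PMU syntax 'pmu/terms/modifiers': modifiers follow the last slash.
        mods = event_spec.rsplit("/", 1)[1].lstrip(":")
    elif ":" in event_spec:
        mods = event_spec.rsplit(":", 1)[1]
    else:
        mods = ""
    return mods.count("p")


def spe_available(sysfs_dir=PMU_SYSFS_DIR):
    """True if an arm_spe* PMU is listed in sysfs."""
    try:
        return any(name.startswith("arm_spe") for name in os.listdir(sysfs_dir))
    except OSError:
        return False


def choose_path(event_specs, have_spe):
    """Pick a recording path for a list of '-e' event specs."""
    if any(spec.startswith("arm_spe") for spec in event_specs):
        return "spe-manual"   # explicit SPE config overrides the automatic one
    if have_spe and any(precise_level(spec) > 0 for spec in event_specs):
        return "spe-auto"     # ':p' requested and SPE present
    return "default"          # original, non-SPE path
```

For example, `choose_path(["cycles:p"], spe_available())` would select the automatic SPE path on a machine with the SPE PMU and the default path elsewhere.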



Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-09 Thread Tan Xiaojun
On 2019/10/9 19:09, Tan Xiaojun wrote:
> On 2019/10/9 17:48, James Clark wrote:
>> Hi Xiaojun,
>>
>>> By the way, you mentioned before that you want the spe event to be in the 
>>> form of "event:pp" like pebs. Is that the whole framework should be made 
>>> similar to pebs? Or is it just a modification to the command format? 
>>
>> We're currently still investigating if it makes sense to modify the Perf 
>> event open syscall to use SPE when the "precise_ip" attribute is set. And 
>> then synthesize samples using the SPE data when available. This would keep 
>> the syscall interface more consistent between architectures.
>>
>> And if tools other than Perf want more precise data, they don't have to be 
>> aware of SPE or any of the implementation defined details of it. For example 
>> the 'data source' encoding can be different from one micro architecture to 
>> the next. The kernel is probably the best place to handle this.
>>
>> At the moment, every tool that wants to use the Perf syscall to get precise 
>> data on ARM would have to be aware of SPE and implement their own decoding.
>>
> 
> Hi James,
> 
> What do you mean when the user specifies "event:pp", if the SPE is available, 
> configure and record the spe data directly via the perf event open syscall?
> (perf.data itself is the same as using -e arm_spe_0//xxx?)

I mean that for perf record, if the user does not add ":pp" to these events, 
the original code path is taken, and if ":pp" is added, the SPE path is taken.

Xiaojun.

> 
> OK. If I have not misunderstood, I think I know how to do it.
> Thank you.
> 
>>> For the former, this may be a bit difficult. For the latter, there is 
>>> currently no modification to the record part, so "-c -F, etc." is only for 
>>> instructions rather than events, so it may be misunderstood by users.
>>>
>>> So I haven't figured out how to do. What do you think of this?
>>
>> I think the patch at the moment is a good start to make SPE more accessible. 
>> And the changes I mentioned above wouldn't change the fact that the raw SPE 
>> data would still be available via the SPE PMU. So I think continuing with 
>> the patch as-is for now is the best idea.
>>
> 
> Yes. I agree.
> 
> Xiaojun.
> 
>>
>> James
>>
>>
> 




Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-09 Thread Tan Xiaojun
On 2019/10/9 17:48, James Clark wrote:
> Hi Xiaojun,
> 
>> By the way, you mentioned before that you want the spe event to be in the 
>> form of "event:pp" like pebs. Is that the whole framework should be made 
>> similar to pebs? Or is it just a modification to the command format? 
> 
> We're currently still investigating if it makes sense to modify the Perf 
> event open syscall to use SPE when the "precise_ip" attribute is set. And 
> then synthesize samples using the SPE data when available. This would keep 
> the syscall interface more consistent between architectures.
> 
> And if tools other than Perf want more precise data, they don't have to be 
> aware of SPE or any of the implementation defined details of it. For example 
> the 'data source' encoding can be different from one micro architecture to 
> the next. The kernel is probably the best place to handle this.
> 
> At the moment, every tool that wants to use the Perf syscall to get precise 
> data on ARM would have to be aware of SPE and implement their own decoding.
> 

Hi James,

Do you mean that when the user specifies "event:pp" and SPE is available, the 
SPE data is configured and recorded directly via the perf event open syscall?
(So perf.data itself would be the same as when using -e arm_spe_0//xxx?)

OK. If I have not misunderstood, I think I know how to do it.
Thank you.

>> For the former, this may be a bit difficult. For the latter, there is 
>> currently no modification to the record part, so "-c -F, etc." is only for 
>> instructions rather than events, so it may be misunderstood by users.
>>
>> So I haven't figured out how to do. What do you think of this?
> 
> I think the patch at the moment is a good start to make SPE more accessible. 
> And the changes I mentioned above wouldn't change the fact that the raw SPE 
> data would still be available via the SPE PMU. So I think continuing with the 
> patch as-is for now is the best idea.
> 

Yes. I agree.

Xiaojun.

> 
> James
> 
> 




Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-09 Thread James Clark
Hi Xiaojun,

> By the way, you mentioned before that you want the spe event to be in the 
> form of "event:pp" like pebs. Is that the whole framework should be made 
> similar to pebs? Or is it just a modification to the command format? 

We're currently still investigating if it makes sense to modify the Perf event 
open syscall to use SPE when the "precise_ip" attribute is set. And then 
synthesize samples using the SPE data when available. This would keep the 
syscall interface more consistent between architectures.

And if tools other than Perf want more precise data, they don't have to be 
aware of SPE or any of the implementation defined details of it. For example 
the 'data source' encoding can be different from one micro architecture to the 
next. The kernel is probably the best place to handle this.

At the moment, every tool that wants to use the Perf syscall to get precise 
data on ARM would have to be aware of SPE and implement their own decoding.

> For the former, this may be a bit difficult. For the latter, there is 
> currently no modification to the record part, so "-c -F, etc." is only for 
> instructions rather than events, so it may be misunderstood by users.
> 
> So I haven't figured out how to do. What do you think of this?

I think the patch at the moment is a good start to make SPE more accessible. 
And the changes I mentioned above wouldn't change the fact that the raw SPE 
data would still be available via the SPE PMU. So I think continuing with the 
patch as-is for now is the best idea.


James


Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-08 Thread Tan Xiaojun
On 2019/10/4 21:46, James Clark wrote:
> Hi Xiaojun,
> 
> I wanted to ask if you are still working on this?
> 
> I've noticed that it doesn't apply cleanly to perf/core anymore and I was 
> working on re-basing it.
> Would you be interested in me posting my progress?
> 
> I was also interested in decoding the "data source" of events and displaying 
> that information. Does this
> clash with any of your current work?
> 
> 
> Thanks
> James
> 

(Sorry, you may have received several copies of this email. I suddenly found I 
was no longer on the mailing list and had to confirm my subscription.)

Hi, James,

Sorry, I did not respond in time because of the National Day holiday in China.

I am still working on this, but I was assigned other tasks some time ago, so 
there has been no obvious progress on SPE.

By the way, you mentioned before that you want the SPE events to take the form 
"event:pp", like PEBS. Does that mean the whole framework should be made 
similar to PEBS, or is it just a change to the command format? The former may 
be a bit difficult. As for the latter, the record part is currently unmodified, 
so "-c", "-F", etc. apply only to instructions rather than to events, which 
users may misunderstand.

So I haven't figured out how to do it. What do you think?

Thanks.
Xiaojun.

> On 09/08/2019 07:12, Tan Xiaojun wrote:
>> On 2019/8/9 5:00, Jeremy Linton wrote:
>>> Hi,
>>>
>>> First thanks for posting this!
>>>
>>> I ran this on our DAWN platform and it does what it says. It's a pretty 
>>> reasonable start, but I get -1's in the command row rather than "dd" (or 
>>> similar) and this also results in [unknown] for the shared object and most 
>>> userspace addresses. This is quite possibly something I'm not doing right, 
>>> but I didn't spend a lot of time testing/debugging it.
>>>
>>> I did a quick glance at the code too, and had a couple of comments, although 
>>> I'm not a perf tool expert.
>>>
>>
>> Hi,
>>
>> Thank you for your reply.
>>
>> I have only recently started working on this aspect of the perf tool, so 
>> your reply is very important to me.
>>
>> I need to be sorry, my example here is not complete, until you said that I 
>> found that I only posted a part of the example. The complete example is as 
>> follows:
>>
>> Example usage:
>>
>> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=1
>> # perf report
>>
>> 
>> ...
>> # Samples: 37  of event 'llc-miss'
>> # Event count (approx.): 37
>> #
>> # Children      Self  Command  Shared Object      Symbol
>> # ........  ........  .......  .................  ................................
>> #
>>     37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>     16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>>      5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>>      5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>      2.70%     2.70%  dd       dd                 [.] 0xd9d8
>>      2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>>      2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>>      2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
>>
>>
>> # Samples: 8  of event 'tlb-miss'
>> # Event count (approx.): 8
>> #
>> # Children      Self  Command  Shared Object      Symbol
>> # ........  ........  .......  .................  ................................
>> #
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>     12.50%    12.50%  dd       dd                 [.] 0xd9d8
>>     12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>>     12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>     12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
>>
>>
>> # Samples: 12  of event 'branch-miss'
>> # Event count (approx.): 12
>> #
>> # Children      Self  Command  Shared Object      Symbol
>> # ........  ........  .......  .................  ................................
>> #
>>     16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>>      8.33%     8.33%  dd       [kernel.kallsyms]  

Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-10-04 Thread James Clark
Hi Xiaojun,

I wanted to ask if you are still working on this?

I've noticed that it doesn't apply cleanly to perf/core anymore and I was 
working on re-basing it.
Would you be interested in me posting my progress?

I was also interested in decoding the "data source" of events and displaying 
that information. Does this clash with any of your current work?


Thanks
James

On 09/08/2019 07:12, Tan Xiaojun wrote:
> On 2019/8/9 5:00, Jeremy Linton wrote:
>> Hi,
>>
>> First thanks for posting this!
>>
>> I ran this on our DAWN platform and it does what it says. It's a pretty 
>> reasonable start, but I get -1's in the command row rather than "dd" (or 
>> similar) and this also results in [unknown] for the shared object and most 
>> userspace addresses. This is quite possibly something I'm not doing right, 
>> but I didn't spend a lot of time testing/debugging it.
>>
>> I did a quick glance at the code too, and had a couple of comments, although 
>> I'm not a perf tool expert.
>>
>
> Hi,
>
> Thank you for your reply.
>
> I have only recently started working on this aspect of the perf tool, so your 
> reply is very important to me.
>
> I need to be sorry, my example here is not complete, until you said that I 
> found that I only posted a part of the example. The complete example is as 
> follows:
>
> Example usage:
>
> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=1
> # perf report
>
> 
> ...
> # Samples: 37  of event 'llc-miss'
> # Event count (approx.): 37
> #
> # Children      Self  Command  Shared Object      Symbol
> # ........  ........  .......  .................  ................................
> #
>     37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>      5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>      2.70%     2.70%  dd       dd                 [.] 0xd9d8
>      2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>      2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>      2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
>
>
> # Samples: 8  of event 'tlb-miss'
> # Event count (approx.): 8
> #
> # Children      Self  Command  Shared Object      Symbol
> # ........  ........  .......  .................  ................................
> #
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>     12.50%    12.50%  dd       dd                 [.] 0xd9d8
>     12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>     12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>     12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
>
>
> # Samples: 12  of event 'branch-miss'
> # Event count (approx.): 12
> #
> # Children      Self  Command  Shared Object      Symbol
> # ........  ........  .......  .................  ................................
> #
>     16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
>      8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      8.33%     8.33%  dd       ld-2.28.so         [.] check_match
>      8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
>      8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
>      8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
>      8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>
>
>
>>
>> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>>> Profiling Extensions (SPE) support") is merged, "perf record" and
>>> "perf report --dump-raw-trace" have been supported. However, the
>>> raw data that is dumped cannot be used without parsing.
>>>
>>> This patch is to 

Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-08-09 Thread Tan Xiaojun
On 2019/8/9 5:00, Jeremy Linton wrote:
> Hi,
> 
> First thanks for posting this!
> 
> I ran this on our DAWN platform and it does what it says. It's a pretty 
> reasonable start, but I get -1's in the command row rather than "dd" (or 
> similar) and this also results in [unknown] for the shared object and most 
> userspace addresses. This is quite possibly something I'm not doing right, 
> but I didn't spend a lot of time testing/debugging it.
> 
> I did a quick glance at the code too, and had a couple of comments, although 
> I'm not a perf tool expert.
> 

Hi,

Thank you for your reply.

I have only recently started working on this aspect of the perf tool, so your 
reply is very important to me.

I must apologize: my example here was incomplete. It was only after your reply 
that I realized I had posted just part of it. The complete example is as 
follows:

Example usage:

# perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=1
# perf report


...
# Samples: 37  of event 'llc-miss'
# Event count (approx.): 37
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ................................
#
    37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
     5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
     2.70%     2.70%  dd       dd                 [.] 0xd9d8
     2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
     2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
     2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr


# Samples: 8  of event 'tlb-miss'
# Event count (approx.): 8
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ................................
#
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
    12.50%    12.50%  dd       dd                 [.] 0xd9d8
    12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
    12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
    12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf


# Samples: 12  of event 'branch-miss'
# Event count (approx.): 12
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ................................
#
    16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
     8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     8.33%     8.33%  dd       ld-2.28.so         [.] check_match
     8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
     8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
     8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
     8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data



> 
> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>> Profiling Extensions (SPE) support") is merged, "perf record" and
>> "perf report --dump-raw-trace" have been supported. However, the
>> raw data that is dumped cannot be used without parsing.
>>
>> This patch is to improve the "perf report" support for spe, and
>> further process the data. Currently, support for the three events
>> of llc-miss, tlb-miss, and branch-miss is added.
>>
>> Example usage:
>>
>> 
>> ...
>>  37.84%    37.84%  dd   [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>  16.22%    16.22%  dd   [kernel.kallsyms]  [k] copy_page
>>   5.41% 5.41%  dd   [kernel.kallsyms]  [k] find_vma
>>   5.41% 5.41%  dd   [kernel.kallsyms]  [k] perf_event_mmap
>>   5.41% 

Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-08-08 Thread Jeremy Linton

Hi,

First thanks for posting this!

I ran this on our DAWN platform and it does what it says. It's a pretty 
reasonable start, but I get -1's in the command row rather than "dd" (or 
similar) and this also results in [unknown] for the shared object and 
most userspace addresses. This is quite possibly something I'm not doing 
right, but I didn't spend a lot of time testing/debugging it.


I did a quick glance at the code too, and had a couple comments, although 
I'm not a perf tool expert.



On 8/2/19 4:40 AM, Tan Xiaojun wrote:

After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") is merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without parsing.

This patch is to improve the "perf report" support for spe, and
further process the data. Currently, support for the three events
of llc-miss, tlb-miss, and branch-miss is added.

Example usage:


...
    37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
     5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
     2.70%     2.70%  dd       dd                 [.] 0xd9d8
     2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
     2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
     2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr

    12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
    12.50%    12.50%  dd       dd                 [.] 0xd9d8
    12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
    12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
    12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf

    16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
     8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     8.33%     8.33%  dd       ld-2.28.so         [.] check_match
     8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
     8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
     8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
     8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data



After that, more analysis and processing of the raw data of spe
will be done.

Signed-off-by: Tan Xiaojun 
---
  tools/perf/builtin-report.c                        |   5 +
  tools/perf/util/arm-spe-decoder/Build              |   2 +-
  tools/perf/util/arm-spe-decoder/arm-spe-decoder.c  | 214 ++
  tools/perf/util/arm-spe-decoder/arm-spe-decoder.h  |  51 ++
  .../util/arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
  tools/perf/util/arm-spe.c                          | 715 -
  tools/perf/util/auxtrace.c                         |  45 ++
  tools/perf/util/auxtrace.h                         |  27 +
  tools/perf/util/session.h                          |   2 +
  9 files changed, 1028 insertions(+), 35 deletions(-)
  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index abf0b9b..fadc8eb 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
  {
struct perf_session *session;
struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
+   struct arm_spe_synth_opts arm_spe_synth_opts;
struct stat st;
bool has_br_stack = false;
int branch_mode = -1;
@@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
   

[RFC PATCH 2/3] perf tools: Add support for "report" for some spe events

2019-08-02 Thread Tan Xiaojun
After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") is merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without parsing.

This patch is to improve the "perf report" support for spe, and
further process the data. Currently, support for the three events
of llc-miss, tlb-miss, and branch-miss is added.

Example usage:


...
    37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
     5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
     2.70%     2.70%  dd       dd                 [.] 0xd9d8
     2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
     2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
     2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr

    12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
    12.50%    12.50%  dd       dd                 [.] 0xd9d8
    12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
    12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
    12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf

    16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
     8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     8.33%     8.33%  dd       ld-2.28.so         [.] check_match
     8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
     8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
     8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
     8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data



After that, more analysis and processing of the raw data of spe
will be done.

Signed-off-by: Tan Xiaojun 
---
 tools/perf/builtin-report.c                        |   5 +
 tools/perf/util/arm-spe-decoder/Build              |   2 +-
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c  | 214 ++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h  |  51 ++
 .../util/arm-spe-decoder/arm-spe-pkt-decoder.h     |   2 +
 tools/perf/util/arm-spe.c                          | 715 -
 tools/perf/util/auxtrace.c                         |  45 ++
 tools/perf/util/auxtrace.h                         |  27 +
 tools/perf/util/session.h                          |   2 +
 9 files changed, 1028 insertions(+), 35 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index abf0b9b..fadc8eb 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
 {
 	struct perf_session *session;
 	struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
+	struct arm_spe_synth_opts arm_spe_synth_opts;
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
@@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options\n" ITRACE_HELP,
 			    itrace_parse_synth_opts),
+	OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
+			    "ARM SPE Tracing options",
+			    arm_spe_parse_synth_opts),
 	OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
 		    "Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol_conf.show_ref_callgraph,