Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
On 2019/10/17 9:51, Tan Xiaojun wrote:
> On 2019/10/16 18:12, James Clark wrote:
>> Hi Xiaojun,
>>
>>>> Do you mean that when the user specifies "event:pp" and SPE is available,
>>>> perf should configure and record the SPE data directly via the
>>>> perf_event_open syscall? (So that perf.data itself is the same as when
>>>> using -e arm_spe_0//xxx?)
>>>
>>> I mean that, for perf record, if the user does not add ":pp" to these
>>> events, the original code path is taken, and if ":pp" is added, the SPE
>>> code path is taken.
>>>
>>
>> Yes, we think this is the best way to do it, considering that SPE has been
>> implemented as a separate PMU and it would be very difficult to do it in the
>> kernel when the precise_ip attribute is set.
>>
>> I think doing everything in userspace is easiest. This will at least mean
>> that users of Perf don't have to be aware of the details of SPE to get
>> precise sample data.
>>
>> So if the user specifies "event:p" when SPE is available, the SPE PMU is
>> automatically configured and data is recorded. If the user also specifies
>> -e arm_spe_0//xxx and wants to do some manual configuration, then that
>> could override the automatic configuration.
>>
>>
>> James
>>
>
> OK. I got it.
>
> I found a bug in testing. If I specify a cpu_list (using -a or -C) when
> recording SPE data, some events with "pid:0 tid:0" are logged. This is
> obviously wrong.
>
> I want to solve this problem, but I haven't found out what went wrong.
>
> --
> [root@server121 perf]# perf record -e arm_spe_0/branch_filter=1,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,store_filter=1,min_latency=0/ -a

Sorry, "--all-user" should be added here. Even then, there are still some
"pid:0" events in spe_dump.out. (If kernel events are included, "pid:0" is
not a problem.)

This causes the PC addresses of some SPE samples to be left untranslated,
because the wrong pid/tid is obtained from here.

Thanks.
Xiaojun.

> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 7.925 MB perf.data ]
> [root@server121 perf]# perf report -D > spe_dump.out
> [root@server121 perf]# vim spe_dump.out
>
> --
> ...
> 0xd0330 [0x30]: event: 12
> .
> . ... raw event: size 48 bytes
> .  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ......0.........
> .  0010:  00 00 00 00 00 00 00 00 f8 d9 fe bd f7 08 02 00  ................
> .  0020:  00 00 00 00 00 00 00 00 4c bc 14 00 00 00 00 00  ........L.......
>
> 0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
>
> 0xd0438 [0x30]: event: 12
> .
> . ... raw event: size 48 bytes
> .  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ......0.........
> .  0010:  00 00 00 00 00 00 00 00 d8 ef fe bd f7 08 02 00  ................
> .  0020:  01 00 00 00 00 00 00 00 4d bc 14 00 00 00 00 00  ........M.......
>
> 1 572810090967000 0xd0438 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
> ...
> --
>
> Thanks.
> Xiaojun.
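To measure how often these zero-pid records show up, the raw bytes can be walked record by record using the size field of each perf_event_header (u32 type, u16 misc, u16 size, per the perf_event_open(2) ABI). The sketch below assumes little-endian data, as on the arm64 box above, and relies on PERF_RECORD_ITRACE_START being type 12 ("event: 12" in the dump) with a body that starts with u32 pid, u32 tid. The function names are hypothetical and not part of the perf sources.

```c
#include <stddef.h>
#include <stdint.h>

/* PERF_RECORD_ITRACE_START in the perf_event_open(2) ABI ("event: 12"). */
#define REC_ITRACE_START 12u

/* Read little-endian integers byte by byte so the sketch is host-independent. */
static uint32_t rd_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint16_t rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | p[1] << 8);
}

/*
 * Hypothetical helper: walk a buffer of back-to-back perf records using
 * header.size and count ITRACE_START records whose pid and tid are both 0.
 * Returns -1 if a record header is truncated or malformed.
 */
static int count_zero_pid_itrace_start(const uint8_t *buf, size_t len)
{
	size_t off = 0;
	int count = 0;

	while (off + 8 <= len) {
		uint32_t type = rd_le32(buf + off);
		uint16_t size = rd_le16(buf + off + 6);

		/* header.size must at least cover the 8-byte header itself */
		if (size < 8 || off + size > len)
			return -1;

		/* The ITRACE_START body begins with u32 pid, u32 tid. */
		if (type == REC_ITRACE_START && size >= 16 &&
		    rd_le32(buf + off + 8) == 0 && rd_le32(buf + off + 12) == 0)
			count++;

		off += size;
	}
	return off == len ? count : -1;
}
```

Fed the two 48-byte records from the dump back to back, this returns 2. Running the same count on a `--all-user` recording would show whether the zero-pid records persist, which is the behaviour reported above.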
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
On 2019/10/16 18:12, James Clark wrote:
> Hi Xiaojun,
>
>>>
>>> Do you mean that when the user specifies "event:pp" and SPE is available,
>>> perf should configure and record the SPE data directly via the
>>> perf_event_open syscall? (So that perf.data itself is the same as when
>>> using -e arm_spe_0//xxx?)
>>
>> I mean that, for perf record, if the user does not add ":pp" to these
>> events, the original code path is taken, and if ":pp" is added, the SPE
>> code path is taken.
>>
>
> Yes, we think this is the best way to do it, considering that SPE has been
> implemented as a separate PMU and it would be very difficult to do it in the
> kernel when the precise_ip attribute is set.
>
> I think doing everything in userspace is easiest. This will at least mean
> that users of Perf don't have to be aware of the details of SPE to get
> precise sample data.
>
> So if the user specifies "event:p" when SPE is available, the SPE PMU is
> automatically configured and data is recorded. If the user also specifies
> -e arm_spe_0//xxx and wants to do some manual configuration, then that
> could override the automatic configuration.
>
>
> James
>

OK. I got it.

I found a bug in testing. If I specify a cpu_list (using -a or -C) when
recording SPE data, some events with "pid:0 tid:0" are logged. This is
obviously wrong.

I want to solve this problem, but I haven't found out what went wrong.

--
[root@server121 perf]# perf record -e arm_spe_0/branch_filter=1,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,store_filter=1,min_latency=0/ -a
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 7.925 MB perf.data ]
[root@server121 perf]# perf report -D > spe_dump.out
[root@server121 perf]# vim spe_dump.out

--
...
0xd0330 [0x30]: event: 12
.
. ... raw event: size 48 bytes
.  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ......0.........
.  0010:  00 00 00 00 00 00 00 00 f8 d9 fe bd f7 08 02 00  ................
.  0020:  00 00 00 00 00 00 00 00 4c bc 14 00 00 00 00 00  ........L.......

0 572810090961400 0xd0330 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0

0xd0438 [0x30]: event: 12
.
. ... raw event: size 48 bytes
.  0000:  0c 00 00 00 00 00 30 00 00 00 00 00 00 00 00 00  ......0.........
.  0010:  00 00 00 00 00 00 00 00 d8 ef fe bd f7 08 02 00  ................
.  0020:  01 00 00 00 00 00 00 00 4d bc 14 00 00 00 00 00  ........M.......

1 572810090967000 0xd0438 [0x30]: PERF_RECORD_ITRACE_START pid: 0 tid: 0
...
--

Thanks.
Xiaojun.
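The 48-byte records above can be decoded by hand. Judging from the offsets, the trailing sample_id block appears to carry TID, TIME, CPU and IDENTIFIER fields; this layout is inferred from the dump, not stated in the thread. It is consistent with what perf printed: bytes 0x18-0x1f decode to the timestamp 572810090961400, and the u32 at 0x20 matches the leading CPU column (0, then 1). A minimal C sketch under that assumption follows; the struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical decoded view of one PERF_RECORD_ITRACE_START record,
 * ASSUMING sample_type = TID | TIME | CPU | IDENTIFIER (inferred from the
 * 48-byte size and the offsets in the dump).
 */
struct itrace_start_rec {
	uint32_t pid, tid;          /* record body */
	uint32_t sid_pid, sid_tid;  /* sample_id: PERF_SAMPLE_TID */
	uint64_t time;              /* sample_id: PERF_SAMPLE_TIME */
	uint32_t cpu;               /* sample_id: PERF_SAMPLE_CPU (u32 cpu, u32 res) */
	uint64_t id;                /* sample_id: PERF_SAMPLE_IDENTIFIER */
};

/* Read an nbytes-wide little-endian unsigned integer. */
static uint64_t rd_le(const uint8_t *p, int nbytes)
{
	uint64_t v = 0;

	for (int i = nbytes - 1; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 0 on success, -1 if p is not a 48-byte ITRACE_START (type 12). */
static int decode_itrace_start(const uint8_t *p, size_t len,
			       struct itrace_start_rec *out)
{
	if (len < 48 || rd_le(p, 4) != 12 || rd_le(p + 6, 2) != 48)
		return -1;
	out->pid     = (uint32_t)rd_le(p + 8, 4);
	out->tid     = (uint32_t)rd_le(p + 12, 4);
	out->sid_pid = (uint32_t)rd_le(p + 16, 4);
	out->sid_tid = (uint32_t)rd_le(p + 20, 4);
	out->time    = rd_le(p + 24, 8);
	out->cpu     = (uint32_t)rd_le(p + 32, 4);
	out->id      = rd_le(p + 40, 8);
	return 0;
}
```

For the first record this yields pid 0, tid 0, cpu 0 and time 572810090961400, matching perf's own line; the second yields cpu 1 and time 572810090967000. Both the record body and the sample_id pid/tid are zero, so the zero pid is already present in the recorded data rather than introduced by the report-side decoding.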
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
Hi Xiaojun,

>>
>> Do you mean that when the user specifies "event:pp" and SPE is available,
>> perf should configure and record the SPE data directly via the
>> perf_event_open syscall? (So that perf.data itself is the same as when
>> using -e arm_spe_0//xxx?)
>
> I mean that, for perf record, if the user does not add ":pp" to these
> events, the original code path is taken, and if ":pp" is added, the SPE
> code path is taken.
>

Yes, we think this is the best way to do it, considering that SPE has been
implemented as a separate PMU and it would be very difficult to do it in the
kernel when the precise_ip attribute is set.

I think doing everything in userspace is easiest. This will at least mean
that users of Perf don't have to be aware of the details of SPE to get
precise sample data.

So if the user specifies "event:p" when SPE is available, the SPE PMU is
automatically configured and data is recorded. If the user also specifies
-e arm_spe_0//xxx and wants to do some manual configuration, then that
could override the automatic configuration.

James
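The userspace policy James describes could be expressed as a small decision function: a precise modifier selects SPE only when an SPE PMU is present, and an explicit `-e arm_spe_0/.../` from the user takes precedence over any automatic configuration. The following is a hypothetical C sketch of that policy, not code from perf. The enum, the function names, and the fallback to the original path when no SPE PMU exists are all assumptions. Checking for `/sys/bus/event_source/devices/arm_spe_0` follows the PMU name used in this thread; other systems may number the instance differently.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical outcomes of the policy discussed in the thread. */
enum sample_backend {
	BACKEND_DEFAULT,   /* no precise modifier (or no SPE): original path */
	BACKEND_SPE_AUTO,  /* precise modifier and SPE present: auto-configure */
	BACKEND_SPE_USER,  /* user passed -e arm_spe_0/.../: keep their config */
};

/* ASSUMPTION: the SPE PMU instance is named arm_spe_0, as in this thread. */
static bool spe_pmu_present(void)
{
	return access("/sys/bus/event_source/devices/arm_spe_0", F_OK) == 0;
}

static enum sample_backend choose_backend(int precise_ip, bool spe_available,
					  bool user_gave_spe_event)
{
	/* Manual configuration overrides the automatic one. */
	if (user_gave_spe_event)
		return BACKEND_SPE_USER;
	/* ":p"/":pp" route to SPE only when the PMU actually exists. */
	if (precise_ip > 0 && spe_available)
		return BACKEND_SPE_AUTO;
	return BACKEND_DEFAULT;
}
```

A caller would pass the modifier's precise level together with `spe_pmu_present()`. Keeping the decision separate from the sysfs probe makes the policy easy to test without SPE hardware.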
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
On 2019/10/9 19:09, Tan Xiaojun wrote:
> On 2019/10/9 17:48, James Clark wrote:
>> Hi Xiaojun,
>>
>>> By the way, you mentioned before that you want the SPE event to be in the
>>> form of "event:pp", like PEBS. Does that mean the whole framework should be
>>> made similar to PEBS? Or is it just a modification to the command format?
>>
>> We're currently still investigating whether it makes sense to modify the
>> perf_event_open syscall to use SPE when the "precise_ip" attribute is set,
>> and then synthesize samples using the SPE data when available. This would
>> keep the syscall interface more consistent between architectures.
>>
>> And if tools other than Perf want more precise data, they don't have to be
>> aware of SPE or any of its implementation-defined details. For example,
>> the 'data source' encoding can differ from one microarchitecture to the
>> next. The kernel is probably the best place to handle this.
>>
>> At the moment, every tool that wants to use the Perf syscall to get precise
>> data on ARM would have to be aware of SPE and implement its own decoding.
>>
>
> Hi James,
>
> Do you mean that when the user specifies "event:pp" and SPE is available,
> perf should configure and record the SPE data directly via the
> perf_event_open syscall? (So that perf.data itself is the same as when
> using -e arm_spe_0//xxx?)

I mean that, for perf record, if the user does not add ":pp" to these
events, the original code path is taken, and if ":pp" is added, the SPE
code path is taken.

Xiaojun.

>
> OK. If I have not misunderstood, I think I know how to do it.
> Thank you.
>
>>> For the former, this may be a bit difficult. For the latter, there is
>>> currently no modification to the record part, so "-c", "-F", etc. only
>>> apply to instructions rather than events, which may confuse users.
>>>
>>> So I haven't figured out how to do it. What do you think of this?
>>
>> I think the patch at the moment is a good start to make SPE more accessible.
>> And the changes I mentioned above wouldn't change the fact that the raw SPE
>> data would still be available via the SPE PMU. So I think continuing with
>> the patch as-is for now is the best idea.
>>
>
> Yes. I agree.
>
> Xiaojun.
>
>>
>> James
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
On 2019/10/9 17:48, James Clark wrote:
> Hi Xiaojun,
>
>> By the way, you mentioned before that you want the SPE event to be in the
>> form of "event:pp", like PEBS. Does that mean the whole framework should be
>> made similar to PEBS? Or is it just a modification to the command format?
>
> We're currently still investigating whether it makes sense to modify the
> perf_event_open syscall to use SPE when the "precise_ip" attribute is set,
> and then synthesize samples using the SPE data when available. This would
> keep the syscall interface more consistent between architectures.
>
> And if tools other than Perf want more precise data, they don't have to be
> aware of SPE or any of its implementation-defined details. For example,
> the 'data source' encoding can differ from one microarchitecture to the
> next. The kernel is probably the best place to handle this.
>
> At the moment, every tool that wants to use the Perf syscall to get precise
> data on ARM would have to be aware of SPE and implement its own decoding.
>

Hi James,

Do you mean that when the user specifies "event:pp" and SPE is available,
perf should configure and record the SPE data directly via the
perf_event_open syscall? (So that perf.data itself is the same as when
using -e arm_spe_0//xxx?)

OK. If I have not misunderstood, I think I know how to do it.
Thank you.

>> For the former, this may be a bit difficult. For the latter, there is
>> currently no modification to the record part, so "-c", "-F", etc. only
>> apply to instructions rather than events, which may confuse users.
>>
>> So I haven't figured out how to do it. What do you think of this?
>
> I think the patch at the moment is a good start to make SPE more accessible.
> And the changes I mentioned above wouldn't change the fact that the raw SPE
> data would still be available via the SPE PMU. So I think continuing with
> the patch as-is for now is the best idea.
>

Yes. I agree.

Xiaojun.

>
> James
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
Hi Xiaojun,

> By the way, you mentioned before that you want the SPE event to be in the
> form of "event:pp", like PEBS. Does that mean the whole framework should be
> made similar to PEBS? Or is it just a modification to the command format?

We're currently still investigating whether it makes sense to modify the
perf_event_open syscall to use SPE when the "precise_ip" attribute is set,
and then synthesize samples using the SPE data when available. This would
keep the syscall interface more consistent between architectures.

And if tools other than Perf want more precise data, they don't have to be
aware of SPE or any of its implementation-defined details. For example,
the 'data source' encoding can differ from one microarchitecture to the
next. The kernel is probably the best place to handle this.

At the moment, every tool that wants to use the Perf syscall to get precise
data on ARM would have to be aware of SPE and implement its own decoding.

> For the former, this may be a bit difficult. For the latter, there is
> currently no modification to the record part, so "-c", "-F", etc. only
> apply to instructions rather than events, which may confuse users.
>
> So I haven't figured out how to do it. What do you think of this?

I think the patch at the moment is a good start to make SPE more accessible.
And the changes I mentioned above wouldn't change the fact that the raw SPE
data would still be available via the SPE PMU. So I think continuing with
the patch as-is for now is the best idea.

James
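For reference, the "precise_ip" attribute James mentions is what perf's `p` event modifier sets. Each `p` raises perf_event_attr.precise_ip by one (":p" is 1, ":pp" is 2, ":ppp" is 3), and the field is two bits wide. The helper below is a hypothetical C sketch of that mapping only. It is not perf's modifier parser, which also handles `u`, `k`, `P` and other letters; those letters are simply ignored here.

```c
/*
 * Hypothetical helper: count 'p' characters in an event modifier string
 * (the part after ':') and return the resulting precise_ip level, or -1 if
 * the level exceeds 3, the maximum the 2-bit precise_ip field can hold.
 * Other modifier letters are ignored for brevity.
 */
static int precise_level_from_modifier(const char *mod)
{
	int level = 0;

	for (; *mod; mod++)
		if (*mod == 'p')
			level++;
	return level <= 3 ? level : -1;
}
```

Under the userspace approach discussed here, a level greater than zero would be the signal to consider SPE. Under the kernel approach James is investigating, the same value would instead reach the kernel in perf_event_attr.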
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
On 2019/10/4 21:46, James Clark wrote:
> Hi Xiaojun,
>
> I wanted to ask whether you are still working on this.
>
> I've noticed that it doesn't apply cleanly to perf/core anymore and I was
> working on re-basing it. Would you be interested in me posting my progress?
>
> I was also interested in decoding the "data source" of events and
> displaying that information. Does this clash with any of your current work?
>
>
> Thanks
> James
>

(Sorry, you may have received this email several times: I suddenly seem not
to be on the mailing list and need to confirm it.)

Hi, James,

Sorry for not responding in time; it was the National Day holiday in China.

I am still working on this, but I was assigned other tasks some time ago,
so there has been no obvious progress on SPE.

By the way, you mentioned before that you want the SPE event to be in the
form of "event:pp", like PEBS. Does that mean the whole framework should be
made similar to PEBS? Or is it just a modification to the command format?

For the former, this may be a bit difficult. For the latter, there is
currently no modification to the record part, so "-c", "-F", etc. only
apply to instructions rather than events, which may confuse users.

So I haven't figured out how to do it. What do you think of this?

Thanks.
Xiaojun.

> On 09/08/2019 07:12, Tan Xiaojun wrote:
>> On 2019/8/9 5:00, Jeremy Linton wrote:
>>> Hi,
>>>
>>> First, thanks for posting this!
>>>
>>> I ran this on our DAWN platform and it does what it says. It's a pretty
>>> reasonable start, but I get -1's in the Command column rather than "dd"
>>> (or similar), and this also results in [unknown] for the shared object
>>> and most userspace addresses. This is quite possibly something I'm not
>>> doing right, but I didn't spend a lot of time testing/debugging it.
>>>
>>> I also took a quick glance at the code, and had a couple of comments,
>>> although I'm not a perf tool expert.
>>>
>>
>> Hi,
>>
>> Thank you for your reply.
>>
>> I have only recently started working on this aspect of the perf tool, so
>> your reply is very important to me.
>>
>> I'm sorry, my example here is not complete: until you pointed it out, I
>> hadn't noticed that I had only posted part of it. The complete example is
>> as follows:
>>
>> Example usage:
>>
>> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=1
>> # perf report
>>
>>
>> ...
>> # Samples: 37 of event 'llc-miss'
>> # Event count (approx.): 37
>> #
>> # Children      Self  Command  Shared Object      Symbol
>> # ........  ........  .......  .................  ..............................
>> #
>>     37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>     16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>>      5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>>      5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>      2.70%     2.70%  dd       dd                 [.] 0xd9d8
>>      2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>>      2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>>      2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
>>
>>
>> # Samples: 8 of event 'tlb-miss'
>> # Event count (approx.): 8
>> #
>> # Children      Self  Command  Shared Object      Symbol
>> # ........  ........  .......  .................  ..............................
>> #
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>>     12.50%    12.50%  dd       dd                 [.] 0xd9d8
>>     12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>>     12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>>     12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
>>
>>
>> # Samples: 12 of event 'branch-miss'
>> # Event count (approx.): 12
>> #
>> # Children      Self  Command  Shared Object      Symbol
>> # ........  ........  .......  .................  ..............................
>> #
>>     16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>>      8.33%     8.33%  dd       [kernel.kallsyms]  [k]
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
Hi Xiaojun,

I wanted to ask whether you are still working on this.

I've noticed that it doesn't apply cleanly to perf/core anymore and I was
working on re-basing it. Would you be interested in me posting my progress?

I was also interested in decoding the "data source" of events and
displaying that information. Does this clash with any of your current work?


Thanks
James

On 09/08/2019 07:12, Tan Xiaojun wrote:
> On 2019/8/9 5:00, Jeremy Linton wrote:
>> Hi,
>>
>> First, thanks for posting this!
>>
>> I ran this on our DAWN platform and it does what it says. It's a pretty
>> reasonable start, but I get -1's in the Command column rather than "dd"
>> (or similar), and this also results in [unknown] for the shared object
>> and most userspace addresses. This is quite possibly something I'm not
>> doing right, but I didn't spend a lot of time testing/debugging it.
>>
>> I also took a quick glance at the code, and had a couple of comments,
>> although I'm not a perf tool expert.
>>
>
> Hi,
>
> Thank you for your reply.
>
> I have only recently started working on this aspect of the perf tool, so
> your reply is very important to me.
>
> I'm sorry, my example here is not complete: until you pointed it out, I
> hadn't noticed that I had only posted part of it. The complete example is
> as follows:
>
> Example usage:
>
> # perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=1
> # perf report
>
>
> ...
> # Samples: 37 of event 'llc-miss'
> # Event count (approx.): 37
> #
> # Children      Self  Command  Shared Object      Symbol
> # ........  ........  .......  .................  ..............................
> #
>     37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>      5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>      2.70%     2.70%  dd       dd                 [.] 0xd9d8
>      2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>      2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>      2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
>
>
> # Samples: 8 of event 'tlb-miss'
> # Event count (approx.): 8
> #
> # Children      Self  Command  Shared Object      Symbol
> # ........  ........  .......  .................  ..............................
> #
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>     12.50%    12.50%  dd       dd                 [.] 0xd9d8
>     12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>     12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>     12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
>
>
> # Samples: 12 of event 'branch-miss'
> # Event count (approx.): 12
> #
> # Children      Self  Command  Shared Object      Symbol
> # ........  ........  .......  .................  ..............................
> #
>     16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
>      8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      8.33%     8.33%  dd       ld-2.28.so         [.] check_match
>      8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
>      8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
>      8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
>      8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>
>
>
>>
>> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>>> Profiling Extensions (SPE) support") is merged, "perf record" and
>>> "perf report --dump-raw-trace" have been supported. However, the
>>> raw data that is dumped cannot be used without parsing.
>>>
>>> This patch is to im
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
On 2019/8/9 5:00, Jeremy Linton wrote:
> Hi,
>
> First, thanks for posting this!
>
> I ran this on our DAWN platform and it does what it says. It's a pretty
> reasonable start, but I get -1's in the Command column rather than "dd"
> (or similar), and this also results in [unknown] for the shared object
> and most userspace addresses. This is quite possibly something I'm not
> doing right, but I didn't spend a lot of time testing/debugging it.
>
> I also took a quick glance at the code, and had a couple of comments,
> although I'm not a perf tool expert.
>

Hi,

Thank you for your reply.

I have only recently started working on this aspect of the perf tool, so
your reply is very important to me.

I'm sorry, my example here is not complete: until you pointed it out, I
hadn't noticed that I had only posted part of it. The complete example is
as follows:

Example usage:

# perf record -e arm_spe/ts_enable=1,pa_enable=1/ dd if=/dev/zero of=/dev/null count=1
# perf report

...
# Samples: 37 of event 'llc-miss'
# Event count (approx.): 37
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ..............................
#
    37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
     5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
     2.70%     2.70%  dd       dd                 [.] 0xd9d8
     2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
     2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
     2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr


# Samples: 8 of event 'tlb-miss'
# Event count (approx.): 8
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ..............................
#
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
    12.50%    12.50%  dd       dd                 [.] 0xd9d8
    12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
    12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
    12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf


# Samples: 12 of event 'branch-miss'
# Event count (approx.): 12
#
# Children      Self  Command  Shared Object      Symbol
# ........  ........  .......  .................  ..............................
#
    16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
     8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     8.33%     8.33%  dd       ld-2.28.so         [.] check_match
     8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
     8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
     8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
     8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data

>
> On 8/2/19 4:40 AM, Tan Xiaojun wrote:
>> After the commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
>> Profiling Extensions (SPE) support") is merged, "perf record" and
>> "perf report --dump-raw-trace" have been supported. However, the
>> raw data that is dumped cannot be used without parsing.
>>
>> This patch is to improve the "perf report" support for spe, and
>> further process the data. Currently, support for the three events
>> of llc-miss, tlb-miss, and branch-miss is added.
>>
>> Example usage:
>>
>>
>> ...
>>     37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>>     16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>>      5.41%
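Terms such as `ts_enable=1,pa_enable=1` in `arm_spe/.../` are packed into perf_event_attr.config, config1 or config2 according to the PMU's sysfs format files under `/sys/bus/event_source/devices/<pmu>/format/`. Each file holds a string such as `config:0` or `config2:0-11`. The hypothetical C sketch below parses one such format string and ORs a term's value into the matching word. It is not perf's own parser. The specific format strings used in the tests (ts_enable as `config:0`, pa_enable as `config:1`, min_latency as `config2:0-11`) are my recollection of the arm_spe_pmu driver and should be checked against the files on the target system.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* The three perf_event_attr words that a PMU format string can target. */
struct pmu_cfg {
	uint64_t config, config1, config2;
};

/*
 * Hypothetical helper: OR "value" into cfg at the bit range described by a
 * sysfs format string such as "config:0" or "config2:0-11" (a trailing
 * newline, as read from the file, is accepted). Returns 0 on success, or -1
 * if the format is not understood or the value does not fit the field.
 */
static int apply_format(struct pmu_cfg *cfg, const char *format, uint64_t value)
{
	const char *colon = strchr(format, ':');
	char *end;
	unsigned long lo, hi;
	uint64_t *dst, mask;

	if (!colon)
		return -1;

	/* Select the target word by the text before the colon. */
	size_t wlen = (size_t)(colon - format);
	if (wlen == 6 && strncmp(format, "config", 6) == 0)
		dst = &cfg->config;
	else if (wlen == 7 && strncmp(format, "config1", 7) == 0)
		dst = &cfg->config1;
	else if (wlen == 7 && strncmp(format, "config2", 7) == 0)
		dst = &cfg->config2;
	else
		return -1;

	/* Parse "lo" or "lo-hi". */
	lo = strtoul(colon + 1, &end, 10);
	if (end == colon + 1)
		return -1;
	hi = lo;
	if (*end == '-') {
		const char *start = end + 1;
		hi = strtoul(start, &end, 10);
		if (end == start)
			return -1;
	}
	if ((*end != '\0' && *end != '\n') || hi < lo || hi > 63)
		return -1;

	/* Reject values wider than the field, then shift into place. */
	mask = (hi - lo == 63) ? ~0ULL : ((1ULL << (hi - lo + 1)) - 1);
	if (value & ~mask)
		return -1;
	*dst |= value << lo;
	return 0;
}
```

If ts_enable and pa_enable do map to `config:0` and `config:1` on the target, then `ts_enable=1,pa_enable=1` from the example command would yield config == 0x3. Reading the format files rather than hard-coding bit positions keeps the tool correct if the driver's layout changes.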
Re: [RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
Hi,

First, thanks for posting this!

I ran this on our DAWN platform and it does what it says. It's a pretty
reasonable start, but I get -1's in the command column rather than "dd" (or
similar), and this also results in [unknown] for the shared object and most
userspace addresses. This is quite possibly something I'm not doing right,
but I didn't spend a lot of time testing/debugging it.

I took a quick glance at the code too, and had a couple of comments,
although I'm not a perf tool expert.

On 8/2/19 4:40 AM, Tan Xiaojun wrote:
> Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
> Profiling Extensions (SPE) support") was merged, "perf record" and
> "perf report --dump-raw-trace" have been supported. However, the
> raw data that is dumped cannot be used without parsing.
>
> This patch improves "perf report" support for SPE by further
> processing the data. Currently, support for three events is added:
> llc-miss, tlb-miss, and branch-miss.
>
> Example usage:
>
> ...
>     37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
>      5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
>      5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>      2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>      2.70%     2.70%  dd       dd                 [.] 0xd9d8
>      2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
>      2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
>      2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr
>
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
>     12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
>     12.50%    12.50%  dd       dd                 [.] 0xd9d8
>     12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
>     12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>     12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf
>
>     16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
>      8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
>      8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
>      8.33%     8.33%  dd       ld-2.28.so         [.] check_match
>      8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
>      8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
>      8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
>      8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data
>
> Later, more analysis and processing of the raw SPE data will be done.
>
> Signed-off-by: Tan Xiaojun
> ---
>  tools/perf/builtin-report.c                       |   5 +
>  tools/perf/util/arm-spe-decoder/Build             |   2 +-
>  tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++
>  tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  51 ++
>  .../util/arm-spe-decoder/arm-spe-pkt-decoder.h    |   2 +
>  tools/perf/util/arm-spe.c                         | 715 -
>  tools/perf/util/auxtrace.c                        |  45 ++
>  tools/perf/util/auxtrace.h                        |  27 +
>  tools/perf/util/session.h                         |   2 +
>  9 files changed, 1028 insertions(+), 35 deletions(-)
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
>  create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
>
> diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
> index abf0b9b..fadc8eb 100644
> --- a/tools/perf/builtin-report.c
> +++ b/tools/perf/builtin-report.c
> @@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
>  {
>  	struct perf_session *session;
>  	struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
> +	struct arm_spe_synth_opts arm_spe_synth_opts;
>  	struct stat st;
>  	bool has_br_stack = false;
>  	int branch_mode = -1;
> @@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
[RFC PATCH 2/3] perf tools: Add support for "report" for some spe events
Since commit ffd3d18c20b8 ("perf tools: Add ARM Statistical
Profiling Extensions (SPE) support") was merged, "perf record" and
"perf report --dump-raw-trace" have been supported. However, the
raw data that is dumped cannot be used without parsing.

This patch improves "perf report" support for SPE by further
processing the data. Currently, support for three events is added:
llc-miss, tlb-miss, and branch-miss.

Example usage:

...
    37.84%    37.84%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    16.22%    16.22%  dd       [kernel.kallsyms]  [k] copy_page
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] find_vma
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] perf_event_mmap
     5.41%     5.41%  dd       [kernel.kallsyms]  [k] zap_pte_range
     5.41%     5.41%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     5.41%     5.41%  dd       libc-2.28.so       [.] _nl_intern_locale_data
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] __remove_shared_vm_struct.isra.1
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] kmem_cache_free
     2.70%     2.70%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
     2.70%     2.70%  dd       dd                 [.] 0xd9d8
     2.70%     2.70%  dd       ld-2.28.so         [.] _dl_relocate_object
     2.70%     2.70%  dd       libc-2.28.so       [.] __unregister_atfork
     2.70%     2.70%  dd       libc-2.28.so       [.] _dl_addr

    12.50%    12.50%  dd       [kernel.kallsyms]  [k] __audit_syscall_entry
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] kmem_cache_free
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] perf_iterate_ctx.constprop.64
    12.50%    12.50%  dd       [kernel.kallsyms]  [k] ttwu_do_wakeup.isra.19
    12.50%    12.50%  dd       dd                 [.] 0xd9d8
    12.50%    12.50%  dd       libc-2.28.so       [.] __unregister_atfork
    12.50%    12.50%  dd       libc-2.28.so       [.] _nl_intern_locale_data
    12.50%    12.50%  dd       libc-2.28.so       [.] vfprintf

    16.67%    16.67%  dd       libc-2.28.so       [.] read_alias_file
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_from_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] __arch_copy_to_user
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] lookup_fast
     8.33%     8.33%  dd       [kernel.kallsyms]  [k] strncpy_from_user
     8.33%     8.33%  dd       ld-2.28.so         [.] _dl_lookup_symbol_x
     8.33%     8.33%  dd       ld-2.28.so         [.] check_match
     8.33%     8.33%  dd       libc-2.28.so       [.] __GI___printf_fp_l
     8.33%     8.33%  dd       libc-2.28.so       [.] _dl_addr
     8.33%     8.33%  dd       libc-2.28.so       [.] _int_malloc
     8.33%     8.33%  dd       libc-2.28.so       [.] _nl_intern_locale_data

Later, more analysis and processing of the raw SPE data will be done.

Signed-off-by: Tan Xiaojun
---
 tools/perf/builtin-report.c                       |   5 +
 tools/perf/util/arm-spe-decoder/Build             |   2 +-
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 214 ++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h |  51 ++
 .../util/arm-spe-decoder/arm-spe-pkt-decoder.h    |   2 +
 tools/perf/util/arm-spe.c                         | 715 -
 tools/perf/util/auxtrace.c                        |  45 ++
 tools/perf/util/auxtrace.h                        |  27 +
 tools/perf/util/session.h                         |   2 +
 9 files changed, 1028 insertions(+), 35 deletions(-)
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
 create mode 100644 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index abf0b9b..fadc8eb 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1007,6 +1007,7 @@ int cmd_report(int argc, const char **argv)
 {
 	struct perf_session *session;
 	struct itrace_synth_opts itrace_synth_opts = { .set = 0, };
+	struct arm_spe_synth_opts arm_spe_synth_opts;
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
@@ -1165,6 +1166,9 @@ int cmd_report(int argc, const char **argv)
 	OPT_CALLBACK_OPTARG(0, "itrace", &itrace_synth_opts, NULL, "opts",
 			    "Instruction Tracing options\n" ITRACE_HELP,
 			    itrace_parse_synth_opts),
+	OPT_CALLBACK_OPTARG(0, "spe", &arm_spe_synth_opts, NULL, "spe opts",
+			    "ARM SPE Tracing options",
+			    arm_spe_parse_synth_opts),
 	OPT_BOOLEAN(0, "full-source-path", &srcline_full_filename,
 		    "Show full source file name path for source lines"),
 	OPT_BOOLEAN(0, "show-ref-call-graph", &symbol