On Tue, Jul 23, 2024 at 09:02:23AM -0700, Ian Rogers wrote:
> On Mon, Jul 22, 2024 at 10:27 PM Kajol Jain <kj...@linux.ibm.com> wrote:
> >
> > Update JSON/events for power10 platform with additional events.
> > Also move PM_VECTOR_LD_CMPL event from others.json to
> > frontend.json file.
> >
> > Signed-off-by: Kajol Jain <kj...@linux.ibm.com>
>
> Reviewed-by: Ian Rogers <irog...@google.com>

Thanks, applied to tmp.perf-tools-next,

- Arnaldo

> > ---
> >  .../arch/powerpc/power10/frontend.json | 5 +
> >  .../arch/powerpc/power10/others.json | 100 +++++++++++++++++-
> >  2 files changed, 100 insertions(+), 5 deletions(-)
> >
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> > b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> > index 5977f5e64212..53660c279286 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> > +++ b/tools/perf/pmu-events/arch/powerpc/power10/frontend.json
> > @@ -74,6 +74,11 @@
> >      "EventName": "PM_ISSUE_KILL",
> >      "BriefDescription": "Cycles in which an instruction or group of
> > instructions were cancelled after being issued. This event increments once
> > per occurrence, regardless of how many instructions are included in the
> > issue group."
> >    },
> > +  {
> > +    "EventCode": "0x44054",
> > +    "EventName": "PM_VECTOR_LD_CMPL",
> > +    "BriefDescription": "Vector load instruction completed."
> > +  },
> >    {
> >      "EventCode": "0x44056",
> >      "EventName": "PM_VECTOR_ST_CMPL",
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power10/others.json
> > b/tools/perf/pmu-events/arch/powerpc/power10/others.json
> > index fcf8a8ebe7bd..53ca610152fa 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/power10/others.json
> > +++ b/tools/perf/pmu-events/arch/powerpc/power10/others.json
>
> The "topic" of an event is taken from the filename, here the topic
> will be "others".
>
> > @@ -94,11 +94,6 @@
> >      "EventName": "PM_L1_ICACHE_RELOADED_ALL",
> >      "BriefDescription": "Counts all instruction cache reloads includes
> > demand, prefetch, prefetch turned into demand and demand turned into
> > prefetch."
> >    },
> > -  {
> > -    "EventCode": "0x44054",
> > -    "EventName": "PM_VECTOR_LD_CMPL",
> > -    "BriefDescription": "Vector load instruction completed."
> > -  },
> >    {
> >      "EventCode": "0x4D05E",
> >      "EventName": "PM_BR_CMPL",
> > @@ -108,5 +103,100 @@
> >      "EventCode": "0x400F0",
> >      "EventName": "PM_LD_DEMAND_MISS_L1_FIN",
> >      "BriefDescription": "Load missed L1, counted at finish time."
> > +  },
> > +  {
> > +    "EventCode": "0x00000038BC",
> > +    "EventName": "PM_ISYNC_CMPL",
> > +    "BriefDescription": "Isync completion count per thread."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C088",
> > +    "EventName": "PM_LD0_32B_FIN",
> > +    "BriefDescription": "256-bit load finished in the LD0 load execution
> > unit."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C888",
> > +    "EventName": "PM_LD1_32B_FIN",
> > +    "BriefDescription": "256-bit load finished in the LD1 load execution
> > unit."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C090",
> > +    "EventName": "PM_LD0_UNALIGNED_FIN",
> > +    "BriefDescription": "Load instructions in LD0 port that are either
> > unaligned, or treated as unaligned and require an additional recycle
> > through the pipeline using the load gather buffer. This typically adds
> > about 10 cycles to the latency of the instruction. This includes loads that
> > cross the 128 byte boundary, octword loads that are not aligned, and a
> > special forward progress case of a load that does not hit in the L1 and
> > crosses the 32 byte boundary and is launched NTC. Counted at finish time."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C890",
> > +    "EventName": "PM_LD1_UNALIGNED_FIN",
> > +    "BriefDescription": "Load instructions in LD1 port that are either
> > unaligned, or treated as unaligned and require an additional recycle
> > through the pipeline using the load gather buffer. This typically adds
> > about 10 cycles to the latency of the instruction. This includes loads that
> > cross the 128 byte boundary, octword loads that are not aligned, and a
> > special forward progress case of a load that does not hit in the L1 and
> > crosses the 32 byte boundary and is launched NTC. Counted at finish time."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C0A4",
> > +    "EventName": "PM_ST0_UNALIGNED_FIN",
> > +    "BriefDescription": "Store instructions in ST0 port that are either
> > unaligned, or treated as unaligned and require an additional recycle
> > through the pipeline. This typically adds about 10 cycles to the latency of
> > the instruction. This only includes stores that cross the 128 byte
> > boundary. Counted at finish time."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C8A4",
> > +    "EventName": "PM_ST1_UNALIGNED_FIN",
> > +    "BriefDescription": "Store instructions in ST1 port that are either
> > unaligned, or treated as unaligned and require an additional recycle
> > through the pipeline. This typically adds about 10 cycles to the latency of
> > the instruction. This only includes stores that cross the 128 byte
> > boundary. Counted at finish time."
> > +  },
> > +  {
> > +    "EventCode": "0x000000C8B8",
> > +    "EventName": "PM_STCX_SUCCESS_CMPL",
> > +    "BriefDescription": "STCX instructions that completed successfully.
> > Specifically, counts only when a pass status is returned from the nest."
> > +  },
> > +  {
> > +    "EventCode": "0x000000D0B4",
> > +    "EventName": "PM_DC_PREF_STRIDED_CONF",
> > +    "BriefDescription": "A demand load referenced a line in an active
> > strided prefetch stream. The stream could have been allocated through the
> > hardware prefetch mechanism or through software."
> > +  },
> > +  {
> > +    "EventCode": "0x000000F880",
> > +    "EventName": "PM_SNOOP_TLBIE_CYC",
> > +    "BriefDescription": "Cycles in which TLBIE snoops are executed in the
> > LSU."
> > +  },
>
> Perhaps the topics here should be memory or translation?
>
> > +  {
> > +    "EventCode": "0x000000F084",
> > +    "EventName": "PM_SNOOP_TLBIE_CACHE_WALK_CYC",
> > +    "BriefDescription": "TLBIE snoop cycles in which the data cache is
> > being walked."
> > +  },
> > +  {
> > +    "EventCode": "0x000000F884",
> > +    "EventName": "PM_SNOOP_TLBIE_WAIT_ST_CYC",
> > +    "BriefDescription": "TLBIE snoop cycles in which older stores are
> > still draining."
> > +  },
> > +  {
> > +    "EventCode": "0x000000F088",
> > +    "EventName": "PM_SNOOP_TLBIE_WAIT_LD_CYC",
> > +    "BriefDescription": "TLBIE snoop cycles in which older loads are still
> > draining."
> > +  },
> > +  {
> > +    "EventCode": "0x000000F08C",
> > +    "EventName": "PM_SNOOP_TLBIE_WAIT_MMU_CYC",
> > +    "BriefDescription": "TLBIE snoop cycles in which the Load-Store unit
> > is waiting for the MMU to finish invalidation."
> > +  },
> > +  {
> > +    "EventCode": "0x0000004884",
> > +    "EventName": "PM_NO_FETCH_IBUF_FULL_CYC",
> > +    "BriefDescription": "Cycles in which no instructions are fetched
> > because there is no room in the instruction buffers."
> > +  },
> > +  {
> > +    "EventCode": "0x00000048B4",
> > +    "EventName": "PM_BR_TKN_UNCOND_FIN",
> > +    "BriefDescription": "An unconditional branch finished. All
> > unconditional branches are taken."
>
> I see PM_BR_TAKEN_CMPL in
> tools/perf/pmu-events/arch/powerpc/power10/frontend.json, so maybe it
> makes sense to put this event in that topic?
>
> Thanks,
> Ian
>
> > +  },
> > +  {
> > +    "EventCode": "0x0B0000016080",
> > +    "EventName": "PM_L2_TLBIE_SLBIE_START",
> > +    "BriefDescription": "NCU Master received a TLBIE/SLBIEG/SLBIAG
> > operation from the core. Event count should be multiplied by 2 since the
> > data is coming from a 2:1 clock domain and the data is time sliced across
> > all 4 threads."
> > +  },
> > +  {
> > +    "EventCode": "0x0B0000016880",
> > +    "EventName": "PM_L2_TLBIE_SLBIE_DELAY",
> > +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG command was
> > held in a hottemp condition by the NCU Master. Multiply this count by 1000
> > to obtain the total number of cycles. This can be divided by
> > PM_L2_TLBIE_SLBIE_SENT to obtain the average time a TLBIE/SLBIEG/SLBIAG
> > command was held. Event count should be multiplied by 2 since the data is
> > coming from a 2:1 clock domain and the data is time sliced across all 4
> > threads."
> > +  },
> > +  {
> > +    "EventCode": "0x0B0000026880",
> > +    "EventName": "PM_L2_SNP_TLBIE_SLBIE_DELAY",
> > +    "BriefDescription": "Cycles when a TLBIE/SLBIEG/SLBIAG that targets
> > this thread's LPAR was in flight while in a hottemp condition. Multiply
> > this count by 1000 to obtain the total number of cycles. This can be
> > divided by PM_L2_SNP_TLBIE_SLBIE_START to obtain the overall efficiency.
> > Note: ’inflight’ means SnpTLB has been sent to core(ie doesn’t include when
> > SnpTLB is in NCU waiting to be launched serially behind different SnpTLB).
> > The NCU Snooper gets in a ’hottemp’ delay window when it detects it is
> > above its TLBIE/SLBIE threshold for process SnpTLBIE/SLBIE with this core.
> > Event count should be multiplied by 2 since the data is coming from a 2:1
> > clock domain and the data is time sliced across all 4 threads."
> >    }
> >  ]
> > --
> > 2.43.0
> >
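
For reference, the review comment that an event's "topic" is taken from the filename can be illustrated with a minimal Python sketch. This is not the perf build code: the helper names topic_from_filename and group_by_topic are hypothetical, and the only assumption is that the topic equals the JSON file name without its extension, as the comment describes.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def topic_from_filename(path: str) -> str:
    """Hypothetical helper: use the file name without '.json' as the topic."""
    return PurePosixPath(path).stem


def group_by_topic(event_files: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each topic to the event names defined in the matching file."""
    topics: dict[str, list[str]] = defaultdict(list)
    for path, event_names in event_files.items():
        topics[topic_from_filename(path)].extend(event_names)
    return dict(topics)


# Event placement after this patch, using two events from the diff above.
base = "tools/perf/pmu-events/arch/powerpc/power10/"
placement = group_by_topic({
    base + "frontend.json": ["PM_VECTOR_LD_CMPL"],
    base + "others.json": ["PM_BR_TKN_UNCOND_FIN"],
})
```

Under that assumption, moving PM_VECTOR_LD_CMPL between files changes the topic it is listed under, and PM_BR_TKN_UNCOND_FIN stays under "others" unless it is moved as suggested in the review.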