Re: [1/2] powerpc/perf: Add constraints for power9 l2/l3 bus events

2018-12-22 Thread Michael Ellerman
On Sun, 2018-06-10 at 14:27:01 UTC, Madhavan Srinivasan wrote:
> In previous generation processors, both bus events and direct
> events of the performance monitoring unit can be individually
> programmed and monitored in PMCs.
> 
> But in Power9, L2/L3 bus events are always available as a
> "bank" of 4 events. To obtain the counts for any of the
> l2/l3 bus events in a given bank, the user will have to
> program PMC4 with corresponding l2/l3 bus event for that
> bank.
> 
> This patch enforces two constraints in case of L2/L3 bus events.
> 
> 1) Any L2/L3 bus event, when programmed, also requires the corresponding
> PMC4 event from that group to be programmed.
> 2) The PMC4 event should always be programmed first due to a limitation
> of the group constraint logic.
> 
> For example, consider these L3 bus events:
> 
> PM_L3_PF_ON_CHIP_MEM (0x460A0),
> PM_L3_PF_MISS_L3 (0x160A0),
> PM_L3_CO_MEM (0x260A0),
> PM_L3_PF_ON_CHIP_CACHE (0x360A0),
> 
> 1) This is an INVALID group for L3 Bus event monitoring,
> since it is missing PMC4 event.
>   perf stat -e "{r160A0,r260A0,r360A0}" < >
> 
> And this is a VALID group for L3 Bus events:
>   perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >
> 
> 2) This is an INVALID group for L3 Bus event monitoring,
> since it is missing PMC4 event.
>   perf stat -e "{r260A0,r360A0}" < >
> 
> And this is a VALID group for L3 Bus events:
>   perf stat -e "{r460A0,r260A0,r360A0}" < >
> 
> 3) This is an INVALID group for L3 Bus event monitoring,
> since it is missing PMC4 event.
>   perf stat -e "{r360A0}" < >
> 
> And this is a VALID group for L3 Bus events:
>   perf stat -e "{r460A0,r360A0}" < >
> 
> This patch implements the group constraint logic suggested by Michael Ellerman.
> 
> Signed-off-by: Madhavan Srinivasan 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/59029136d75022cb8b7c7bebd1738a

cheers


[PATCH 1/2] powerpc/perf: Add constraints for power9 l2/l3 bus events

2018-06-10 Thread Madhavan Srinivasan
In previous generation processors, both bus events and direct
events of the performance monitoring unit can be individually
programmed and monitored in PMCs.

But in Power9, L2/L3 bus events are always available as a
"bank" of 4 events. To obtain the counts for any of the
l2/l3 bus events in a given bank, the user will have to
program PMC4 with corresponding l2/l3 bus event for that
bank.

This patch enforces two constraints in case of L2/L3 bus events.

1) Any L2/L3 bus event, when programmed, also requires the corresponding
PMC4 event from that group to be programmed.
2) The PMC4 event should always be programmed first due to a limitation
of the group constraint logic.

For example, consider these L3 bus events:

PM_L3_PF_ON_CHIP_MEM (0x460A0),
PM_L3_PF_MISS_L3 (0x160A0),
PM_L3_CO_MEM (0x260A0),
PM_L3_PF_ON_CHIP_CACHE (0x360A0),

1) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
perf stat -e "{r160A0,r260A0,r360A0}" < >

And this is a VALID group for L3 Bus events:
perf stat -e "{r460A0,r160A0,r260A0,r360A0}" < >

2) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
perf stat -e "{r260A0,r360A0}" < >

And this is a VALID group for L3 Bus events:
perf stat -e "{r460A0,r260A0,r360A0}" < >

3) This is an INVALID group for L3 Bus event monitoring,
since it is missing PMC4 event.
perf stat -e "{r360A0}" < >

And this is a VALID group for L3 Bus events:
perf stat -e "{r460A0,r360A0}" < >

This patch implements the group constraint logic suggested by Michael Ellerman.
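
As a rough illustration of the two rules above (not part of the patch), the
check can be sketched in plain C. The helper name and the way the PMC number
and the bank are pulled out of the raw code are assumptions made for this
example, inferred from the event codes listed above. The sketch also only
covers groups made up entirely of bus events from a single bank.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: the PMC number is the hex digit at bits 16-19 of the raw code. */
#define PMC_OF(ev)	(((ev) >> 16) & 0xfUL)
/* Assumption: events of one bank share the low 16 bits of the raw code. */
#define BANK_OF(ev)	((ev) & 0xffffUL)

/*
 * Hypothetical helper, not a kernel function: returns true when a group
 * of L3 bus events follows both rules from the commit message.
 */
static bool l3_bus_group_is_valid(const unsigned long *events, size_t n)
{
	size_t i;

	assert(events != NULL || n == 0);
	if (n == 0)
		return false;

	/* Rule 2 (which implies rule 1): the PMC4 event must come first. */
	if (PMC_OF(events[0]) != 4)
		return false;

	/* Simplification: every member belongs to the PMC4 event's bank. */
	for (i = 1; i < n; i++)
		if (BANK_OF(events[i]) != BANK_OF(events[0]))
			return false;

	return true;
}
```

With this sketch, the group {0x460A0, 0x360A0} is accepted and the group
{0x360A0} is rejected, matching example 3 above.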

Signed-off-by: Madhavan Srinivasan 
---
 arch/powerpc/include/asm/perf_event_server.h |  2 ++
 arch/powerpc/perf/core-book3s.c  | 20 +++-
 arch/powerpc/perf/isa207-common.c| 28 ++--
 arch/powerpc/perf/isa207-common.h|  5 +
 arch/powerpc/perf/power9-pmu.c   |  2 ++
 5 files changed, 42 insertions(+), 15 deletions(-)
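
For readers following the core-book3s.c hunk, the final test added after the
loop in power_check_constraints() can be tried in isolation with the sketch
below. The condition itself is taken from the diff; the bit position and the
DEMO_* names are hypothetical and chosen only for illustration. It is also an
assumption of this sketch that a bus event on PMC1-3 contributes only the
mask bit, while the PMC4 bus event contributes both the mask and value bits.

```c
#include <stdbool.h>

/* Hypothetical bit position, chosen only for illustration. */
#define DEMO_GRP_MASK	(1UL << 10)
#define DEMO_GRP_VAL	(1UL << 10)

/*
 * Mirrors the condition added after the loop: the group passes when,
 * for every group bit that some event asked for (mask), the
 * accumulated value matches grp_val.
 */
static bool group_constraint_ok(unsigned long value, unsigned long mask,
				unsigned long grp_mask, unsigned long grp_val)
{
	return (value & mask & grp_mask) == (mask & grp_val);
}
```

Under those assumptions, a group with no bus events (mask bit clear) passes,
a group with bus events but no PMC4 event (mask bit set, value bit clear)
fails, and a group that includes the PMC4 event (both bits set) passes. Bits
outside grp_mask do not affect the result.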

diff --git a/arch/powerpc/include/asm/perf_event_server.h b/arch/powerpc/include/asm/perf_event_server.h
index 67a8a9585d50..e60aeb46d6a0 100644
--- a/arch/powerpc/include/asm/perf_event_server.h
+++ b/arch/powerpc/include/asm/perf_event_server.h
@@ -41,6 +41,8 @@ struct power_pmu {
void(*get_mem_data_src)(union perf_mem_data_src *dsrc,
u32 flags, struct pt_regs *regs);
void(*get_mem_weight)(u64 *weight);
+   unsigned long   group_constraint_mask;
+   unsigned long   group_constraint_val;
u64 (*bhrb_filter_map)(u64 branch_sample_type);
void(*config_bhrb)(u64 pmu_bhrb_filter);
void(*disable_pmc)(unsigned int pmc, unsigned long mmcr[]);
diff --git a/arch/powerpc/perf/core-book3s.c b/arch/powerpc/perf/core-book3s.c
index 3f66fcf8ad99..e429a395c6dd 100644
--- a/arch/powerpc/perf/core-book3s.c
+++ b/arch/powerpc/perf/core-book3s.c
@@ -876,6 +876,8 @@ static int power_check_constraints(struct cpu_hw_events *cpuhw,
int i, j;
unsigned long addf = ppmu->add_fields;
unsigned long tadd = ppmu->test_adder;
+   unsigned long grp_mask = ppmu->group_constraint_mask;
+   unsigned long grp_val = ppmu->group_constraint_val;
 
if (n_ev > ppmu->n_counter)
return -1;
@@ -896,15 +898,23 @@ static int power_check_constraints(struct cpu_hw_events *cpuhw,
for (i = 0; i < n_ev; ++i) {
nv = (value | cpuhw->avalues[i][0]) +
(value & cpuhw->avalues[i][0] & addf);
-   if ((((nv + tadd) ^ value) & mask) != 0 ||
-   (((nv + tadd) ^ cpuhw->avalues[i][0]) &
-cpuhw->amasks[i][0]) != 0)
+
+   if (((((nv + tadd) ^ value) & mask) & (~grp_mask)) != 0)
+   break;
+
+   if (((((nv + tadd) ^ cpuhw->avalues[i][0]) & cpuhw->amasks[i][0])
+   & (~grp_mask)) != 0)
break;
+
value = nv;
mask |= cpuhw->amasks[i][0];
}
-   if (i == n_ev)
-   return 0;   /* all OK */
+   if (i == n_ev) {
+   if ((value & mask & grp_mask) != (mask & grp_val))
+   return -1;
+   else
+   return 0;   /* all OK */
+   }
 
/* doesn't work, gather alternatives... */
if (!ppmu->get_alternatives)
diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index 2efee3f196f5..bff206831667 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -276,17 +276,25 @@ int isa207_get_constraint(u64 event, unsigned long *maskp, unsigned long *valp)
}
 
if (unit >= 6 && unit <= 9) {
-   /*
-* L2/L3 events contain a cache selector field, which is
-* supposed to be programmed into MMCRC. However MMCRC is only
-