Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-10 Thread Stephen Rothwell
Hi,

On Wed, 10 May 2017 14:09:53 +0200 (CEST) Thomas Gleixner  
wrote:
>
> > +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> > +{
> > +   int i;
> > +
> > +   for (i = 0;
> > +(per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
> > +   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> > +   old_cpu, new_cpu);  
> 
> Bah, this is horrible to read.
> 
>   struct imc_pmu **pn = per_nest_pmu_arr;
>   int i;
> 
>   for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
>   perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

(Just a bit of bike shedding ...)

Or even (since "i" is not used any more):

struct imc_pmu **pn;

for (pn = per_nest_pmu_arr;
 pn < &per_nest_pmu_arr[IMC_MAX_PMUS] && *pn;
 pn++)
perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

-- 
Cheers,
Stephen Rothwell


Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-11 Thread Thomas Gleixner
On Thu, 11 May 2017, Stephen Rothwell wrote:

> Hi,
> 
> On Wed, 10 May 2017 14:09:53 +0200 (CEST) Thomas Gleixner 
>  wrote:
> >
> > > +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> > > +{
> > > + int i;
> > > +
> > > + for (i = 0;
> > > +  (per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
> > > + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> > > + old_cpu, new_cpu);  
> > 
> > Bah, this is horrible to read.
> > 
> > struct imc_pmu **pn = per_nest_pmu_arr;
> > int i;
> > 
> > for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
> > perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
> 
> (Just a bit of bike shedding ...)
> 
> Or even (since "i" is not used any more):
> 
>   struct imc_pmu **pn;
> 
>   for (pn = per_nest_pmu_arr;
>pn < &per_nest_pmu_arr[IMC_MAX_PMUS] && *pn;
>pn++)
>   perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

Which is equally unreadable as the original code I complained about. Is that
a corporate preference?

Thanks,

tglx







Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-11 Thread Stewart Smith
Madhavan Srinivasan  writes:
>>  * in patch 9 should opal_imc_counters_init return something other
>>than OPAL_SUCCESS in the case on invalid arguments? Maybe
>>OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>
> So, init call will return OPAL_PARAMETER for the unsupported
> domains (core and nest are supported). And if the init operation
> fails for any reason, it would return OPAL_HARDWARE. And this is
> documented.

(I'll comment on the skiboot one too), but I think that if the class
exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
OPAL_SUCCESS and just do nothing. This future proofs everything, and the
API is that one *must* call _INIT before start.


-- 
Stewart Smith
OPAL Architect, IBM.



Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-11 Thread Michael Ellerman
Stewart Smith  writes:

> Madhavan Srinivasan  writes:
>>>  * in patch 9 should opal_imc_counters_init return something other
>>>than OPAL_SUCCESS in the case on invalid arguments? Maybe
>>>OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>>
>> So, init call will return OPAL_PARAMETER for the unsupported
>> domains (core and nest are supported). And if the init operation
>> fails for any reason, it would return OPAL_HARDWARE. And this is
>> documented.
>
> (I'll comment on the skiboot one too), but I think that if the class
> exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
> OPAL_SUCCESS and just do nothing. This future proofs everything, and the
> API is that one *must* call _INIT before start.

Yes, 100%.

That's what I described in my replies to a previous version, if it
doesn't do that we need to fix it.

cheers


Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-11 Thread Madhavan Srinivasan



On Friday 12 May 2017 07:48 AM, Stewart Smith wrote:

Madhavan Srinivasan  writes:

  * in patch 9 should opal_imc_counters_init return something other
than OPAL_SUCCESS in the case on invalid arguments? Maybe
OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

So, init call will return OPAL_PARAMETER for the unsupported
domains (core and nest are supported). And if the init operation
fails for any reason, it would return OPAL_HARDWARE. And this is
documented.

(I'll comment on the skiboot one too), but I think that if the class
exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
OPAL_SUCCESS and just do nothing. This future proofs everything, and the
API is that one *must* call _INIT before start.

Hi stewart,

Yes. mpe did mention this in his review. And i have made the same in the v11
of the opal patchset. Currently we return OPAL_SUCCESS from _INIT
incase of type "Nest". Additionally i have also added a message to be 
printed

 but i guess we can get away with that.

Maddy







Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-11 Thread Madhavan Srinivasan



On Friday 12 May 2017 09:03 AM, Michael Ellerman wrote:

Stewart Smith  writes:


Madhavan Srinivasan  writes:

  * in patch 9 should opal_imc_counters_init return something other
than OPAL_SUCCESS in the case on invalid arguments? Maybe
OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

So, init call will return OPAL_PARAMETER for the unsupported
domains (core and nest are supported). And if the init operation
fails for any reason, it would return OPAL_HARDWARE. And this is
documented.

(I'll comment on the skiboot one too), but I think that if the class
exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
OPAL_SUCCESS and just do nothing. This future proofs everything, and the
API is that one *must* call _INIT before start.

Yes, 100%.

That's what I described in my replies to a previous version, if it
doesn't do that we need to fix it.

Hi mpe,

Yes, as you suggested in the opal v11 patchset, we return OPAL_SUCCESS
from _INIT for type "Nest". Have also added a prerror message logging
for debug, but can get away with it or make it as a prlog.

Maddy


cheers





Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-15 Thread Madhavan Srinivasan

Sorry for delayed response.


On Wednesday 10 May 2017 05:39 PM, Thomas Gleixner wrote:

On Thu, 4 May 2017, Anju T Sudhakar wrote:

+/*
+ * nest_init : Initializes the nest imc engine for the current chip.
+ * by default the nest engine is disabled.
+ */
+static void nest_init(int *cpu_opal_rc)
+{
+   int rc;
+
+   /*
+* OPAL figures out which CPU to start based on the CPU that is
+* currently running when we call into OPAL

I have no idea what that comment tries to tell me and how it is related to
the init function or the invoked opal_imc_counters_stop() function.


Yep will fix the comment.




+*/
+   rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+   if (rc)
+   cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+   int i;
+
+   for (i = 0;
+(per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
+   perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+   old_cpu, new_cpu);

Bah, this is horrible to read.

struct imc_pmu **pn = per_nest_pmu_arr;
int i;

for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

Hmm?


Yes this is better. Will update the code.




+}
+
+static int ppc_nest_imc_cpu_online(unsigned int cpu)
+{
+   int nid;
+   const struct cpumask *l_cpumask;
+   struct cpumask tmp_mask;

You should not allocate cpumask on stack unconditionally. Either make that
cpumask_var_t and use zalloc/free_cpumask_var() or simply make it

static struct cpumask tmp_mask;

That's fine, because this is serialized by the hotplug code already.


ok will fix it as suggested.




+
+   /* Find the cpumask of this node */
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+
+   /*
+* If any of the cpu from this node is already present in the mask,
+* just return, if not, then set this cpu in the mask.
+*/
+   if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
+   cpumask_set_cpu(cpu, &nest_imc_cpumask);
+   nest_change_cpu_context(-1, cpu);
+   return 0;
+   }
+
+   return 0;
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+   int nid, target = -1;
+   const struct cpumask *l_cpumask;
+
+   /*
+* Check in the designated list for this cpu. Dont bother
+* if not one of them.
+*/
+   if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
+   return 0;
+
+   /*
+* Now that this cpu is one of the designated,
+* find a next cpu a) which is online and b) in same chip.
+*/
+   nid = cpu_to_node(cpu);
+   l_cpumask = cpumask_of_node(nid);
+   target = cpumask_next(cpu, l_cpumask);
+
+   /*
+* Update the cpumask with the target cpu and
+* migrate the context if needed
+*/
+   if (target >= 0 && target <= nr_cpu_ids) {
+   cpumask_set_cpu(target, &nest_imc_cpumask);
+   nest_change_cpu_context(cpu, target);
+   }

What disables the perf context if this was the last CPU on the node?


My bad. i did not understand this. Is this regarding the updates
of the "flags" in the perf_event and hw_perf_event structs?





+   return 0;
+}
+
+static int nest_pmu_cpumask_init(void)
+{
+   const struct cpumask *l_cpumask;
+   int cpu, nid;
+   int *cpus_opal_rc;
+
+   if (!cpumask_empty(&nest_imc_cpumask))
+   return 0;

What's that for? Paranoia engineering?


No.  The idea here is to generate the cpu_mask attribute
field only for the first "nest" pmu and use the same
for other "nest" units.


+
+   /*
+* Memory for OPAL call return value.
+*/
+   cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+   if (!cpus_opal_rc)
+   goto fail;
+
+   /*
+* Nest PMUs are per-chip counters. So designate a cpu
+* from each chip for counter collection.
+*/
+   for_each_online_node(nid) {
+   l_cpumask = cpumask_of_node(nid);
+
+   /* designate first online cpu in this node */
+   cpu = cpumask_first(l_cpumask);
+   cpumask_set_cpu(cpu, &nest_imc_cpumask);
+   }

This is all unprotected against CPU hotplug.


+
+   /* Initialize Nest PMUs in each node using designated cpus */
+   on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
+   (void *)cpus_opal_rc, 1);

What does this check on nodes which are not yet online and become online
later?


Idea here is, not to have the nest engines running always. Enable/start
them only when needed. So at init stage of the pmu setup, disable them.
That said, opal api call to disable is needed for a ne

Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-15 Thread Thomas Gleixner
On Mon, 15 May 2017, Madhavan Srinivasan wrote:
> On Wednesday 10 May 2017 05:39 PM, Thomas Gleixner wrote:
> > On Thu, 4 May 2017, Anju T Sudhakar wrote:
> > > + /*
> > > +  * Update the cpumask with the target cpu and
> > > +  * migrate the context if needed
> > > +  */
> > > + if (target >= 0 && target <= nr_cpu_ids) {
> > > + cpumask_set_cpu(target, &nest_imc_cpumask);
> > > + nest_change_cpu_context(cpu, target);
> > > + }
> > What disables the perf context if this was the last CPU on the node?
> 
> My bad. i did not understand this. Is this regarding the updates
> of the "flags" in the perf_event and hw_perf_event structs?

Sorry, there is nothing to understand. It was me misreading it.

> > > +static int nest_pmu_cpumask_init(void)
> > > +{
> > > + const struct cpumask *l_cpumask;
> > > + int cpu, nid;
> > > + int *cpus_opal_rc;
> > > +
> > > + if (!cpumask_empty(&nest_imc_cpumask))
> > > + return 0;
> > What's that for? Paranoia engineering?
> 
> No.  The idea here is to generate the cpu_mask attribute
> field only for the first "nest" pmu and use the same
> for other "nest" units.

Why is nest_pmu_cpumask_init() called more than once? If it is then you
should have a proper flag marking it initialiazed rather than making a
completely obscure check for a cpu mask. That check could be empty for the
second invocation when the first invocation fails.

> > But this whole function is completely overengineered. If you make that
> > nest_init() part of the online function then this works also for nodes
> > which come online later and it simplifies to:
> > 
> > static int ppc_nest_imc_cpu_online(unsigned int cpu)
> > {
> > const struct cpumask *l_cpumask;
> > static struct cpumask tmp_mask;
> > int res;
> > 
> > /* Get the cpumask of this node */
> > l_cpumask = cpumask_of_node(cpu_to_node(cpu));
> > 
> > /*
> >  * If this is not the first online CPU on this node, then IMC is
> >  * initialized already.
> >  */
> > if (cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask))
> > return 0;
> > 
> > /*
> >  * If this fails, IMC is not usable.
> >  *
> >  * FIXME: Add a understandable comment what this actually does
> >  * and why it can fail.
> >  */
> > res = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> > if (res)
> > return res;
> > 
> > /* Make this CPU the designated target for counter collection */
> > cpumask_set_cpu(cpu, &nest_imc_cpumask);
> > nest_change_cpu_context(-1, cpu);
> > return 0;
> > }
> > 
> > static int nest_pmu_cpumask_init(void)
> > {
> > return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
> >  "perf/powerpc/imc:online",
> >  ppc_nest_imc_cpu_online,
> >  ppc_nest_imc_cpu_offline);
> > }
> > 
> > Hmm?
> 
> Yes this make sense. But we need to first designate a cpu in each
> chip at init setup and use opal api to disable the engine in the same.
> So probably, after cpuhp_setup_state, can we do that?

Errm. That's what ppc_nest_imc_cpu_online() does.

It checks whether this is the first online cpu on a node and if yes, it
calls opal_imc_counters_stop() and sets that cpu in nest_imc_cpumask.

cpuhp_setup_state() invokes the callback on each online CPU. Seo evrything
is set up proper after that.

Thanks,

tglx


Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-08 Thread Daniel Axtens
Hi all,

I've had a look at the API as it was a big thing I didn't like in the
earlier version.

I am much happier with this one.

Some comments:

 - I'm no longer subscribed to skiboot but I've had a look at the
   patches on that side:

* in patch 9 should opal_imc_counters_init return something other
  than OPAL_SUCCESS in the case on invalid arguments? Maybe
  OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

* in start/stop, should there be some sort of write barrier to make
  sure the cb->imc_chip_command actually gets written out to memory
  at the time we expect?

The rest of my comments are in line.

> Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
> online CPU) from each chip for nest PMUs is designated to read counters.
>
> On CPU hotplug, dying CPU is checked to see whether it is one of the
> designated cpus, if yes, next online cpu from the same chip (for nest
> units) is designated as new cpu to read counters. For this purpose, we
> introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.
>
> Signed-off-by: Anju T Sudhakar 
> Signed-off-by: Hemant Kumar 
> Signed-off-by: Madhavan Srinivasan 
> ---
>  arch/powerpc/include/asm/imc-pmu.h |   4 +
>  arch/powerpc/include/asm/opal-api.h|  12 +-
>  arch/powerpc/include/asm/opal.h|   4 +
>  arch/powerpc/perf/imc-pmu.c| 248 
> -
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
>  include/linux/cpuhotplug.h |   1 +

Who owns this? get_maintainer.pl doesn't give me anything helpful
here... Do we need an Ack from anyone?

>  6 files changed, 266 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/imc-pmu.h 
> b/arch/powerpc/include/asm/imc-pmu.h
> index 6bbe184..1478d0f 100644
> --- a/arch/powerpc/include/asm/imc-pmu.h
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -92,6 +92,10 @@ struct imc_pmu {
>  #define IMC_DOMAIN_NEST  1
>  #define IMC_DOMAIN_UNKNOWN   -1
>  
> +#define IMC_COUNTER_ENABLE   1
> +#define IMC_COUNTER_DISABLE  0

I'm not sure these constants are particularly useful any more, but I'll
have more to say on that later.

> +
> +
>  extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>  extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>  extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
> imc_pmu *pmu_ptr);
> diff --git a/arch/powerpc/include/asm/opal-api.h 
> b/arch/powerpc/include/asm/opal-api.h
> index a0aa285..ce863d9 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -168,7 +168,10 @@
>  #define OPAL_INT_SET_MFRR125
>  #define OPAL_PCI_TCE_KILL126
>  #define OPAL_NMMU_SET_PTCR   127
> -#define OPAL_LAST127
> +#define OPAL_IMC_COUNTERS_INIT   149
> +#define OPAL_IMC_COUNTERS_START  150
> +#define OPAL_IMC_COUNTERS_STOP   151

Yay, this is heaps better!

> +#define OPAL_LAST151
>  
>  /* Device tree flags */
>  
> @@ -928,6 +931,13 @@ enum {
>   OPAL_PCI_TCE_KILL_ALL,
>  };
>  
> +/* Argument to OPAL_IMC_COUNTERS_*  */
> +enum {
> + OPAL_IMC_COUNTERS_NEST = 1,
> + OPAL_IMC_COUNTERS_CORE = 2,
> + OPAL_IMC_COUNTERS_THREAD = 3,
> +};
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* __OPAL_API_H */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 1ff03a6..9c16ec6 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
> kill_type,
> uint64_t dma_addr, uint32_t npages);
>  int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
>  
> +int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
This isn't called anywhere in this patch... including (worryingly) in
the init function...

> +int64_t opal_imc_counters_start(uint32_t type);
> +int64_t opal_imc_counters_stop(uint32_t type);
> +
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>  int depth, void *data);
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index f09a37a..40792424 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -18,6 +18,11 @@
>  
>  struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>  struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
> +static cpumask_t nest_imc_cpumask;
> +
> +static atomic_t nest_events;
> +/* Used to avoid races in calling enable/disable nest-pmu units*/
You need a space here between s and * ^

> +static DEFINE_MUTEX(imc_nest_reserve);
>  
>  /* Needed for sanity check */
>  extern u64 nest_max_offset;
> @@ -33,6

Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-08 Thread Madhavan Srinivasan



On Monday 08 May 2017 07:42 PM, Daniel Axtens wrote:

Hi all,

I've had a look at the API as it was a big thing I didn't like in the
earlier version.

I am much happier with this one.


Thanks to mpe for suggesting this. :)



Some comments:

  - I'm no longer subscribed to skiboot but I've had a look at the
patches on that side:


Thanks alot for the review comments.



 * in patch 9 should opal_imc_counters_init return something other
   than OPAL_SUCCESS in the case on invalid arguments? Maybe
   OPAL_PARAMETER? (I think you fix this in a later patch anyway?)


So, init call will return OPAL_PARAMETER for the unsupported
domains (core and nest are supported). And if the init operation
fails for any reason, it would return OPAL_HARDWARE. And this is
documented.



 * in start/stop, should there be some sort of write barrier to make
   sure the cb->imc_chip_command actually gets written out to memory
   at the time we expect?


In the current implementation we make the opal call in the
*_event_stop and *_event_start function. But we wanted to
move opal call to the corresponding *_event_init(), so this
avoid a opal call on each _event_start and _event_stop to
this pmu. With this change, we may not need the barrier.

Maddy



The rest of my comments are in line.


Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
online CPU) from each chip for nest PMUs is designated to read counters.

On CPU hotplug, dying CPU is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same chip (for nest
units) is designated as new cpu to read counters. For this purpose, we
introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
  arch/powerpc/include/asm/imc-pmu.h |   4 +
  arch/powerpc/include/asm/opal-api.h|  12 +-
  arch/powerpc/include/asm/opal.h|   4 +
  arch/powerpc/perf/imc-pmu.c| 248 -
  arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
  include/linux/cpuhotplug.h |   1 +

Who owns this? get_maintainer.pl doesn't give me anything helpful
here... Do we need an Ack from anyone?


  6 files changed, 266 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 6bbe184..1478d0f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -92,6 +92,10 @@ struct imc_pmu {
  #define IMC_DOMAIN_NEST   1
  #define IMC_DOMAIN_UNKNOWN-1
  
+#define IMC_COUNTER_ENABLE	1

+#define IMC_COUNTER_DISABLE0

I'm not sure these constants are particularly useful any more, but I'll
have more to say on that later.


+
+
  extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
  extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
  extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
imc_pmu *pmu_ptr);
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index a0aa285..ce863d9 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -168,7 +168,10 @@
  #define OPAL_INT_SET_MFRR 125
  #define OPAL_PCI_TCE_KILL 126
  #define OPAL_NMMU_SET_PTCR127
-#define OPAL_LAST  127
+#define OPAL_IMC_COUNTERS_INIT 149
+#define OPAL_IMC_COUNTERS_START150
+#define OPAL_IMC_COUNTERS_STOP 151

Yay, this is heaps better!


+#define OPAL_LAST  151
  
  /* Device tree flags */
  
@@ -928,6 +931,13 @@ enum {

OPAL_PCI_TCE_KILL_ALL,
  };
  
+/* Argument to OPAL_IMC_COUNTERS_*  */

+enum {
+   OPAL_IMC_COUNTERS_NEST = 1,
+   OPAL_IMC_COUNTERS_CORE = 2,
+   OPAL_IMC_COUNTERS_THREAD = 3,
+};
+
  #endif /* __ASSEMBLY__ */
  
  #endif /* __OPAL_API_H */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 1ff03a6..9c16ec6 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
kill_type,
  uint64_t dma_addr, uint32_t npages);
  int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
  
+int64_t opal_imc_counters_init(uint32_t type, uint64_t address);

This isn't called anywhere in this patch... including (worryingly) in
the init function...


+int64_t opal_imc_counters_start(uint32_t type);
+int64_t opal_imc_counters_stop(uint32_t type);
+
  /* Internal functions */
  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f09a37a..40792424 100644
--- a/a

Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-09 Thread Anju T Sudhakar

Hi Daniel,


On Monday 08 May 2017 07:42 PM, Daniel Axtens wrote:

Hi all,

I've had a look at the API as it was a big thing I didn't like in the
earlier version.

I am much happier with this one.

Some comments:

  - I'm no longer subscribed to skiboot but I've had a look at the
patches on that side:

 * in patch 9 should opal_imc_counters_init return something other
   than OPAL_SUCCESS in the case on invalid arguments? Maybe
   OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

 * in start/stop, should there be some sort of write barrier to make
   sure the cb->imc_chip_command actually gets written out to memory
   at the time we expect?

The rest of my comments are in line.


Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
online CPU) from each chip for nest PMUs is designated to read counters.

On CPU hotplug, dying CPU is checked to see whether it is one of the
designated cpus, if yes, next online cpu from the same chip (for nest
units) is designated as new cpu to read counters. For this purpose, we
introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.

Signed-off-by: Anju T Sudhakar 
Signed-off-by: Hemant Kumar 
Signed-off-by: Madhavan Srinivasan 
---
  arch/powerpc/include/asm/imc-pmu.h |   4 +
  arch/powerpc/include/asm/opal-api.h|  12 +-
  arch/powerpc/include/asm/opal.h|   4 +
  arch/powerpc/perf/imc-pmu.c| 248 -
  arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
  include/linux/cpuhotplug.h |   1 +

Who owns this? get_maintainer.pl doesn't give me anything helpful
here... Do we need an Ack from anyone?


  6 files changed, 266 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h 
b/arch/powerpc/include/asm/imc-pmu.h
index 6bbe184..1478d0f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -92,6 +92,10 @@ struct imc_pmu {
  #define IMC_DOMAIN_NEST   1
  #define IMC_DOMAIN_UNKNOWN-1
  
+#define IMC_COUNTER_ENABLE	1

+#define IMC_COUNTER_DISABLE0

I'm not sure these constants are particularly useful any more, but I'll
have more to say on that later.


+
+
  extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
  extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
  extern int __init init_imc_pmu(struct imc_events *events,int idx, struct 
imc_pmu *pmu_ptr);
diff --git a/arch/powerpc/include/asm/opal-api.h 
b/arch/powerpc/include/asm/opal-api.h
index a0aa285..ce863d9 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -168,7 +168,10 @@
  #define OPAL_INT_SET_MFRR 125
  #define OPAL_PCI_TCE_KILL 126
  #define OPAL_NMMU_SET_PTCR127
-#define OPAL_LAST  127
+#define OPAL_IMC_COUNTERS_INIT 149
+#define OPAL_IMC_COUNTERS_START150
+#define OPAL_IMC_COUNTERS_STOP 151

Yay, this is heaps better!


+#define OPAL_LAST  151
  
  /* Device tree flags */
  
@@ -928,6 +931,13 @@ enum {

OPAL_PCI_TCE_KILL_ALL,
  };
  
+/* Argument to OPAL_IMC_COUNTERS_*  */

+enum {
+   OPAL_IMC_COUNTERS_NEST = 1,
+   OPAL_IMC_COUNTERS_CORE = 2,
+   OPAL_IMC_COUNTERS_THREAD = 3,
+};
+
  #endif /* __ASSEMBLY__ */
  
  #endif /* __OPAL_API_H */

diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 1ff03a6..9c16ec6 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t 
kill_type,
  uint64_t dma_addr, uint32_t npages);
  int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
  
+int64_t opal_imc_counters_init(uint32_t type, uint64_t address);

This isn't called anywhere in this patch... including (worryingly) in
the init function...


+int64_t opal_imc_counters_start(uint32_t type);
+int64_t opal_imc_counters_stop(uint32_t type);
+
  /* Internal functions */
  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
   int depth, void *data);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f09a37a..40792424 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -18,6 +18,11 @@
  
  struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];

  struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;
+
+static atomic_t nest_events;
+/* Used to avoid races in calling enable/disable nest-pmu units*/

You need a space here between s and * ^


Sure. I will correct this.  :)



+static DEFINE_MUTEX(imc_nest_reserve);
  
  /* Needed for sanity check */

  extern u64 nest_max_offset;
@@ -33,6 +38,160 @@ static struct attribu

Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support

2017-05-10 Thread Thomas Gleixner
On Thu, 4 May 2017, Anju T Sudhakar wrote:
> +/*
> + * nest_init : Initializes the nest imc engine for the current chip.
> + * by default the nest engine is disabled.
> + */
> +static void nest_init(int *cpu_opal_rc)
> +{
> + int rc;
> +
> + /*
> +  * OPAL figures out which CPU to start based on the CPU that is
> +  * currently running when we call into OPAL

I have no idea what that comment tries to tell me and how it is related to
the init function or the invoked opal_imc_counters_stop() function.

> +  */
> + rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> + if (rc)
> + cpu_opal_rc[smp_processor_id()] = 1;
> +}
> +
> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> +{
> + int i;
> +
> + for (i = 0;
> +  (per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
> + perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> + old_cpu, new_cpu);

Bah, this is horrible to read.

struct imc_pmu **pn = per_nest_pmu_arr;
int i;

for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

Hmm?

> +}

> +
> +static int ppc_nest_imc_cpu_online(unsigned int cpu)
> +{
> + int nid;
> + const struct cpumask *l_cpumask;
> + struct cpumask tmp_mask;

You should not allocate cpumask on stack unconditionally. Either make that
cpumask_var_t and use zalloc/free_cpumask_var() or simply make it

static struct cpumask tmp_mask;

That's fine, because this is serialized by the hotplug code already.

> +
> + /* Find the cpumask of this node */
> + nid = cpu_to_node(cpu);
> + l_cpumask = cpumask_of_node(nid);
> +
> + /*
> +  * If any of the cpu from this node is already present in the mask,
> +  * just return, if not, then set this cpu in the mask.
> +  */
> + if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
> + cpumask_set_cpu(cpu, &nest_imc_cpumask);
> + nest_change_cpu_context(-1, cpu);
> + return 0;
> + }
> +
> + return 0;
> +}
> +
> +static int ppc_nest_imc_cpu_offline(unsigned int cpu)
> +{
> + int nid, target = -1;
> + const struct cpumask *l_cpumask;
> +
> + /*
> +  * Check in the designated list for this cpu. Dont bother
> +  * if not one of them.
> +  */
> + if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
> + return 0;
> +
> + /*
> +  * Now that this cpu is one of the designated,
> +  * find a next cpu a) which is online and b) in same chip.
> +  */
> + nid = cpu_to_node(cpu);
> + l_cpumask = cpumask_of_node(nid);
> + target = cpumask_next(cpu, l_cpumask);
> +
> + /*
> +  * Update the cpumask with the target cpu and
> +  * migrate the context if needed
> +  */
> + if (target >= 0 && target <= nr_cpu_ids) {
> + cpumask_set_cpu(target, &nest_imc_cpumask);
> + nest_change_cpu_context(cpu, target);
> + }

What disables the perf context if this was the last CPU on the node?

> + return 0;
> +}
> +
> +static int nest_pmu_cpumask_init(void)
> +{
> + const struct cpumask *l_cpumask;
> + int cpu, nid;
> + int *cpus_opal_rc;
> +
> + if (!cpumask_empty(&nest_imc_cpumask))
> + return 0;

What's that for? Paranoia engineering?

> +
> + /*
> +  * Memory for OPAL call return value.
> +  */
> + cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
> + if (!cpus_opal_rc)
> + goto fail;
> +
> + /*
> +  * Nest PMUs are per-chip counters. So designate a cpu
> +  * from each chip for counter collection.
> +  */
> + for_each_online_node(nid) {
> + l_cpumask = cpumask_of_node(nid);
> +
> + /* designate first online cpu in this node */
> + cpu = cpumask_first(l_cpumask);
> + cpumask_set_cpu(cpu, &nest_imc_cpumask);
> + }

This is all unprotected against CPU hotplug.

> +
> + /* Initialize Nest PMUs in each node using designated cpus */
> + on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
> + (void *)cpus_opal_rc, 1);

What does this check on nodes which are not yet online and become online
later?

> + /* Check return value array for any OPAL call failure */
> + for_each_cpu(cpu, &nest_imc_cpumask) {

Races against CPU hotplug as well.

> + if (cpus_opal_rc[cpu])
> + goto fail;
> + }

Leaks cpus_opal_rc.

> +
> + cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
> +   "POWER_NEST_IMC_ONLINE",

Please use a proper prefixed text here. 

> +   ppc_nest_imc_cpu_online,
> +   ppc_nest_imc_cpu_offline);
> +
> + return 0;
> +
> +fail:
> + if (cpus_opal_rc)
> + kfree