Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-16 Thread Peter Zijlstra
On Tue, Jul 14, 2015 at 08:01:54PM -0700, Sukadev Bhattiprolu wrote:
> +/*
> + * Use the transaction interface to read the group of events in @leader.
> + * PMUs like the 24x7 counters in Power can use this to queue the events
> + * in the ->read() operation and perform the actual read in ->commit_txn.
> + *
> + * Other PMUs can ignore ->start_txn and ->commit_txn and read the counters
> + * directly from the PMU in the ->read() operation.
> + */
> +static int perf_event_read_group(struct perf_event *leader)
> +{
> + int ret;
> + struct perf_event *sub;
> + struct pmu *pmu;
> +
> + pmu = leader->pmu;
> +
> + pmu->start_txn(pmu, PERF_PMU_TXN_READ);
> +
> + perf_event_read(leader);

There should be a lockdep assert with that list iteration.

> + list_for_each_entry(sub, &leader->sibling_list, group_entry)
> + perf_event_read(sub);
> +
> + ret = pmu->commit_txn(pmu);
> +
> + return ret;
> +}

Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-21 Thread Sukadev Bhattiprolu
Peter Zijlstra [pet...@infradead.org] wrote:
| On Tue, Jul 14, 2015 at 08:01:54PM -0700, Sukadev Bhattiprolu wrote:
| > +/*
| > + * Use the transaction interface to read the group of events in @leader.
| > + * PMUs like the 24x7 counters in Power can use this to queue the events
| > + * in the ->read() operation and perform the actual read in ->commit_txn.
| > + *
| > + * Other PMUs can ignore ->start_txn and ->commit_txn and read the counters
| > + * directly from the PMU in the ->read() operation.
| > + */
| > +static int perf_event_read_group(struct perf_event *leader)
| > +{
| > +   int ret;
| > +   struct perf_event *sub;
| > +   struct pmu *pmu;
| > +
| > +   pmu = leader->pmu;
| > +
| > +   pmu->start_txn(pmu, PERF_PMU_TXN_READ);
| > +
| > +   perf_event_read(leader);
| 
| There should be a lockdep assert with that list iteration.
| 
| > +   list_for_each_entry(sub, &leader->sibling_list, group_entry)
| > +   perf_event_read(sub);
| > +
| > +   ret = pmu->commit_txn(pmu);

Peter,

I have a situation :-)

We are trying to use the following interface:

start_txn(pmu, PERF_PMU_TXN_READ);

perf_event_read(leader);
list_for_each(sibling, &leader->sibling_list, group_entry)
perf_event_read(sibling)

pmu->commit_txn(pmu);

with the idea that the PMU driver would save the type of transaction in
->start_txn() and use it in ->read() and ->commit_txn().

But since ->start_txn() and the ->read() operations could happen on different
CPUs (perf_event_read() uses the event->oncpu to schedule a call), the PMU
driver cannot use a per-cpu variable to save the state in ->start_txn().
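
To illustrate the problem, here is a minimal sketch of the pattern that breaks
(hypothetical driver code, not from the patch; the hyp_* names and helpers are
made up):

#include <linux/percpu.h>
#include <linux/perf_event.h>

/* Hypothetical helpers; a real driver would queue/read its counters here. */
static void hyp_queue_for_batched_read(struct perf_event *event);
static void hyp_read_counter_now(struct perf_event *event);

/* Hypothetical per-CPU transaction state for a PMU driver. */
static DEFINE_PER_CPU(unsigned int, hyp_txn_flags);

static void hyp_pmu_start_txn(struct pmu *pmu, unsigned int flags)
{
        /* Runs on whichever CPU started the transaction... */
        this_cpu_write(hyp_txn_flags, flags);
}

static void hyp_pmu_read(struct perf_event *event)
{
        /*
         * ...but ->read() runs via an IPI on event->oncpu, which may be a
         * different CPU, so it can see stale (or unset) hyp_txn_flags.
         */
        if (this_cpu_read(hyp_txn_flags) == PERF_PMU_TXN_READ)
                hyp_queue_for_batched_read(event);
        else
                hyp_read_counter_now(event);
}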

I tried using a pmu-wide global, but that would also need us to hold a mutex
to serialize access to that global. The problem is that ->start_txn() can be
called from interrupt context for TXN_ADD transactions (I got the
following backtrace during testing):

mutex_lock_nested+0x504/0x520 (unreliable)
h_24x7_event_start_txn+0x3c/0xd0
group_sched_in+0x70/0x230
ctx_sched_in.isra.63+0x150/0x230
__perf_install_in_context+0x1c8/0x1e0
remote_function+0x7c/0xa0
flush_smp_call_function_queue+0xb0/0x1d0
smp_ipi_demux+0x88/0xf0
icp_hv_ipi_action+0x54/0xc0
handle_irq_event_percpu+0x98/0x2b0
handle_percpu_irq+0x7c/0xc0
generic_handle_irq+0x4c/0x80
__do_irq+0x7c/0x190
call_do_irq+0x14/0x24
do_IRQ+0x8c/0x100
hardware_interrupt_common+0x168/0x180
--- interrupt: 501 at .plpar_hcall_norets+0x14/0x20

Basically, I am stuck trying to save the txn type in ->start_txn() and
retrieve it in ->read().

A couple of options I can think of are:

- having ->start_txn() return a handle that would then be passed to
  ->read() (yuck) and ->commit_txn() (see the sketch after this list).

- serialize the READ transaction for the PMU in perf_event_read_group()
  with a new pmu->txn_mutex:

mutex_lock(&pmu->txn_mutex);

pmu->start_txn()
list_for_each_entry(sub, &leader->sibling_list, group_entry)
perf_event_read(sub);

ret = pmu->commit_txn(pmu);

mutex_unlock(&pmu->txn_mutex);

  Such serialization would be OK with the 24x7 counters (they are system-wide
  counters anyway). We could maybe skip the mutex for PMUs that don't
  implement the TXN_READ interface.
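
For reference, the first option (the handle) would amount to an interface
shape roughly like this (purely a hypothetical sketch, not an existing API):

/* Hypothetical handle-based variant of the transaction interface. */
struct pmu_txn;         /* opaque per-transaction state owned by the driver */

struct pmu_txn *hyp_start_txn(struct pmu *pmu, unsigned int flags);
void hyp_txn_read(struct pmu_txn *txn, struct perf_event *event);
int hyp_commit_txn(struct pmu_txn *txn);

and on the caller side, roughly:

txn = hyp_start_txn(pmu, PERF_PMU_TXN_READ);

hyp_txn_read(txn, leader);
list_for_each_entry(sub, &leader->sibling_list, group_entry)
        hyp_txn_read(txn, sub);

ret = hyp_commit_txn(txn);

The handle would carry the transaction type across CPUs, but it would also
have to be threaded through perf_event_read() and the cross-CPU call into
->read(), which is where the "yuck" comes in.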

Or is there a better way?

Sukadev


Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-21 Thread Peter Zijlstra
On Tue, Jul 21, 2015 at 06:50:45PM -0700, Sukadev Bhattiprolu wrote:
> We are trying to use the following interface:
> 
>   start_txn(pmu, PERF_PMU_TXN_READ);
> 
>   perf_event_read(leader);
>   list_for_each(sibling, &leader->sibling_list, group_entry)
>   perf_event_read(sibling)
> 
>   pmu->commit_txn(pmu);
> 
> with the idea that the PMU driver would save the type of transaction in
> ->start_txn() and use it in ->read() and ->commit_txn().
> 
> But since ->start_txn() and the ->read() operations could happen on different
> CPUs (perf_event_read() uses the event->oncpu to schedule a call), the PMU
> driver cannot use a per-cpu variable to save the state in ->start_txn().

> Or is there a better way?


I've not woken up yet, and not actually fully read the email, but can
you stuff the entire above chunk inside the IPI?

I think you could then actually optimize __perf_event_read() as well,
because all these events should be on the same context, so no point in
calling update_*time*() for every event or so.



Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-22 Thread Sukadev Bhattiprolu
Peter Zijlstra [pet...@infradead.org] wrote:
| I've not woken up yet, and not actually fully read the email, but can
| you stuff the entire above chunk inside the IPI?
| 
| I think you could then actually optimize __perf_event_read() as well,
| because all these events should be on the same context, so no point in
| calling update_*time*() for every event or so.
| 

Do you mean something like this (will move the rename to a separate
patch before posting):
--

From e8eddb5d3877ebdb3b71213a00aaa980f4010dd0 Mon Sep 17 00:00:00 2001
From: Sukadev Bhattiprolu 
Date: Tue, 7 Jul 2015 21:45:23 -0400
Subject: [PATCH 1/1] perf: Define PMU_TXN_READ interface

Define a new PERF_PMU_TXN_READ interface to read a group of counters
at once. Note that we use this interface with all PMUs.

PMUs that implement this interface use the ->read() operation to _queue_
the counters to be read and use ->commit_txn() to actually read all the
queued counters at once.

PMUs that don't implement PERF_PMU_TXN_READ ignore ->start_txn() and
->commit_txn() and continue to read counters one at a time.

Thanks to input from Peter Zijlstra.

Signed-off-by: Sukadev Bhattiprolu 
---

Changelog[v5]
[Peter Zijlstra] Ensure the entire transaction happens on the same CPU.

Changelog[v4]
[Peter Zijlstra] Add lockdep_assert_held() in perf_event_read_group()
---
 include/linux/perf_event.h |1 +
 kernel/events/core.c   |   72 +---
 2 files changed, 62 insertions(+), 11 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 44bf05f..da307ad 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -169,6 +169,7 @@ struct perf_event;
 #define PERF_EVENT_TXN 0x1
 
 #define PERF_PMU_TXN_ADD  0x1  /* txn to add/schedule event on PMU */
+#define PERF_PMU_TXN_READ 0x2  /* txn to read event group from PMU */
 
 /**
  * pmu::capabilities flags
diff --git a/kernel/events/core.c b/kernel/events/core.c
index a6bd09d..7177dd8 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3174,12 +3174,8 @@ void perf_event_exec(void)
rcu_read_unlock();
 }
 
-/*
- * Cross CPU call to read the hardware event
- */
-static void __perf_event_read(void *info)
+static void __perf_event_read(struct perf_event *event, int update_ctx)
 {
-   struct perf_event *event = info;
struct perf_event_context *ctx = event->ctx;
struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
 
@@ -3194,7 +3190,7 @@ static void __perf_event_read(void *info)
return;
 
raw_spin_lock(&ctx->lock);
-   if (ctx->is_active) {
+   if (ctx->is_active && update_ctx) {
update_context_time(ctx);
update_cgrp_time_from_event(event);
}
@@ -3204,6 +3200,16 @@ static void __perf_event_read(void *info)
raw_spin_unlock(&ctx->lock);
 }
 
+/*
+ * Cross CPU call to read the hardware event
+ */
+static void __perf_event_read_ipi(void *info)
+{
+   struct perf_event *event = info;
+
+   __perf_event_read(event, 1);
+}
+
 static inline u64 perf_event_count(struct perf_event *event)
 {
if (event->pmu->count)
@@ -3220,7 +3226,7 @@ static void perf_event_read(struct perf_event *event)
 */
if (event->state == PERF_EVENT_STATE_ACTIVE) {
smp_call_function_single(event->oncpu,
-__perf_event_read, event, 1);
+__perf_event_read_ipi, event, 1);
} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
struct perf_event_context *ctx = event->ctx;
unsigned long flags;
@@ -3765,6 +3771,36 @@ static void orphans_remove_work(struct work_struct *work)
put_ctx(ctx);
 }
 
+/*
+ * Use the transaction interface to read the group of events in @leader.
+ * PMUs like the 24x7 counters in Power can use this to queue the events
+ * in the ->read() operation and perform the actual read in ->commit_txn.
+ *
+ * Other PMUs can ignore ->start_txn and ->commit_txn and read the counters
+ * directly from the PMU in the ->read() operation.
+ */
+static int perf_event_read_group(struct perf_event *leader)
+{
+   int ret;
+   struct perf_event *sub;
+   struct pmu *pmu;
+   struct perf_event_context *ctx = leader->ctx;
+
+   lockdep_assert_held(&ctx->mutex);
+
+   pmu = leader->pmu;
+
+   pmu->start_txn(pmu, PERF_PMU_TXN_READ);
+
+   __perf_event_read(leader, 1);
+   list_for_each_entry(sub, &leader->sibling_list, group_entry)
+   __perf_event_read(sub, 0);
+
+   ret = pmu->commit_txn(pmu);
+
+   return ret;
+}
+
 u64 perf_event_read_value(struct perf_event *event, u64 *enabled, u64 *running)
 {
u64 total = 0;
@@ -3794,7 +3830,17 @@ static int perf_read_group(struct perf_event *event,
 
lockdep_assert_held(&ctx->mutex);
 
-   count = perf_event_read_value(leader, &enabled, &running);

Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-23 Thread Peter Zijlstra
On Wed, Jul 22, 2015 at 04:19:16PM -0700, Sukadev Bhattiprolu wrote:
> Peter Zijlstra [pet...@infradead.org] wrote:
> | I've not woken up yet, and not actually fully read the email, but can
> | you stuff the entire above chunk inside the IPI?
> | 
> | I think you could then actually optimize __perf_event_read() as well,
> | because all these events should be on the same context, so no point in
> | calling update_*time*() for every event or so.
> | 
> 
> Do you mean something like this (will move the rename to a separate
> patch before posting):

More like so.. please double check, I've not even had tea yet.

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3174,14 +3174,22 @@ void perf_event_exec(void)
rcu_read_unlock();
 }
 
+struct perf_read_data {
+   struct perf_event *event;
+   bool group;
+   int ret;
+};
+
 /*
  * Cross CPU call to read the hardware event
  */
 static void __perf_event_read(void *info)
 {
-   struct perf_event *event = info;
+   struct perf_read_data *data = info;
+   struct perf_event *sub, *event = data->event;
struct perf_event_context *ctx = event->ctx;
struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+   struct pmu *pmu = event->pmu;
 
/*
 * If this is a task context, we need to check whether it is
@@ -3199,8 +3207,23 @@ static void __perf_event_read(void *info
update_cgrp_time_from_event(event);
}
update_event_times(event);
-   if (event->state == PERF_EVENT_STATE_ACTIVE)
-   event->pmu->read(event);
+   if (event->state != PERF_EVENT_STATE_ACTIVE)
+   goto unlock;
+
+   if (!data->group) {
+   pmu->read(event);
+   goto unlock;
+   }
+
+   pmu->start_txn(pmu, PERF_PMU_TXN_READ);
+   pmu->read(event);
+   list_for_each_entry(sub, &event->sibling_list, group_entry) {
+   if (sub->state == PERF_EVENT_STATE_ACTIVE)
+   pmu->read(sub);
+   }
+   data->ret = pmu->commit_txn(pmu);
+
+unlock:
raw_spin_unlock(&ctx->lock);
 }
 
@@ -3212,15 +3235,23 @@ static inline u64 perf_event_count(struc
return __perf_event_count(event);
 }
 
-static void perf_event_read(struct perf_event *event)
+static int perf_event_read(struct perf_event *event, bool group)
 {
+   int ret = 0;
+
/*
 * If event is enabled and currently active on a CPU, update the
 * value in the event structure:
 */
if (event->state == PERF_EVENT_STATE_ACTIVE) {
+   struct perf_read_data data = {
+   .event = event,
+   .group = group,
+   .ret = 0,
+   };
smp_call_function_single(event->oncpu,
-__perf_event_read, event, 1);
+__perf_event_read, &data, 1);
+   ret = data.ret;
} else if (event->state == PERF_EVENT_STATE_INACTIVE) {
struct perf_event_context *ctx = event->ctx;
unsigned long flags;
@@ -3235,9 +3266,14 @@ static void perf_event_read(struct perf_
update_context_time(ctx);
update_cgrp_time_from_event(event);
}
-   update_event_times(event);
+   if (group)
+   update_group_times(event);
+   else
+   update_event_times(event);
raw_spin_unlock_irqrestore(&ctx->lock, flags);
}
+
+   return ret;
 }
 
 /*
@@ -3718,7 +3754,6 @@ static u64 perf_event_compute(struct per
atomic64_read(&event->child_total_time_running);
 
list_for_each_entry(child, &event->child_list, child_list) {
-   perf_event_read(child);
total += perf_event_count(child);
*enabled += child->total_time_enabled;
*running += child->total_time_running;
@@ -3772,7 +3807,7 @@ u64 perf_event_read_value(struct perf_ev
 
mutex_lock(&event->child_mutex);
 
-   perf_event_read(event);
+   perf_event_read(event, false);
total = perf_event_compute(event, enabled, running);
 
mutex_unlock(&event->child_mutex);
@@ -3792,7 +3827,11 @@ static int perf_read_group(struct perf_e
 
lockdep_assert_held(&ctx->mutex);
 
-   count = perf_event_read_value(leader, &enabled, &running);
+   ret = perf_event_read(leader, true);
+   if (ret)
+   return ret;
+
+   count = perf_event_compute(leader, &enabled, &running);
 
values[n++] = 1 + leader->nr_siblings;
if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
@@ -3813,7 +3852,7 @@ static int perf_read_group(struct perf_e
list_for_each_entry(sub, &leader->sibling_list, group_entry) {
n = 0;
 
-   values[n++] = perf_event_read_value(sub, &enabled, &running);

Re: [PATCH v3 7/8] perf: Define PMU_TXN_READ interface

2015-07-23 Thread Sukadev Bhattiprolu
Peter Zijlstra [pet...@infradead.org] wrote:
| On Wed, Jul 22, 2015 at 04:19:16PM -0700, Sukadev Bhattiprolu wrote:
| > Peter Zijlstra [pet...@infradead.org] wrote:
| > | I've not woken up yet, and not actually fully read the email, but can
| > | you stuff the entire above chunk inside the IPI?
| > | 
| > | I think you could then actually optimize __perf_event_read() as well,
| > | because all these events should be on the same context, so no point in
| > | calling update_*time*() for every event or so.
| > | 
| > 
| > Do you mean something like this (will move the rename to a separate
| > patch before posting):
| 
| More like so.. please double check, I've not even had tea yet.

Yeah, I realized I had ignored the 'event->cpu' spec.
Will try this out. Thanks,

Sukadev
