Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-20 Thread Andi Kleen
> If you don't care about sampling and only care about totals, then you
> can just open the events concurrently *without* grouping them, as I
> stated previously.

perf record doesn't really support that. We need some group reader
that runs regularly. The best choice for the leader is a CPU sample event,
which also has the advantage that it won't disturb idle states.

> > > From my PoV that violates group semantics, because now the events aren't
> > > always counting at the same time (which would be the reason I grouped
> > > them in the first place).
> > 
> > You never use the absolute value, just differences. The differences 
> > effectively count only when the group runs.
> 
> Except that the uncore PMU is counting during the periods the CPU PMU is
> disabled (e.g. when it is being programmed, read, or written). There's a
> race there that you cannot solve given the two are indepedent agents.

We get useful information, like

"memory bandwidth during the group run time".

You're right formulas between different PMUs are problematic though.
But in most cases time relation is good enough.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-20 Thread Mark Rutland
On Sat, Apr 18, 2015 at 01:47:06AM +0100, Andi Kleen wrote:
> > ... which would give you arbitrary skew, because one counter is
> > free-running and the other is not (when scheduling a context in or out we 
> > stop
> > the PMU)
> 
> Everyone just reads the counter and subtracts it from
> the last value they've seen.
> 
> That's the same how any other shared free running counter work,
> such as the standard timer.
> 
> The only thing that perf needs to enforce is that the counters
> are running with the same event. 
> 
> It also wouldn't work for sampling, but the uncore doesn't do
> sampling anyways.

If you don't care about sampling and only care about totals, then you
can just open the events concurrently *without* grouping them, as I
stated previously.

> > From my PoV that violates group semantics, because now the events aren't
> > always counting at the same time (which would be the reason I grouped
> > them in the first place).
> 
> You never use the absolute value, just differences. The differences 
> effectively count only when the group runs.

Except that the uncore PMU is counting during the periods the CPU PMU is
disabled (e.g. when it is being programmed, read, or written). There's a
race there that you cannot solve given the two are indepedent agents.

> > However, it is the case that you cannot offer group semantics.
> 
> I don't think that's true.

I believe that it is. I also believe it doesn't matter since you don't
care about those semantics anyway. Given that you can just open
independent events, which is already possible.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-20 Thread Mark Rutland
On Sat, Apr 18, 2015 at 01:47:06AM +0100, Andi Kleen wrote:
  ... which would give you arbitrary skew, because one counter is
  free-running and the other is not (when scheduling a context in or out we 
  stop
  the PMU)
 
 Everyone just reads the counter and subtracts it from
 the last value they've seen.
 
 That's the same how any other shared free running counter work,
 such as the standard timer.
 
 The only thing that perf needs to enforce is that the counters
 are running with the same event. 
 
 It also wouldn't work for sampling, but the uncore doesn't do
 sampling anyways.

If you don't care about sampling and only care about totals, then you
can just open the events concurrently *without* grouping them, as I
stated previously.

  From my PoV that violates group semantics, because now the events aren't
  always counting at the same time (which would be the reason I grouped
  them in the first place).
 
 You never use the absolute value, just differences. The differences 
 effectively count only when the group runs.

Except that the uncore PMU is counting during the periods the CPU PMU is
disabled (e.g. when it is being programmed, read, or written). There's a
race there that you cannot solve given the two are indepedent agents.

  However, it is the case that you cannot offer group semantics.
 
 I don't think that's true.

I believe that it is. I also believe it doesn't matter since you don't
care about those semantics anyway. Given that you can just open
independent events, which is already possible.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-20 Thread Andi Kleen
 If you don't care about sampling and only care about totals, then you
 can just open the events concurrently *without* grouping them, as I
 stated previously.

perf record doesn't really support that. We need some group reader
that runs regularly. The best choice for the leader is a CPU sample event,
which also has the advantage that it won't disturb idle states.

   From my PoV that violates group semantics, because now the events aren't
   always counting at the same time (which would be the reason I grouped
   them in the first place).
  
  You never use the absolute value, just differences. The differences 
  effectively count only when the group runs.
 
 Except that the uncore PMU is counting during the periods the CPU PMU is
 disabled (e.g. when it is being programmed, read, or written). There's a
 race there that you cannot solve given the two are indepedent agents.

We get useful information, like

memory bandwidth during the group run time.

You're right formulas between different PMUs are problematic though.
But in most cases time relation is good enough.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-17 Thread Andi Kleen
> ... which would give you arbitrary skew, because one counter is
> free-running and the other is not (when scheduling a context in or out we stop
> the PMU)

Everyone just reads the counter and subtracts it from
the last value they've seen.

That's the same how any other shared free running counter work,
such as the standard timer.

The only thing that perf needs to enforce is that the counters
are running with the same event. 

It also wouldn't work for sampling, but the uncore doesn't do
sampling anyways.

> From my PoV that violates group semantics, because now the events aren't
> always counting at the same time (which would be the reason I grouped
> them in the first place).

You never use the absolute value, just differences. The differences 
effectively count only when the group runs.

> However, it is the case that you cannot offer group semantics.

I don't think that's true.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-17 Thread Mark Rutland
On Thu, Apr 16, 2015 at 10:23:42PM +0100, Andi Kleen wrote:
> > From my PoV that makes sense. One is CPU-affine, the other is not, and
> > the two cannot be scheduled in the same PMU transaction by the nature of
> > the hardware. Fundamentally, you cannot provide group semantics due to
> > this.
> 
> Actually you can. Just use it like a free running counter, and the
> different groups sample it. This will work from the different CPUs,
> as long as the event is the same everywhere.

... which would give you arbitrary skew, because one counter is
free-running and the other is not (when scheduling a context in or out we stop
the PMU)

>From my PoV that violates group semantics, because now the events aren't
always counting at the same time (which would be the reason I grouped
them in the first place).

> The implemention may not be quite right yet, but the basic concept 
> should work, and is useful.

I can see that associating counts from different PMUs at points in time
may be useful, even if they aren't sampled at the precise same time, and
you have weaker guarantees than the current group semantics.

However, it is the case that you cannot offer group semantics.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-17 Thread Mark Rutland
On Thu, Apr 16, 2015 at 10:23:42PM +0100, Andi Kleen wrote:
  From my PoV that makes sense. One is CPU-affine, the other is not, and
  the two cannot be scheduled in the same PMU transaction by the nature of
  the hardware. Fundamentally, you cannot provide group semantics due to
  this.
 
 Actually you can. Just use it like a free running counter, and the
 different groups sample it. This will work from the different CPUs,
 as long as the event is the same everywhere.

... which would give you arbitrary skew, because one counter is
free-running and the other is not (when scheduling a context in or out we stop
the PMU)

From my PoV that violates group semantics, because now the events aren't
always counting at the same time (which would be the reason I grouped
them in the first place).

 The implemention may not be quite right yet, but the basic concept 
 should work, and is useful.

I can see that associating counts from different PMUs at points in time
may be useful, even if they aren't sampled at the precise same time, and
you have weaker guarantees than the current group semantics.

However, it is the case that you cannot offer group semantics.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-17 Thread Andi Kleen
 ... which would give you arbitrary skew, because one counter is
 free-running and the other is not (when scheduling a context in or out we stop
 the PMU)

Everyone just reads the counter and subtracts it from
the last value they've seen.

That's the same how any other shared free running counter work,
such as the standard timer.

The only thing that perf needs to enforce is that the counters
are running with the same event. 

It also wouldn't work for sampling, but the uncore doesn't do
sampling anyways.

 From my PoV that violates group semantics, because now the events aren't
 always counting at the same time (which would be the reason I grouped
 them in the first place).

You never use the absolute value, just differences. The differences 
effectively count only when the group runs.

 However, it is the case that you cannot offer group semantics.

I don't think that's true.

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Andi Kleen
> From my PoV that makes sense. One is CPU-affine, the other is not, and
> the two cannot be scheduled in the same PMU transaction by the nature of
> the hardware. Fundamentally, you cannot provide group semantics due to
> this.

Actually you can. Just use it like a free running counter, and the
different groups sample it. This will work from the different CPUs,
as long as the event is the same everywhere.

The implemention may not be quite right yet, but the basic concept 
should work, and is useful.

-Andi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Mark Rutland
On Thu, Apr 16, 2015 at 05:31:40PM +0100, Mark Rutland wrote:
> Hi,
> 
> If you're going to fundamentally change the behaviour of
> perf_invalid_context, please Cc authors of other system PMU drivers.
> Intel aren't the only ones with such PMUs.
> 
> For instance, this affects the ARM CCI and CCN PMU drivers.

As an aside, is there any chance of a p...@vger.kernel.org list?

My work email becomes a little useless when subscribed to the full LKML
firehose, so while I'd like to keep up with perf core changes I don't
subscribe. In this case I stumbled across these patches by chance when
browsing an archive, then had to have the messages bounced to my
account.

Having a dedicated list would make things easier for myself, at least.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Mark Rutland
> Even if you ignore the fundamental semantics of groups, there are other
> problems with allowing shared contexts:
> 
> * The *_txn functions only get called on the group leader's PMU. If your
>   system PMU has these functions, they are not called.
> 
> * Event rotation is per ctx, but now you could have some events in a CPU
>   PMU's context, and some in the uncore PMU's context. So those can race
>   with each other.
> 
> * Throttling is also per-context. So those can race with each other too.

There's also a break down of behaviour: events in the uncore context
will get migrated to another CPU in the event of a hot unplug, while
events that are grouped with CPU events (and hence live in the CPU
context) will be destroyed.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Mark Rutland
Hi,

If you're going to fundamentally change the behaviour of
perf_invalid_context, please Cc authors of other system PMU drivers.
Intel aren't the only ones with such PMUs.

For instance, this affects the ARM CCI and CCN PMU drivers.

On Wed, Apr 15, 2015 at 08:56:11AM +0100, Kan Liang wrote:
> From: Kan Liang 
> 
> The pmu marked as perf_invalid_context don't have any state to switch on
> context switch. Everything is global. So it is OK to be part of sw/hw
> groups.
> In sched_out/sched_in, del/add must be called, so the
> perf_invalid_context event can be disabled/enabled accordingly during
> context switch. The event count only be read when the event is already
> sched_in.
> 
> However group read doesn't work with mix events.
> 
> For example,
> perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> It always gets EINVAL.

>From my PoV that makes sense. One is CPU-affine, the other is not, and
the two cannot be scheduled in the same PMU transaction by the nature of
the hardware. Fundamentally, you cannot provide group semantics due to
this.

Even if you ignore the fundamental semantics of groups, there are other
problems with allowing shared contexts:

* The *_txn functions only get called on the group leader's PMU. If your
  system PMU has these functions, they are not called.

* Event rotation is per ctx, but now you could have some events in a CPU
  PMU's context, and some in the uncore PMU's context. So those can race
  with each other.

* Throttling is also per-context. So those can race with each other too.

> This patch set intends to fix this issue.
> perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]

You can already count the events concurrently without grouping them, and
the above implies that this patch just ends up misleading the user
w.r.t. group semantics.

If you want to be able to sample the events with a single read, then you
can attach the FDs.

I don't see that this solves a real problem. I see that it introduces a
new set of problems in addition to complicating existing code.

Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Liang, Kan


> 
> On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
> > The event count only be read when the event is already sched_in.
> 
> Yeah, so no. This breaks what groups are. Group events _must_ be co-
> scheduled. You cannot guarantee you can schedule events from another
> PMU.

Why? I think it's similar as sw and hw mixed event group.
group_sched_in will co-schedule all events in sibling_list. 
Invalid context events is already added in sibling_list in
perf_install_in_context.
So all group events will be scheduled together. 
event->pmu guarantee proper add is called.
For invalid context events, everything is global. It never fails to schedule.

> 
> Also, I cannot see how this can possibly work, you cannot put these things
> on the same event_context.

Why not? For a sw and hw mixed event group, they use same
hw event_context. 

Actually, the invalid context events don't have any state to switch 
on context switch. They can attach to any event_context.

> 
> Also, how does this work wrt cpumasks, regular events are per cpu, uncore
> events are per node.

Currently, uncore events is only available on the first cpu of the node.
So if it's a hw and uncore mixed group, only cpu 0 do the group.
For other CPUs, there will be only hw event. 
Perf tool will guarantee it. Here are some examples.

[perf]$ sudo perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S'
-C 0-4 -vvv -- sleep 1

perf_event_attr:
  size 112
  { sample_period, sample_freq }   4000
  sample_type  IP|TID|TIME|READ|ID|CPU|PERIOD
  read_format  ID|GROUP
  disabled 1
  mmap 1
  comm 1
  freq 1
  task 1
  sample_id_all1
  exclude_guest1
  mmap21
  comm_exec1

sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8

perf_event_attr:
  type 15
  size 112
  config   304
  sample_type  IP|TID|TIME|READ|ID|CPU|PERIOD
  read_format  ID|GROUP
  freq 1
  sample_id_all1
  exclude_guest1

sys_perf_event_open: pid -1  cpu 0  group_fd 3  flags 0x8
mmap size 528384B
perf event ring buffer mmapped per cpu



[perf]$ sudo perf stat -e '{cycles,uncore_imc_0/cas_count_read/}' -C 0-4 
--per-core -- sleep 1

 Performance counter stats for 'CPU(s) 0-4':

S0-C0   11367712  cycles
(100.00%)
S0-C0   1   0.17 MiB  uncore_imc_0/cas_count_read/
S0-C1   1 982923  cycles
S0-C1   0   MiB  uncore_imc_0/cas_count_read/
S0-C2   1 958585  cycles
S0-C2   0   MiB  uncore_imc_0/cas_count_read/
S0-C3   1 978161  cycles
S0-C3   0   MiB  uncore_imc_0/cas_count_read/
S0-C4   1 976012  cycles
S0-C4   0   MiB  uncore_imc_0/cas_count_read/

   1.001151765 seconds time elapsed


> 
> There is so much broken stuff here without explanation its not funny.
> 
> Please explain how this does not completely wreck everything?

OK. I will refine the description when questions as above are addressed.

Thanks,
Kan
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Liang, Kan


 
 On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
  The event count only be read when the event is already sched_in.
 
 Yeah, so no. This breaks what groups are. Group events _must_ be co-
 scheduled. You cannot guarantee you can schedule events from another
 PMU.

Why? I think it's similar as sw and hw mixed event group.
group_sched_in will co-schedule all events in sibling_list. 
Invalid context events is already added in sibling_list in
perf_install_in_context.
So all group events will be scheduled together. 
event-pmu guarantee proper add is called.
For invalid context events, everything is global. It never fails to schedule.

 
 Also, I cannot see how this can possibly work, you cannot put these things
 on the same event_context.

Why not? For a sw and hw mixed event group, they use same
hw event_context. 

Actually, the invalid context events don't have any state to switch 
on context switch. They can attach to any event_context.

 
 Also, how does this work wrt cpumasks, regular events are per cpu, uncore
 events are per node.

Currently, uncore events is only available on the first cpu of the node.
So if it's a hw and uncore mixed group, only cpu 0 do the group.
For other CPUs, there will be only hw event. 
Perf tool will guarantee it. Here are some examples.

[perf]$ sudo perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S'
-C 0-4 -vvv -- sleep 1

perf_event_attr:
  size 112
  { sample_period, sample_freq }   4000
  sample_type  IP|TID|TIME|READ|ID|CPU|PERIOD
  read_format  ID|GROUP
  disabled 1
  mmap 1
  comm 1
  freq 1
  task 1
  sample_id_all1
  exclude_guest1
  mmap21
  comm_exec1

sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
sys_perf_event_open: pid -1  cpu 4  group_fd -1  flags 0x8

perf_event_attr:
  type 15
  size 112
  config   304
  sample_type  IP|TID|TIME|READ|ID|CPU|PERIOD
  read_format  ID|GROUP
  freq 1
  sample_id_all1
  exclude_guest1

sys_perf_event_open: pid -1  cpu 0  group_fd 3  flags 0x8
mmap size 528384B
perf event ring buffer mmapped per cpu



[perf]$ sudo perf stat -e '{cycles,uncore_imc_0/cas_count_read/}' -C 0-4 
--per-core -- sleep 1

 Performance counter stats for 'CPU(s) 0-4':

S0-C0   11367712  cycles
(100.00%)
S0-C0   1   0.17 MiB  uncore_imc_0/cas_count_read/
S0-C1   1 982923  cycles
S0-C1   0  not counted MiB  uncore_imc_0/cas_count_read/
S0-C2   1 958585  cycles
S0-C2   0  not counted MiB  uncore_imc_0/cas_count_read/
S0-C3   1 978161  cycles
S0-C3   0  not counted MiB  uncore_imc_0/cas_count_read/
S0-C4   1 976012  cycles
S0-C4   0  not counted MiB  uncore_imc_0/cas_count_read/

   1.001151765 seconds time elapsed


 
 There is so much broken stuff here without explanation its not funny.
 
 Please explain how this does not completely wreck everything?

OK. I will refine the description when questions as above are addressed.

Thanks,
Kan
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Mark Rutland
Hi,

If you're going to fundamentally change the behaviour of
perf_invalid_context, please Cc authors of other system PMU drivers.
Intel aren't the only ones with such PMUs.

For instance, this affects the ARM CCI and CCN PMU drivers.

On Wed, Apr 15, 2015 at 08:56:11AM +0100, Kan Liang wrote:
 From: Kan Liang kan.li...@intel.com
 
 The pmu marked as perf_invalid_context don't have any state to switch on
 context switch. Everything is global. So it is OK to be part of sw/hw
 groups.
 In sched_out/sched_in, del/add must be called, so the
 perf_invalid_context event can be disabled/enabled accordingly during
 context switch. The event count only be read when the event is already
 sched_in.
 
 However group read doesn't work with mix events.
 
 For example,
 perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
 It always gets EINVAL.

From my PoV that makes sense. One is CPU-affine, the other is not, and
the two cannot be scheduled in the same PMU transaction by the nature of
the hardware. Fundamentally, you cannot provide group semantics due to
this.

Even if you ignore the fundamental semantics of groups, there are other
problems with allowing shared contexts:

* The *_txn functions only get called on the group leader's PMU. If your
  system PMU has these functions, they are not called.

* Event rotation is per ctx, but now you could have some events in a CPU
  PMU's context, and some in the uncore PMU's context. So those can race
  with each other.

* Throttling is also per-context. So those can race with each other too.

 This patch set intends to fix this issue.
 perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]

You can already count the events concurrently without grouping them, and
the above implies that this patch just ends up misleading the user
w.r.t. group semantics.

If you want to be able to sample the events with a single read, then you
can attach the FDs.

I don't see that this solves a real problem. I see that it introduces a
new set of problems in addition to complicating existing code.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Mark Rutland
On Thu, Apr 16, 2015 at 05:31:40PM +0100, Mark Rutland wrote:
 Hi,
 
 If you're going to fundamentally change the behaviour of
 perf_invalid_context, please Cc authors of other system PMU drivers.
 Intel aren't the only ones with such PMUs.
 
 For instance, this affects the ARM CCI and CCN PMU drivers.

As an aside, is there any chance of a p...@vger.kernel.org list?

My work email becomes a little useless when subscribed to the full LKML
firehose, so while I'd like to keep up with perf core changes I don't
subscribe. In this case I stumbled across these patches by chance when
browsing an archive, then had to have the messages bounced to my
account.

Having a dedicated list would make things easier for myself, at least.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Mark Rutland
 Even if you ignore the fundamental semantics of groups, there are other
 problems with allowing shared contexts:
 
 * The *_txn functions only get called on the group leader's PMU. If your
   system PMU has these functions, they are not called.
 
 * Event rotation is per ctx, but now you could have some events in a CPU
   PMU's context, and some in the uncore PMU's context. So those can race
   with each other.
 
 * Throttling is also per-context. So those can race with each other too.

There's also a break down of behaviour: events in the uncore context
will get migrated to another CPU in the event of a hot unplug, while
events that are grouped with CPU events (and hence live in the CPU
context) will be destroyed.

Mark.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-16 Thread Andi Kleen
 From my PoV that makes sense. One is CPU-affine, the other is not, and
 the two cannot be scheduled in the same PMU transaction by the nature of
 the hardware. Fundamentally, you cannot provide group semantics due to
 this.

Actually you can. Just use it like a free running counter, and the
different groups sample it. This will work from the different CPUs,
as long as the event is the same everywhere.

The implemention may not be quite right yet, but the basic concept 
should work, and is useful.

-Andi
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Peter Zijlstra
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
> The event count only be read when the event is already
> sched_in.

Yeah, so no. This breaks what groups are. Group events _must_ be
co-scheduled. You cannot guarantee you can schedule events from another
PMU.

Also, I cannot see how this can possibly work, you cannot put these
things on the same event_context.

Also, how does this work wrt cpumasks, regular events are per cpu,
uncore events are per node.

There is so much broken stuff here without explanation its not funny.

Please explain how this does not completely wreck everything?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Peter Zijlstra
On Wed, Apr 15, 2015 at 06:21:11PM +0200, Andi Kleen wrote:
> On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote:
> > On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
> > > From: Kan Liang 
> > > 
> > > The pmu marked as perf_invalid_context don't have any state to switch on
> > > context switch. Everything is global. So it is OK to be part of sw/hw
> > > groups.
> > > In sched_out/sched_in, del/add must be called, so the
> > > perf_invalid_context event can be disabled/enabled accordingly during
> > > context switch. The event count only be read when the event is already
> > > sched_in.
> > > 
> > > However group read doesn't work with mix events.
> > > 
> > > For example,
> > > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> > > It always gets EINVAL.
> > > 
> > > This patch set intends to fix this issue.
> > > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> > > [ perf record: Woken up 1 times to write data ]
> > > [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]
> > > 
> > > This patch special case invalid context events and allow them to be part
> > > of sw/hw groups.
> > 
> > I don't get it. What, Why?
> 
> Without the patch you can't mix uncore and cpu core events in the same
> group.
> 
> Collecting uncore in PMIs is useful, for example to get memory
> bandwidth over time.

Well, start with a coherent changelog, why do you still think those are
optional?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Andi Kleen
On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote:
> On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
> > From: Kan Liang 
> > 
> > The pmu marked as perf_invalid_context don't have any state to switch on
> > context switch. Everything is global. So it is OK to be part of sw/hw
> > groups.
> > In sched_out/sched_in, del/add must be called, so the
> > perf_invalid_context event can be disabled/enabled accordingly during
> > context switch. The event count only be read when the event is already
> > sched_in.
> > 
> > However group read doesn't work with mix events.
> > 
> > For example,
> > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> > It always gets EINVAL.
> > 
> > This patch set intends to fix this issue.
> > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]
> > 
> > This patch special case invalid context events and allow them to be part
> > of sw/hw groups.
> 
> I don't get it. What, Why?

Without the patch you can't mix uncore and cpu core events in the same
group.

Collecting uncore in PMIs is useful, for example to get memory
bandwidth over time.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Peter Zijlstra
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
> From: Kan Liang 
> 
> The pmu marked as perf_invalid_context don't have any state to switch on
> context switch. Everything is global. So it is OK to be part of sw/hw
> groups.
> In sched_out/sched_in, del/add must be called, so the
> perf_invalid_context event can be disabled/enabled accordingly during
> context switch. The event count only be read when the event is already
> sched_in.
> 
> However group read doesn't work with mix events.
> 
> For example,
> perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> It always gets EINVAL.
> 
> This patch set intends to fix this issue.
> perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]
> 
> This patch special case invalid context events and allow them to be part
> of sw/hw groups.

I don't get it. What, Why?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Andi Kleen
On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote:
 On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
  From: Kan Liang kan.li...@intel.com
  
  The pmu marked as perf_invalid_context don't have any state to switch on
  context switch. Everything is global. So it is OK to be part of sw/hw
  groups.
  In sched_out/sched_in, del/add must be called, so the
  perf_invalid_context event can be disabled/enabled accordingly during
  context switch. The event count only be read when the event is already
  sched_in.
  
  However group read doesn't work with mix events.
  
  For example,
  perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
  It always gets EINVAL.
  
  This patch set intends to fix this issue.
  perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]
  
  This patch special case invalid context events and allow them to be part
  of sw/hw groups.
 
 I don't get it. What, Why?

Without the patch you can't mix uncore and cpu core events in the same
group.

Collecting uncore in PMIs is useful, for example to get memory
bandwidth over time.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Peter Zijlstra
On Wed, Apr 15, 2015 at 06:21:11PM +0200, Andi Kleen wrote:
 On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote:
  On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
   From: Kan Liang kan.li...@intel.com
   
   The pmu marked as perf_invalid_context don't have any state to switch on
   context switch. Everything is global. So it is OK to be part of sw/hw
   groups.
   In sched_out/sched_in, del/add must be called, so the
   perf_invalid_context event can be disabled/enabled accordingly during
   context switch. The event count only be read when the event is already
   sched_in.
   
   However group read doesn't work with mix events.
   
   For example,
   perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
   It always gets EINVAL.
   
   This patch set intends to fix this issue.
   perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]
   
   This patch special case invalid context events and allow them to be part
   of sw/hw groups.
  
  I don't get it. What, Why?
 
 Without the patch you can't mix uncore and cpu core events in the same
 group.
 
 Collecting uncore in PMIs is useful, for example to get memory
 bandwidth over time.

Well, start with a coherent changelog, why do you still think those are
optional?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Peter Zijlstra
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
 From: Kan Liang kan.li...@intel.com
 
 The pmu marked as perf_invalid_context don't have any state to switch on
 context switch. Everything is global. So it is OK to be part of sw/hw
 groups.
 In sched_out/sched_in, del/add must be called, so the
 perf_invalid_context event can be disabled/enabled accordingly during
 context switch. The event count only be read when the event is already
 sched_in.
 
 However group read doesn't work with mix events.
 
 For example,
 perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
 It always gets EINVAL.
 
 This patch set intends to fix this issue.
 perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1
 [ perf record: Woken up 1 times to write data ]
 [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ]
 
 This patch special case invalid context events and allow them to be part
 of sw/hw groups.

I don't get it. What, Why?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups

2015-04-15 Thread Peter Zijlstra
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote:
 The event count only be read when the event is already
 sched_in.

Yeah, so no. This breaks what groups are. Group events _must_ be
co-scheduled. You cannot guarantee you can schedule events from another
PMU.

Also, I cannot see how this can possibly work, you cannot put these
things on the same event_context.

Also, how does this work wrt cpumasks, regular events are per cpu,
uncore events are per node.

There is so much broken stuff here without explanation its not funny.

Please explain how this does not completely wreck everything?
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/