Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
> If you don't care about sampling and only care about totals, then you > can just open the events concurrently *without* grouping them, as I > stated previously. perf record doesn't really support that. We need some group reader that runs regularly. The best choice for the leader is a CPU sample event, which also has the advantage that it won't disturb idle states. > > > From my PoV that violates group semantics, because now the events aren't > > > always counting at the same time (which would be the reason I grouped > > > them in the first place). > > > > You never use the absolute value, just differences. The differences > > effectively count only when the group runs. > > Except that the uncore PMU is counting during the periods the CPU PMU is > disabled (e.g. when it is being programmed, read, or written). There's a > race there that you cannot solve given the two are indepedent agents. We get useful information, like "memory bandwidth during the group run time". You're right formulas between different PMUs are problematic though. But in most cases time relation is good enough. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Sat, Apr 18, 2015 at 01:47:06AM +0100, Andi Kleen wrote: > > ... which would give you arbitrary skew, because one counter is > > free-running and the other is not (when scheduling a context in or out we > > stop > > the PMU) > > Everyone just reads the counter and subtracts it from > the last value they've seen. > > That's the same how any other shared free running counter work, > such as the standard timer. > > The only thing that perf needs to enforce is that the counters > are running with the same event. > > It also wouldn't work for sampling, but the uncore doesn't do > sampling anyways. If you don't care about sampling and only care about totals, then you can just open the events concurrently *without* grouping them, as I stated previously. > > From my PoV that violates group semantics, because now the events aren't > > always counting at the same time (which would be the reason I grouped > > them in the first place). > > You never use the absolute value, just differences. The differences > effectively count only when the group runs. Except that the uncore PMU is counting during the periods the CPU PMU is disabled (e.g. when it is being programmed, read, or written). There's a race there that you cannot solve given the two are indepedent agents. > > However, it is the case that you cannot offer group semantics. > > I don't think that's true. I believe that it is. I also believe it doesn't matter since you don't care about those semantics anyway. Given that you can just open independent events, which is already possible. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Sat, Apr 18, 2015 at 01:47:06AM +0100, Andi Kleen wrote: ... which would give you arbitrary skew, because one counter is free-running and the other is not (when scheduling a context in or out we stop the PMU) Everyone just reads the counter and subtracts it from the last value they've seen. That's the same how any other shared free running counter work, such as the standard timer. The only thing that perf needs to enforce is that the counters are running with the same event. It also wouldn't work for sampling, but the uncore doesn't do sampling anyways. If you don't care about sampling and only care about totals, then you can just open the events concurrently *without* grouping them, as I stated previously. From my PoV that violates group semantics, because now the events aren't always counting at the same time (which would be the reason I grouped them in the first place). You never use the absolute value, just differences. The differences effectively count only when the group runs. Except that the uncore PMU is counting during the periods the CPU PMU is disabled (e.g. when it is being programmed, read, or written). There's a race there that you cannot solve given the two are indepedent agents. However, it is the case that you cannot offer group semantics. I don't think that's true. I believe that it is. I also believe it doesn't matter since you don't care about those semantics anyway. Given that you can just open independent events, which is already possible. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
If you don't care about sampling and only care about totals, then you can just open the events concurrently *without* grouping them, as I stated previously. perf record doesn't really support that. We need some group reader that runs regularly. The best choice for the leader is a CPU sample event, which also has the advantage that it won't disturb idle states. From my PoV that violates group semantics, because now the events aren't always counting at the same time (which would be the reason I grouped them in the first place). You never use the absolute value, just differences. The differences effectively count only when the group runs. Except that the uncore PMU is counting during the periods the CPU PMU is disabled (e.g. when it is being programmed, read, or written). There's a race there that you cannot solve given the two are indepedent agents. We get useful information, like memory bandwidth during the group run time. You're right formulas between different PMUs are problematic though. But in most cases time relation is good enough. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
> ... which would give you arbitrary skew, because one counter is > free-running and the other is not (when scheduling a context in or out we stop > the PMU) Everyone just reads the counter and subtracts it from the last value they've seen. That's the same how any other shared free running counter work, such as the standard timer. The only thing that perf needs to enforce is that the counters are running with the same event. It also wouldn't work for sampling, but the uncore doesn't do sampling anyways. > From my PoV that violates group semantics, because now the events aren't > always counting at the same time (which would be the reason I grouped > them in the first place). You never use the absolute value, just differences. The differences effectively count only when the group runs. > However, it is the case that you cannot offer group semantics. I don't think that's true. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Thu, Apr 16, 2015 at 10:23:42PM +0100, Andi Kleen wrote: > > From my PoV that makes sense. One is CPU-affine, the other is not, and > > the two cannot be scheduled in the same PMU transaction by the nature of > > the hardware. Fundamentally, you cannot provide group semantics due to > > this. > > Actually you can. Just use it like a free running counter, and the > different groups sample it. This will work from the different CPUs, > as long as the event is the same everywhere. ... which would give you arbitrary skew, because one counter is free-running and the other is not (when scheduling a context in or out we stop the PMU) >From my PoV that violates group semantics, because now the events aren't always counting at the same time (which would be the reason I grouped them in the first place). > The implemention may not be quite right yet, but the basic concept > should work, and is useful. I can see that associating counts from different PMUs at points in time may be useful, even if they aren't sampled at the precise same time, and you have weaker guarantees than the current group semantics. However, it is the case that you cannot offer group semantics. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Thu, Apr 16, 2015 at 10:23:42PM +0100, Andi Kleen wrote: From my PoV that makes sense. One is CPU-affine, the other is not, and the two cannot be scheduled in the same PMU transaction by the nature of the hardware. Fundamentally, you cannot provide group semantics due to this. Actually you can. Just use it like a free running counter, and the different groups sample it. This will work from the different CPUs, as long as the event is the same everywhere. ... which would give you arbitrary skew, because one counter is free-running and the other is not (when scheduling a context in or out we stop the PMU) From my PoV that violates group semantics, because now the events aren't always counting at the same time (which would be the reason I grouped them in the first place). The implemention may not be quite right yet, but the basic concept should work, and is useful. I can see that associating counts from different PMUs at points in time may be useful, even if they aren't sampled at the precise same time, and you have weaker guarantees than the current group semantics. However, it is the case that you cannot offer group semantics. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
... which would give you arbitrary skew, because one counter is free-running and the other is not (when scheduling a context in or out we stop the PMU) Everyone just reads the counter and subtracts it from the last value they've seen. That's the same how any other shared free running counter work, such as the standard timer. The only thing that perf needs to enforce is that the counters are running with the same event. It also wouldn't work for sampling, but the uncore doesn't do sampling anyways. From my PoV that violates group semantics, because now the events aren't always counting at the same time (which would be the reason I grouped them in the first place). You never use the absolute value, just differences. The differences effectively count only when the group runs. However, it is the case that you cannot offer group semantics. I don't think that's true. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
> From my PoV that makes sense. One is CPU-affine, the other is not, and > the two cannot be scheduled in the same PMU transaction by the nature of > the hardware. Fundamentally, you cannot provide group semantics due to > this. Actually you can. Just use it like a free running counter, and the different groups sample it. This will work from the different CPUs, as long as the event is the same everywhere. The implemention may not be quite right yet, but the basic concept should work, and is useful. -Andi -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Thu, Apr 16, 2015 at 05:31:40PM +0100, Mark Rutland wrote: > Hi, > > If you're going to fundamentally change the behaviour of > perf_invalid_context, please Cc authors of other system PMU drivers. > Intel aren't the only ones with such PMUs. > > For instance, this affects the ARM CCI and CCN PMU drivers. As an aside, is there any chance of a p...@vger.kernel.org list? My work email becomes a little useless when subscribed to the full LKML firehose, so while I'd like to keep up with perf core changes I don't subscribe. In this case I stumbled across these patches by chance when browsing an archive, then had to have the messages bounced to my account. Having a dedicated list would make things easier for myself, at least. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
> Even if you ignore the fundamental semantics of groups, there are other > problems with allowing shared contexts: > > * The *_txn functions only get called on the group leader's PMU. If your > system PMU has these functions, they are not called. > > * Event rotation is per ctx, but now you could have some events in a CPU > PMU's context, and some in the uncore PMU's context. So those can race > with each other. > > * Throttling is also per-context. So those can race with each other too. There's also a break down of behaviour: events in the uncore context will get migrated to another CPU in the event of a hot unplug, while events that are grouped with CPU events (and hence live in the CPU context) will be destroyed. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
Hi, If you're going to fundamentally change the behaviour of perf_invalid_context, please Cc authors of other system PMU drivers. Intel aren't the only ones with such PMUs. For instance, this affects the ARM CCI and CCN PMU drivers. On Wed, Apr 15, 2015 at 08:56:11AM +0100, Kan Liang wrote: > From: Kan Liang > > The pmu marked as perf_invalid_context don't have any state to switch on > context switch. Everything is global. So it is OK to be part of sw/hw > groups. > In sched_out/sched_in, del/add must be called, so the > perf_invalid_context event can be disabled/enabled accordingly during > context switch. The event count only be read when the event is already > sched_in. > > However group read doesn't work with mix events. > > For example, > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > It always gets EINVAL. >From my PoV that makes sense. One is CPU-affine, the other is not, and the two cannot be scheduled in the same PMU transaction by the nature of the hardware. Fundamentally, you cannot provide group semantics due to this. Even if you ignore the fundamental semantics of groups, there are other problems with allowing shared contexts: * The *_txn functions only get called on the group leader's PMU. If your system PMU has these functions, they are not called. * Event rotation is per ctx, but now you could have some events in a CPU PMU's context, and some in the uncore PMU's context. So those can race with each other. * Throttling is also per-context. So those can race with each other too. > This patch set intends to fix this issue. > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] You can already count the events concurrently without grouping them, and the above implies that this patch just ends up misleading the user w.r.t. group semantics. If you want to be able to sample the events with a single read, then you can attach the FDs. I don't see that this solves a real problem. I see that it introduces a new set of problems in addition to complicating existing code. Mark. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
> > On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: > > The event count only be read when the event is already sched_in. > > Yeah, so no. This breaks what groups are. Group events _must_ be co- > scheduled. You cannot guarantee you can schedule events from another > PMU. Why? I think it's similar as sw and hw mixed event group. group_sched_in will co-schedule all events in sibling_list. Invalid context events is already added in sibling_list in perf_install_in_context. So all group events will be scheduled together. event->pmu guarantee proper add is called. For invalid context events, everything is global. It never fails to schedule. > > Also, I cannot see how this can possibly work, you cannot put these things > on the same event_context. Why not? For a sw and hw mixed event group, they use same hw event_context. Actually, the invalid context events don't have any state to switch on context switch. They can attach to any event_context. > > Also, how does this work wrt cpumasks, regular events are per cpu, uncore > events are per node. Currently, uncore events is only available on the first cpu of the node. So if it's a hw and uncore mixed group, only cpu 0 do the group. For other CPUs, there will be only hw event. Perf tool will guarantee it. Here are some examples. [perf]$ sudo perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -C 0-4 -vvv -- sleep 1 perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|READ|ID|CPU|PERIOD read_format ID|GROUP disabled 1 mmap 1 comm 1 freq 1 task 1 sample_id_all1 exclude_guest1 mmap21 comm_exec1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 perf_event_attr: type 15 size 112 config 304 sample_type IP|TID|TIME|READ|ID|CPU|PERIOD read_format ID|GROUP freq 1 sample_id_all1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd 3 flags 0x8 mmap size 528384B perf event ring buffer mmapped per cpu [perf]$ sudo perf stat -e '{cycles,uncore_imc_0/cas_count_read/}' -C 0-4 --per-core -- sleep 1 Performance counter stats for 'CPU(s) 0-4': S0-C0 11367712 cycles (100.00%) S0-C0 1 0.17 MiB uncore_imc_0/cas_count_read/ S0-C1 1 982923 cycles S0-C1 0 MiB uncore_imc_0/cas_count_read/ S0-C2 1 958585 cycles S0-C2 0 MiB uncore_imc_0/cas_count_read/ S0-C3 1 978161 cycles S0-C3 0 MiB uncore_imc_0/cas_count_read/ S0-C4 1 976012 cycles S0-C4 0 MiB uncore_imc_0/cas_count_read/ 1.001151765 seconds time elapsed > > There is so much broken stuff here without explanation its not funny. > > Please explain how this does not completely wreck everything? OK. I will refine the description when questions as above are addressed. Thanks, Kan -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: The event count only be read when the event is already sched_in. Yeah, so no. This breaks what groups are. Group events _must_ be co- scheduled. You cannot guarantee you can schedule events from another PMU. Why? I think it's similar as sw and hw mixed event group. group_sched_in will co-schedule all events in sibling_list. Invalid context events is already added in sibling_list in perf_install_in_context. So all group events will be scheduled together. event-pmu guarantee proper add is called. For invalid context events, everything is global. It never fails to schedule. Also, I cannot see how this can possibly work, you cannot put these things on the same event_context. Why not? For a sw and hw mixed event group, they use same hw event_context. Actually, the invalid context events don't have any state to switch on context switch. They can attach to any event_context. Also, how does this work wrt cpumasks, regular events are per cpu, uncore events are per node. Currently, uncore events is only available on the first cpu of the node. So if it's a hw and uncore mixed group, only cpu 0 do the group. For other CPUs, there will be only hw event. Perf tool will guarantee it. Here are some examples. [perf]$ sudo perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -C 0-4 -vvv -- sleep 1 perf_event_attr: size 112 { sample_period, sample_freq } 4000 sample_type IP|TID|TIME|READ|ID|CPU|PERIOD read_format ID|GROUP disabled 1 mmap 1 comm 1 freq 1 task 1 sample_id_all1 exclude_guest1 mmap21 comm_exec1 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8 sys_perf_event_open: pid -1 cpu 4 group_fd -1 flags 0x8 perf_event_attr: type 15 size 112 config 304 sample_type IP|TID|TIME|READ|ID|CPU|PERIOD read_format ID|GROUP freq 1 sample_id_all1 exclude_guest1 sys_perf_event_open: pid -1 cpu 0 group_fd 3 flags 0x8 mmap size 528384B perf event ring buffer mmapped per cpu [perf]$ sudo perf stat -e '{cycles,uncore_imc_0/cas_count_read/}' -C 0-4 --per-core -- sleep 1 Performance counter stats for 'CPU(s) 0-4': S0-C0 11367712 cycles (100.00%) S0-C0 1 0.17 MiB uncore_imc_0/cas_count_read/ S0-C1 1 982923 cycles S0-C1 0 not counted MiB uncore_imc_0/cas_count_read/ S0-C2 1 958585 cycles S0-C2 0 not counted MiB uncore_imc_0/cas_count_read/ S0-C3 1 978161 cycles S0-C3 0 not counted MiB uncore_imc_0/cas_count_read/ S0-C4 1 976012 cycles S0-C4 0 not counted MiB uncore_imc_0/cas_count_read/ 1.001151765 seconds time elapsed There is so much broken stuff here without explanation its not funny. Please explain how this does not completely wreck everything? OK. I will refine the description when questions as above are addressed. Thanks, Kan -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
Hi, If you're going to fundamentally change the behaviour of perf_invalid_context, please Cc authors of other system PMU drivers. Intel aren't the only ones with such PMUs. For instance, this affects the ARM CCI and CCN PMU drivers. On Wed, Apr 15, 2015 at 08:56:11AM +0100, Kan Liang wrote: From: Kan Liang kan.li...@intel.com The pmu marked as perf_invalid_context don't have any state to switch on context switch. Everything is global. So it is OK to be part of sw/hw groups. In sched_out/sched_in, del/add must be called, so the perf_invalid_context event can be disabled/enabled accordingly during context switch. The event count only be read when the event is already sched_in. However group read doesn't work with mix events. For example, perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 It always gets EINVAL. From my PoV that makes sense. One is CPU-affine, the other is not, and the two cannot be scheduled in the same PMU transaction by the nature of the hardware. Fundamentally, you cannot provide group semantics due to this. Even if you ignore the fundamental semantics of groups, there are other problems with allowing shared contexts: * The *_txn functions only get called on the group leader's PMU. If your system PMU has these functions, they are not called. * Event rotation is per ctx, but now you could have some events in a CPU PMU's context, and some in the uncore PMU's context. So those can race with each other. * Throttling is also per-context. So those can race with each other too. This patch set intends to fix this issue. perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] You can already count the events concurrently without grouping them, and the above implies that this patch just ends up misleading the user w.r.t. group semantics. If you want to be able to sample the events with a single read, then you can attach the FDs. I don't see that this solves a real problem. I see that it introduces a new set of problems in addition to complicating existing code. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Thu, Apr 16, 2015 at 05:31:40PM +0100, Mark Rutland wrote: Hi, If you're going to fundamentally change the behaviour of perf_invalid_context, please Cc authors of other system PMU drivers. Intel aren't the only ones with such PMUs. For instance, this affects the ARM CCI and CCN PMU drivers. As an aside, is there any chance of a p...@vger.kernel.org list? My work email becomes a little useless when subscribed to the full LKML firehose, so while I'd like to keep up with perf core changes I don't subscribe. In this case I stumbled across these patches by chance when browsing an archive, then had to have the messages bounced to my account. Having a dedicated list would make things easier for myself, at least. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
Even if you ignore the fundamental semantics of groups, there are other problems with allowing shared contexts: * The *_txn functions only get called on the group leader's PMU. If your system PMU has these functions, they are not called. * Event rotation is per ctx, but now you could have some events in a CPU PMU's context, and some in the uncore PMU's context. So those can race with each other. * Throttling is also per-context. So those can race with each other too. There's also a break down of behaviour: events in the uncore context will get migrated to another CPU in the event of a hot unplug, while events that are grouped with CPU events (and hence live in the CPU context) will be destroyed. Mark. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
From my PoV that makes sense. One is CPU-affine, the other is not, and the two cannot be scheduled in the same PMU transaction by the nature of the hardware. Fundamentally, you cannot provide group semantics due to this. Actually you can. Just use it like a free running counter, and the different groups sample it. This will work from the different CPUs, as long as the event is the same everywhere. The implemention may not be quite right yet, but the basic concept should work, and is useful. -Andi -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: > The event count only be read when the event is already > sched_in. Yeah, so no. This breaks what groups are. Group events _must_ be co-scheduled. You cannot guarantee you can schedule events from another PMU. Also, I cannot see how this can possibly work, you cannot put these things on the same event_context. Also, how does this work wrt cpumasks, regular events are per cpu, uncore events are per node. There is so much broken stuff here without explanation its not funny. Please explain how this does not completely wreck everything? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 06:21:11PM +0200, Andi Kleen wrote: > On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote: > > On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: > > > From: Kan Liang > > > > > > The pmu marked as perf_invalid_context don't have any state to switch on > > > context switch. Everything is global. So it is OK to be part of sw/hw > > > groups. > > > In sched_out/sched_in, del/add must be called, so the > > > perf_invalid_context event can be disabled/enabled accordingly during > > > context switch. The event count only be read when the event is already > > > sched_in. > > > > > > However group read doesn't work with mix events. > > > > > > For example, > > > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > > > It always gets EINVAL. > > > > > > This patch set intends to fix this issue. > > > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > > > [ perf record: Woken up 1 times to write data ] > > > [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] > > > > > > This patch special case invalid context events and allow them to be part > > > of sw/hw groups. > > > > I don't get it. What, Why? > > Without the patch you can't mix uncore and cpu core events in the same > group. > > Collecting uncore in PMIs is useful, for example to get memory > bandwidth over time. Well, start with a coherent changelog, why do you still think those are optional? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote: > On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: > > From: Kan Liang > > > > The pmu marked as perf_invalid_context don't have any state to switch on > > context switch. Everything is global. So it is OK to be part of sw/hw > > groups. > > In sched_out/sched_in, del/add must be called, so the > > perf_invalid_context event can be disabled/enabled accordingly during > > context switch. The event count only be read when the event is already > > sched_in. > > > > However group read doesn't work with mix events. > > > > For example, > > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > > It always gets EINVAL. > > > > This patch set intends to fix this issue. > > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > > [ perf record: Woken up 1 times to write data ] > > [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] > > > > This patch special case invalid context events and allow them to be part > > of sw/hw groups. > > I don't get it. What, Why? Without the patch you can't mix uncore and cpu core events in the same group. Collecting uncore in PMIs is useful, for example to get memory bandwidth over time. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: > From: Kan Liang > > The pmu marked as perf_invalid_context don't have any state to switch on > context switch. Everything is global. So it is OK to be part of sw/hw > groups. > In sched_out/sched_in, del/add must be called, so the > perf_invalid_context event can be disabled/enabled accordingly during > context switch. The event count only be read when the event is already > sched_in. > > However group read doesn't work with mix events. > > For example, > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > It always gets EINVAL. > > This patch set intends to fix this issue. > perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 > [ perf record: Woken up 1 times to write data ] > [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] > > This patch special case invalid context events and allow them to be part > of sw/hw groups. I don't get it. What, Why? -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote: On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: From: Kan Liang kan.li...@intel.com The pmu marked as perf_invalid_context don't have any state to switch on context switch. Everything is global. So it is OK to be part of sw/hw groups. In sched_out/sched_in, del/add must be called, so the perf_invalid_context event can be disabled/enabled accordingly during context switch. The event count only be read when the event is already sched_in. However group read doesn't work with mix events. For example, perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 It always gets EINVAL. This patch set intends to fix this issue. perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] This patch special case invalid context events and allow them to be part of sw/hw groups. I don't get it. What, Why? Without the patch you can't mix uncore and cpu core events in the same group. Collecting uncore in PMIs is useful, for example to get memory bandwidth over time. -Andi -- a...@linux.intel.com -- Speaking for myself only. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 06:21:11PM +0200, Andi Kleen wrote: On Wed, Apr 15, 2015 at 06:15:28PM +0200, Peter Zijlstra wrote: On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: From: Kan Liang kan.li...@intel.com The pmu marked as perf_invalid_context don't have any state to switch on context switch. Everything is global. So it is OK to be part of sw/hw groups. In sched_out/sched_in, del/add must be called, so the perf_invalid_context event can be disabled/enabled accordingly during context switch. The event count only be read when the event is already sched_in. However group read doesn't work with mix events. For example, perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 It always gets EINVAL. This patch set intends to fix this issue. perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] This patch special case invalid context events and allow them to be part of sw/hw groups. I don't get it. What, Why? Without the patch you can't mix uncore and cpu core events in the same group. Collecting uncore in PMIs is useful, for example to get memory bandwidth over time. Well, start with a coherent changelog, why do you still think those are optional? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: From: Kan Liang kan.li...@intel.com The pmu marked as perf_invalid_context don't have any state to switch on context switch. Everything is global. So it is OK to be part of sw/hw groups. In sched_out/sched_in, del/add must be called, so the perf_invalid_context event can be disabled/enabled accordingly during context switch. The event count only be read when the event is already sched_in. However group read doesn't work with mix events. For example, perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 It always gets EINVAL. This patch set intends to fix this issue. perf record -e '{cycles,uncore_imc_0/cas_count_read/}:S' -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.202 MB perf.data (12 samples) ] This patch special case invalid context events and allow them to be part of sw/hw groups. I don't get it. What, Why? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH V2 1/6] perf,core: allow invalid context events to be part of sw/hw groups
On Wed, Apr 15, 2015 at 03:56:11AM -0400, Kan Liang wrote: The event count only be read when the event is already sched_in. Yeah, so no. This breaks what groups are. Group events _must_ be co-scheduled. You cannot guarantee you can schedule events from another PMU. Also, I cannot see how this can possibly work, you cannot put these things on the same event_context. Also, how does this work wrt cpumasks, regular events are per cpu, uncore events are per node. There is so much broken stuff here without explanation its not funny. Please explain how this does not completely wreck everything? -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/