[tip: perf/core] perf: Cap allocation order at aux_watermark

2021-04-16 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: d68e6799a5c87f415d3bfa0dea49caee28ab00d1
Gitweb: https://git.kernel.org/tip/d68e6799a5c87f415d3bfa0dea49caee28ab00d1
Author: Alexander Shishkin
AuthorDate: Wed, 14 Apr 2021 18:49:54 +03:00
Committer: Peter Zijlstra
CommitterDate: Fri, 16 Apr 2021 16:32:39 +02:00

perf: Cap allocation order at aux_watermark

Currently, we start allocating AUX pages at half the size of the total
requested AUX buffer, ignoring the attr.aux_watermark setting. This, in
turn, makes the intel_pt driver disregard the watermark as well, since it
uses the page order for its SG (ToPA) configuration.

Now, this can be fixed in the intel_pt PMU driver, but seeing as it's the
only one currently making use of high order allocations, there is no
reason not to fix the allocator instead. This way, any other driver
wishing to add this support would not have to worry about this.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20210414154955.49603-2-alexander.shish...@linux.intel.com
---
 kernel/events/ring_buffer.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index bd55ccc..5286871 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -674,21 +674,26 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
if (!has_aux(event))
return -EOPNOTSUPP;
 
-   /*
-* We need to start with the max_order that fits in nr_pages,
-* not the other way around, hence ilog2() and not get_order.
-*/
-   max_order = ilog2(nr_pages);
-
-   /*
-* PMU requests more than one contiguous chunks of memory
-* for SW double buffering
-*/
if (!overwrite) {
-   if (!max_order)
-   return -EINVAL;
+   /*
+* Watermark defaults to half the buffer, and so does the
+* max_order, to aid PMU drivers in double buffering.
+*/
+   if (!watermark)
+   watermark = nr_pages << (PAGE_SHIFT - 1);
 
-   max_order--;
+   /*
+* Use aux_watermark as the basis for chunking to
+* help PMU drivers honor the watermark.
+*/
+   max_order = get_order(watermark);
+   } else {
+   /*
+* We need to start with the max_order that fits in nr_pages,
+* not the other way around, hence ilog2() and not get_order.
+*/
+   max_order = ilog2(nr_pages);
+   watermark = 0;
}
 
rb->aux_pages = kcalloc_node(nr_pages, sizeof(void *), GFP_KERNEL,
@@ -743,9 +748,6 @@ int rb_alloc_aux(struct perf_buffer *rb, struct perf_event *event,
rb->aux_overwrite = overwrite;
rb->aux_watermark = watermark;
 
-   if (!rb->aux_watermark && !rb->aux_overwrite)
-   rb->aux_watermark = nr_pages << (PAGE_SHIFT - 1);
-
 out:
if (!ret)
rb->aux_pgoff = pgoff;


[tip: perf/core] perf intel-pt: Use aux_watermark

2021-04-16 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 874fc35cdd55e2d46161901de43ec58ca2efc5fe
Gitweb: https://git.kernel.org/tip/874fc35cdd55e2d46161901de43ec58ca2efc5fe
Author: Alexander Shishkin
AuthorDate: Wed, 14 Apr 2021 18:49:55 +03:00
Committer: Peter Zijlstra
CommitterDate: Fri, 16 Apr 2021 16:32:39 +02:00

perf intel-pt: Use aux_watermark

Turns out, the default setting of attr.aux_watermark to half of the total
buffer size is not very useful, especially with smaller buffers. The
problem is that, after half of the buffer is filled up, the kernel updates
->aux_head and sets up the next "transaction", while observing that
->aux_tail is still zero (as userspace hasn't had the chance to update
it), meaning that the trace will have to stop at the end of this second
"transaction". This means, for example, that the second PERF_RECORD_AUX in
every trace comes with TRUNCATED flag set.

Setting attr.aux_watermark to a quarter of the buffer gives enough space for
the ->aux_tail update to be observed and prevents the data loss.

The obligatory before/after showcase:

> # perf_before record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 6 times to write data ]
> Warning:
> AUX data lost 4 times out of 10!
>
> [ perf record: Captured and wrote 0.099 MB perf.data ]
> # perf record -e intel_pt//u -m,8 uname
> Linux
> [ perf record: Woken up 4 times to write data ]
> [ perf record: Captured and wrote 0.039 MB perf.data ]

The effect is still visible with large workloads and large buffers,
although less pronounced.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Link: 
https://lkml.kernel.org/r/20210414154955.49603-3-alexander.shish...@linux.intel.com
---
 tools/perf/arch/x86/util/intel-pt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index a6420c6..6df0dc0 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -776,6 +776,12 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
}
}
 
+   if (!opts->auxtrace_snapshot_mode && !opts->auxtrace_sample_mode) {
+   u32 aux_watermark = opts->auxtrace_mmap_pages * page_size / 4;
+
+   intel_pt_evsel->core.attr.aux_watermark = aux_watermark;
+   }
+
	intel_pt_parse_terms(intel_pt_pmu->name, &intel_pt_pmu->format,
			     "tsc", &tsc_bit);
 


[tip: perf/urgent] perf/aux: Fix AUX output stopping

2019-10-22 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: f3a519e4add93b7b31a6616f0b09635ff2e6a159
Gitweb: https://git.kernel.org/tip/f3a519e4add93b7b31a6616f0b09635ff2e6a159
Author: Alexander Shishkin
AuthorDate: Tue, 22 Oct 2019 10:39:40 +03:00
Committer: Ingo Molnar
CommitterDate: Tue, 22 Oct 2019 14:39:37 +02:00

perf/aux: Fix AUX output stopping

Commit:

  8a58ddae2379 ("perf/core: Fix exclusive events' grouping")

allows CAP_EXCLUSIVE events to be grouped with other events. Since all
of those also happen to be AUX events (which is not the case the other
way around, because arch/s390), this changes the rules for stopping the
output: the AUX event may not be on its PMU's context any more, if it's
grouped with a HW event, in which case it will be on that HW event's
context instead. If that's the case, munmap() of the AUX buffer can't
find and stop the AUX event, potentially leaving the last reference with
the atomic context, which will then end up freeing the AUX buffer. This
will then trip warnings.

Fix this by using the context's PMU context when looking for events
to stop, instead of the event's PMU context.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: sta...@vger.kernel.org
Link: 
https://lkml.kernel.org/r/20191022073940.61814-1-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f5d7950..bb3748d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6949,7 +6949,7 @@ static void __perf_event_output_stop(struct perf_event *event, void *data)
 static int __perf_pmu_output_stop(void *info)
 {
struct perf_event *event = info;
-   struct pmu *pmu = event->pmu;
+   struct pmu *pmu = event->ctx->pmu;
struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
struct remote_output ro = {
.rb = event->rb,


[tip: perf/urgent] perf/core: Fix inheritance of aux_output groups

2019-10-07 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/urgent branch of tip:

Commit-ID: f733c6b508bcaa3441ba1eacf16efb9abd47489f
Gitweb: https://git.kernel.org/tip/f733c6b508bcaa3441ba1eacf16efb9abd47489f
Author: Alexander Shishkin
AuthorDate: Fri, 04 Oct 2019 15:57:29 +03:00
Committer: Ingo Molnar
CommitterDate: Mon, 07 Oct 2019 16:50:42 +02:00

perf/core: Fix inheritance of aux_output groups

Commit:

  ab43762ef010 ("perf: Allow normal events to output AUX data")

forgets to configure aux_output relation in the inherited groups, which
results in child PEBS events forever failing to schedule.

Fix this by setting up the AUX output link in the inheritance path.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: 
https://lkml.kernel.org/r/20191004125729.32397-1-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 kernel/events/core.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3f0cb82..f953dd1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11862,6 +11862,10 @@ static int inherit_group(struct perf_event *parent_event,
child, leader, child_ctx);
if (IS_ERR(child_ctr))
return PTR_ERR(child_ctr);
+
+   if (sub->aux_event == parent_event &&
+   !perf_get_aux_event(child_ctr, leader))
+   return -EINVAL;
}
return 0;
 }


[tip: perf/core] perf: Allow normal events to output AUX data

2019-08-28 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: ab43762ef010967e4ccd53627f70a2eecbeafefb
Gitweb: https://git.kernel.org/tip/ab43762ef010967e4ccd53627f70a2eecbeafefb
Author: Alexander Shishkin
AuthorDate: Tue, 06 Aug 2019 11:46:00 +03:00
Committer: Peter Zijlstra
CommitterDate: Wed, 28 Aug 2019 11:29:38 +02:00

perf: Allow normal events to output AUX data

In some cases, ordinary (non-AUX) events can generate data for AUX events.
For example, PEBS events can come out as records in the Intel PT stream
instead of their usual DS records, if configured to do so.

One requirement for such events is to consistently schedule together, to
ensure that the data from the "AUX output" events isn't lost while their
corresponding AUX event is not scheduled. We use grouping to provide this
guarantee: an "AUX output" event can be added to a group where an AUX event
is a group leader, and provided that the former supports writing to the
latter.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: kan.li...@linux.intel.com
Link: 
https://lkml.kernel.org/r/20190806084606.4021-2-alexander.shish...@linux.intel.com
---
 include/linux/perf_event.h  | 14 +-
 include/uapi/linux/perf_event.h |  3 +-
 kernel/events/core.c| 93 -
 3 files changed, 109 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e8ad3c5..61448c1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -246,6 +246,7 @@ struct perf_event;
 #define PERF_PMU_CAP_ITRACE			0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS	0x40
 #define PERF_PMU_CAP_NO_EXCLUDE		0x80
+#define PERF_PMU_CAP_AUX_OUTPUT		0x100
 
 /**
  * struct pmu - generic performance monitoring unit
@@ -447,6 +448,16 @@ struct pmu {
/* optional */
 
/*
+* Check if event can be used for aux_output purposes for
+* events of this PMU.
+*
+* Runs from perf_event_open(). Should return 0 for "no match"
+* or non-zero for "match".
+*/
+   int (*aux_output_match) (struct perf_event *event);
+   /* optional */
+
+   /*
 * Filter events for PMU-specific reasons.
 */
	int (*filter_match)		(struct perf_event *event); /* optional */
@@ -681,6 +692,9 @@ struct perf_event {
struct perf_addr_filter_range   *addr_filter_ranges;
unsigned long   addr_filters_gen;
 
+   /* for aux_output events */
+   struct perf_event   *aux_event;
+
void (*destroy)(struct perf_event *);
struct rcu_head rcu_head;
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd..bb7b271 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
				namespaces     :  1, /* include namespaces data */
				ksymbol        :  1, /* include ksymbol events */
				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_output     :  1, /* generate AUX records instead of events */
+				__reserved_1   : 32;
 
union {
__u32   wakeup_events;/* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 0463c11..2aad959 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1887,6 +1887,89 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
ctx->generation++;
 }
 
+static int
+perf_aux_output_match(struct perf_event *event, struct perf_event *aux_event)
+{
+   if (!has_aux(aux_event))
+   return 0;
+
+   if (!event->pmu->aux_output_match)
+   return 0;
+
+   return event->pmu->aux_output_match(aux_event);
+}
+
+static void put_event(struct perf_event *event);
+static void event_sched_out(struct perf_event *event,
+   struct perf_cpu_context *cpuctx,
+   struct perf_event_context *ctx);
+
+static void perf_put_aux_event(struct perf_event *event)
+{
+   struct perf_event_context *ctx = event->ctx;
+   struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+   struct perf_event *iter;
+
+   /*
+* If event uses aux_event tear down the link
+*/
+   if (event->aux_event) {
+   iter = event->aux_event;
+   event->aux_event = NULL;
+   put_event(iter);
+   return;
+   }
+
+   /*
+* If the event is 

[tip: perf/core] perf/x86/intel: Support PEBS output to PT

2019-08-28 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 42880f726c66f13ae1d9ac9ce4c43abe64ecac84
Gitweb: https://git.kernel.org/tip/42880f726c66f13ae1d9ac9ce4c43abe64ecac84
Author: Alexander Shishkin
AuthorDate: Tue, 06 Aug 2019 11:46:01 +03:00
Committer: Peter Zijlstra
CommitterDate: Wed, 28 Aug 2019 11:29:39 +02:00

perf/x86/intel: Support PEBS output to PT

If PEBS declares ability to output its data to Intel PT stream, use the
aux_output attribute bit to enable PEBS data output to PT. This requires
a PT event to be present and scheduled in the same context. Unlike the
DS area, the kernel does not extract PEBS records from the PT stream to
generate corresponding records in the perf stream, because that would
require real time in-kernel PT decoding, which is not feasible. The PMI,
however, can still be used.

The output setting is per-CPU, so all PEBS events must write either to PT
or to the DS area. Therefore, in case of conflict, the conflicting event
will fail to schedule, allowing the rotation logic to alternate between the
PEBS->PT and PEBS->DS events.

Signed-off-by: Alexander Shishkin 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Ingo Molnar 
Cc: Arnaldo Carvalho de Melo 
Cc: kan.li...@linux.intel.com
Link: 
https://lkml.kernel.org/r/20190806084606.4021-3-alexander.shish...@linux.intel.com
---
 arch/x86/events/core.c   | 34 +-
 arch/x86/events/intel/core.c | 18 +++-
 arch/x86/events/intel/ds.c   | 51 ++-
 arch/x86/events/intel/pt.c   |  5 +++-
 arch/x86/events/perf_event.h | 17 ++-
 arch/x86/include/asm/intel_pt.h  |  2 +-
 arch/x86/include/asm/msr-index.h |  4 ++-
 7 files changed, 130 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 325959d..15b90b1 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -1005,6 +1005,27 @@ static int collect_events(struct cpu_hw_events *cpuc, struct perf_event *leader,
 
/* current number of events already accepted */
n = cpuc->n_events;
+   if (!cpuc->n_events)
+   cpuc->pebs_output = 0;
+
+   if (!cpuc->is_fake && leader->attr.precise_ip) {
+   /*
+* For PEBS->PT, if !aux_event, the group leader (PT) went
+* away, the group was broken down and this singleton event
+* can't schedule any more.
+*/
+   if (is_pebs_pt(leader) && !leader->aux_event)
+   return -EINVAL;
+
+   /*
+* pebs_output: 0: no PEBS so far, 1: PT, 2: DS
+*/
+   if (cpuc->pebs_output &&
+   cpuc->pebs_output != is_pebs_pt(leader) + 1)
+   return -EINVAL;
+
+   cpuc->pebs_output = is_pebs_pt(leader) + 1;
+   }
 
if (is_x86_event(leader)) {
if (n >= max_count)
@@ -2241,6 +2262,17 @@ static int x86_pmu_check_period(struct perf_event *event, u64 value)
return 0;
 }
 
+static int x86_pmu_aux_output_match(struct perf_event *event)
+{
+   if (!(pmu.capabilities & PERF_PMU_CAP_AUX_OUTPUT))
+   return 0;
+
+   if (x86_pmu.aux_output_match)
+   return x86_pmu.aux_output_match(event);
+
+   return 0;
+}
+
 static struct pmu pmu = {
.pmu_enable = x86_pmu_enable,
.pmu_disable= x86_pmu_disable,
@@ -2266,6 +2298,8 @@ static struct pmu pmu = {
.sched_task = x86_pmu_sched_task,
.task_ctx_size  = sizeof(struct x86_perf_task_context),
.check_period   = x86_pmu_check_period,
+
+   .aux_output_match   = x86_pmu_aux_output_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 648260b..28459f4 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include <asm/intel_pt.h>
 #include 
 #include 
 
@@ -3298,6 +3299,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
}
}
 
+   if (event->attr.aux_output) {
+   if (!event->attr.precise_ip)
+   return -EINVAL;
+
+   event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
+   }
+
if (event->attr.type != PERF_TYPE_RAW)
return 0;
 
@@ -3811,6 +3819,14 @@ static int intel_pmu_check_period(struct perf_event *event, u64 value)
return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
 }
 
+static int intel_pmu_aux_output_match(struct perf_event *event)
+{
+   if (!x86_pmu.intel_cap.pebs_output_pt_available)
+   return 0;
+
+   return is_intel_pt_event(event);
+}
+
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3935,6 +3951,8 @@ 

[tip: perf/core] perf/x86/intel/pt: Clean up ToPA allocation path

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 90583af61d0c0d2826f42a297a03645b35c49085
Gitweb: https://git.kernel.org/tip/90583af61d0c0d2826f42a297a03645b35c49085
Author: Alexander Shishkin
AuthorDate: Wed, 21 Aug 2019 15:47:22 +03:00
Committer: Ingo Molnar
CommitterDate: Mon, 26 Aug 2019 12:00:12 +02:00

perf/x86/intel/pt: Clean up ToPA allocation path

Some of the allocation parameters are passed as function arguments,
while the CPU number for per-cpu allocation is passed via the buffer
object. There's no reason for this.

Pass the CPU as a function argument instead.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-2-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 15 +++
 arch/x86/events/intel/pt.h |  2 --
 2 files changed, 7 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index d3dc227..9d9258f 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -670,7 +670,7 @@ static bool topa_table_full(struct topa *topa)
  *
  * Return: 0 on success or error code.
  */
-static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp)
+static int topa_insert_pages(struct pt_buffer *buf, int cpu, gfp_t gfp)
 {
struct topa *topa = buf->last;
int order = 0;
@@ -681,7 +681,7 @@ static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp)
order = page_private(p);
 
if (topa_table_full(topa)) {
-   topa = topa_alloc(buf->cpu, gfp);
+   topa = topa_alloc(cpu, gfp);
if (!topa)
return -ENOMEM;
 
@@ -1061,20 +1061,20 @@ static void pt_buffer_fini_topa(struct pt_buffer *buf)
  * @size:  Total size of all regions within this ToPA.
  * @gfp:   Allocation flags.
  */
-static int pt_buffer_init_topa(struct pt_buffer *buf, unsigned long nr_pages,
-  gfp_t gfp)
+static int pt_buffer_init_topa(struct pt_buffer *buf, int cpu,
+  unsigned long nr_pages, gfp_t gfp)
 {
struct topa *topa;
int err;
 
-   topa = topa_alloc(buf->cpu, gfp);
+   topa = topa_alloc(cpu, gfp);
if (!topa)
return -ENOMEM;
 
topa_insert_table(buf, topa);
 
while (buf->nr_pages < nr_pages) {
-   err = topa_insert_pages(buf, gfp);
+   err = topa_insert_pages(buf, cpu, gfp);
if (err) {
pt_buffer_fini_topa(buf);
return -ENOMEM;
@@ -1124,13 +1124,12 @@ pt_buffer_setup_aux(struct perf_event *event, void **pages,
if (!buf)
return NULL;
 
-   buf->cpu = cpu;
buf->snapshot = snapshot;
buf->data_pages = pages;
 
	INIT_LIST_HEAD(&buf->tables);
 
-   ret = pt_buffer_init_topa(buf, nr_pages, GFP_KERNEL);
+   ret = pt_buffer_init_topa(buf, cpu, nr_pages, GFP_KERNEL);
if (ret) {
kfree(buf);
return NULL;
diff --git a/arch/x86/events/intel/pt.h b/arch/x86/events/intel/pt.h
index 63fe406..8de8ed0 100644
--- a/arch/x86/events/intel/pt.h
+++ b/arch/x86/events/intel/pt.h
@@ -53,7 +53,6 @@ struct pt_pmu {
 /**
  * struct pt_buffer - buffer configuration; one buffer per task_struct or
  * cpu, depending on perf event configuration
- * @cpu:   cpu for per-cpu allocation
  * @tables:list of ToPA tables in this buffer
  * @first: shorthand for first topa table
  * @last:  shorthand for last topa table
@@ -71,7 +70,6 @@ struct pt_pmu {
  * @topa_index:table of topa entries indexed by page offset
  */
 struct pt_buffer {
-   int cpu;
struct list_headtables;
struct topa *first, *last, *cur;
unsigned intcur_idx;


[tip: perf/core] perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 539f7c26b41d4ed7d88dd9756de3966ae7ca07b4
Gitweb: https://git.kernel.org/tip/539f7c26b41d4ed7d88dd9756de3966ae7ca07b4
Author: Alexander Shishkin
AuthorDate: Wed, 21 Aug 2019 15:47:24 +03:00
Committer: Ingo Molnar
CommitterDate: Mon, 26 Aug 2019 12:00:13 +02:00

perf/x86/intel/pt: Use pointer arithmetics instead in ToPA entry calculation

Currently, pt_buffer_reset_offsets() calculates the current ToPA entry by
casting pointers to addresses and performing ungainly subtractions and
divisions instead of simpler pointer arithmetic, which is perfectly
applicable in this case. Fix that.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-4-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index f269875..188d45f 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -1030,8 +1030,7 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
pg = pt_topa_next_entry(buf, pg);
 
	buf->cur = (struct topa *)((unsigned long)buf->topa_index[pg] & PAGE_MASK);
-	buf->cur_idx = ((unsigned long)buf->topa_index[pg] -
-			(unsigned long)buf->cur) / sizeof(struct topa_entry);
+   buf->cur_idx = buf->topa_index[pg] - TOPA_ENTRY(buf->cur, 0);
buf->output_off = head & (pt_buffer_region_size(buf) - 1);
 
	local64_set(&buf->head, head);


[tip: perf/core] perf/x86/intel/pt: Get rid of reverse lookup table for ToPA

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 39152ee51b77851689f9b23fde6f610d13566c39
Gitweb: https://git.kernel.org/tip/39152ee51b77851689f9b23fde6f610d13566c39
Author: Alexander Shishkin
AuthorDate: Wed, 21 Aug 2019 15:47:27 +03:00
Committer: Ingo Molnar
CommitterDate: Mon, 26 Aug 2019 12:00:16 +02:00

perf/x86/intel/pt: Get rid of reverse lookup table for ToPA

In order to quickly find a ToPA entry by its page offset in the buffer,
we're using a reverse lookup table. The problem with it is that it's a
large array of mostly similar pointers, especially so now that we're
using high order allocations from the page allocator. Because its size
is limited to whatever is the maximum for kmalloc(), it places a limit
on the number of ToPA entries per buffer, and therefore, on the total
buffer size, which otherwise doesn't have to be there.

Replace the reverse lookup table with a simple runtime lookup. With the
high order AUX allocations in place, the runtime penalty of such a lookup
is much smaller and in cases where all entries in a ToPA table are of
the same size, the complexity is O(1).

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-7-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 194 +++-
 arch/x86/events/intel/pt.h |  10 +-
 2 files changed, 131 insertions(+), 73 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 0f38ed3..fa43d90 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -551,12 +551,14 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
  * @offset:offset of the first entry in this table in the buffer
  * @size:  total size of all entries in this table
  * @last:  index of the last initialized entry in this table
+ * @z_count:   how many times the first entry repeats
  */
 struct topa {
struct list_headlist;
u64 offset;
size_t  size;
int last;
+   unsigned intz_count;
 };
 
 /*
@@ -598,6 +600,7 @@ static inline phys_addr_t topa_pfn(struct topa *topa)
	? &topa_to_page(t)->table[(t)->last]	\
	: &topa_to_page(t)->table[(i)])
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
+#define TOPA_ENTRY_PAGES(t, i) (1 << TOPA_ENTRY((t), (i))->size)
 
 /**
  * topa_alloc() - allocate page-sized ToPA table
@@ -713,6 +716,11 @@ static int topa_insert_pages(struct pt_buffer *buf, int cpu, gfp_t gfp)
topa_insert_table(buf, topa);
}
 
+   if (topa->z_count == topa->last - 1) {
+   if (order == TOPA_ENTRY(topa, topa->last - 1)->size)
+   topa->z_count++;
+   }
+
TOPA_ENTRY(topa, -1)->base = page_to_phys(p) >> TOPA_SHIFT;
TOPA_ENTRY(topa, -1)->size = order;
if (!buf->snapshot &&
@@ -756,6 +764,8 @@ static void pt_topa_dump(struct pt_buffer *buf)
 tp->table[i].stop) ||
tp->table[i].end)
break;
+   if (!i && topa->z_count)
+   i += topa->z_count;
}
}
 }
@@ -907,29 +917,97 @@ static void pt_read_offset(struct pt_buffer *buf)
	buf->cur_idx = (offset & 0xffffff80) >> 7;
 }
 
-/**
- * pt_topa_next_entry() - obtain index of the first page in the next ToPA entry
- * @buf:   PT buffer.
- * @pg:Page offset in the buffer.
- *
- * When advancing to the next output region (ToPA entry), given a page offset
- * into the buffer, we need to find the offset of the first page in the next
- * region.
- */
-static unsigned int pt_topa_next_entry(struct pt_buffer *buf, unsigned int pg)
+static struct topa_entry *
+pt_topa_entry_for_page(struct pt_buffer *buf, unsigned int pg)
+{
+   struct topa_page *tp;
+   struct topa *topa;
+   unsigned int idx, cur_pg = 0, z_pg = 0, start_idx = 0;
+
+   /*
+* Indicates a bug in the caller.
+*/
+   if (WARN_ON_ONCE(pg >= buf->nr_pages))
+   return NULL;
+
+   /*
+* First, find the ToPA table where @pg fits. With high
+* order allocations, there shouldn't be many of these.
+*/
+	list_for_each_entry(topa, &buf->tables, list) {
+   if (topa->offset + topa->size > pg << PAGE_SHIFT)
+   goto found;
+   }
+
+   /*
+* Hitting this means we have a problem in the ToPA
+* allocation code.
+*/
+   WARN_ON_ONCE(1);
+
+   return NULL;
+
+found:
+   /*
+* Indicates a problem in the ToPA allocation 

[tip: perf/core] perf/x86/intel/pt: Split ToPA metadata and page layout

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 38bb8d77d0b932a0773b5de2ef42479409314f96
Gitweb: https://git.kernel.org/tip/38bb8d77d0b932a0773b5de2ef42479409314f96
Author: Alexander Shishkin
AuthorDate: Wed, 21 Aug 2019 15:47:25 +03:00
Committer: Ingo Molnar
CommitterDate: Mon, 26 Aug 2019 12:00:14 +02:00

perf/x86/intel/pt: Split ToPA metadata and page layout

PT uses page-sized ToPA tables, where the ToPA table resides at the bottom
of the page and its driver-specific metadata takes up a few words at the
top. The split is currently calculated manually and needs to be redone
every time a field is added to or removed from the metadata structure.
Also, the 32-bit version can be made smaller.

By splitting the table and metadata into separate structures, we let the
compiler figure out the division of the page.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-5-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 93 +++--
 1 file changed, 60 insertions(+), 33 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 188d45f..2e3f068 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -545,16 +545,8 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
 }
 
-/*
- * Keep ToPA table-related metadata on the same page as the actual table,
- * taking up a few words from the top
- */
-
-#define TENTS_PER_PAGE (((PAGE_SIZE - 40) / sizeof(struct topa_entry)) - 1)
-
 /**
- * struct topa - page-sized ToPA table with metadata at the top
- * @table: actual ToPA table entries, as understood by PT hardware
+ * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
  * @phys:  physical address of this page
  * @offset:offset of the first entry in this table in the buffer
@@ -562,7 +554,6 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
  * @last:  index of the last initialized entry in this table
  */
 struct topa {
-   struct topa_entry   table[TENTS_PER_PAGE];
struct list_headlist;
u64 phys;
u64 offset;
@@ -570,8 +561,39 @@ struct topa {
int last;
 };
 
+/*
+ * Keep ToPA table-related metadata on the same page as the actual table,
+ * taking up a few words from the top
+ */
+
+#define TENTS_PER_PAGE \
+   ((PAGE_SIZE - sizeof(struct topa)) / sizeof(struct topa_entry))
+
+/**
+ * struct topa_page - page-sized ToPA table with metadata at the top
+ * @table: actual ToPA table entries, as understood by PT hardware
+ * @topa:  metadata
+ */
+struct topa_page {
+   struct topa_entry   table[TENTS_PER_PAGE];
+   struct topa topa;
+};
+
+static inline struct topa_page *topa_to_page(struct topa *topa)
+{
+   return container_of(topa, struct topa_page, topa);
+}
+
+static inline struct topa_page *topa_entry_to_page(struct topa_entry *te)
+{
+   return (struct topa_page *)((unsigned long)te & PAGE_MASK);
+}
+
 /* make -1 stand for the last table entry */
-#define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+#define TOPA_ENTRY(t, i)   \
+   ((i) == -1  \
+	? &topa_to_page(t)->table[(t)->last]	\
+	: &topa_to_page(t)->table[(i)])
 #define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 
 /**
@@ -584,27 +606,27 @@ struct topa {
 static struct topa *topa_alloc(int cpu, gfp_t gfp)
 {
int node = cpu_to_node(cpu);
-   struct topa *topa;
+   struct topa_page *tp;
struct page *p;
 
p = alloc_pages_node(node, gfp | __GFP_ZERO, 0);
if (!p)
return NULL;
 
-   topa = page_address(p);
-   topa->last = 0;
-   topa->phys = page_to_phys(p);
+   tp = page_address(p);
+   tp->topa.last = 0;
+   tp->topa.phys = page_to_phys(p);
 
/*
 * In case of singe-entry ToPA, always put the self-referencing END
 * link as the 2nd entry in the table
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(topa, 1)->base = topa->phys >> TOPA_SHIFT;
-   TOPA_ENTRY(topa, 1)->end = 1;
+   TOPA_ENTRY(&tp->topa, 1)->base = tp->topa.phys >> TOPA_SHIFT;
+   TOPA_ENTRY(&tp->topa, 1)->end = 1;
}
 
-   return topa;
+   return &tp->topa;
 }
 
 /**
@@ -714,22 +736,23 @@ static void pt_topa_dump(struct pt_buffer *buf)
struct topa *topa;
 
	list_for_each_entry(topa, &buf->tables, list) {

[tip: perf/core] perf/x86/intel/pt: Use helpers to obtain ToPA entry size

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: fffec50f541ace292383c0cbe9a2a97d16d201c6
Gitweb:
https://git.kernel.org/tip/fffec50f541ace292383c0cbe9a2a97d16d201c6
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:23 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:13 +02:00

perf/x86/intel/pt: Use helpers to obtain ToPA entry size

There are a few places in the PT driver that need to obtain the size of
a ToPA entry, some of them for the current ToPA entry in the buffer.
Use helpers for those, to make the lines shorter and more readable.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-3-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 9d9258f..f269875 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -572,6 +572,7 @@ struct topa {
 
 /* make -1 stand for the last table entry */
 #define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+#define TOPA_ENTRY_SIZE(t, i) (sizes(TOPA_ENTRY((t), (i))->size))
 
 /**
  * topa_alloc() - allocate page-sized ToPA table
@@ -771,7 +772,7 @@ static void pt_update_head(struct pt *pt)
 
/* offset of the current output region within this table */
for (topa_idx = 0; topa_idx < buf->cur_idx; topa_idx++)
-   base += sizes(buf->cur->table[topa_idx].size);
+   base += TOPA_ENTRY_SIZE(buf->cur, topa_idx);
 
if (buf->snapshot) {
		local_set(&buf->data_size, base);
@@ -800,7 +801,7 @@ static void *pt_buffer_region(struct pt_buffer *buf)
  */
 static size_t pt_buffer_region_size(struct pt_buffer *buf)
 {
-   return sizes(buf->cur->table[buf->cur_idx].size);
+   return TOPA_ENTRY_SIZE(buf->cur, buf->cur_idx);
 }
 
 /**
@@ -830,7 +831,7 @@ static void pt_handle_status(struct pt *pt)
 * know.
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries) ||
-   buf->output_off == sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+   buf->output_off == pt_buffer_region_size(buf)) {
		perf_aux_output_flag(&pt->handle,
 PERF_AUX_FLAG_TRUNCATED);
advance++;
@@ -925,8 +926,7 @@ static int pt_buffer_reset_markers(struct pt_buffer *buf,
unsigned long idx, npages, wakeup;
 
/* can't stop in the middle of an output region */
-   if (buf->output_off + handle->size + 1 <
-   sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+   if (buf->output_off + handle->size + 1 < pt_buffer_region_size(buf)) {
perf_aux_output_flag(handle, PERF_AUX_FLAG_TRUNCATED);
return -EINVAL;
}
@@ -1032,7 +1032,7 @@ static void pt_buffer_reset_offsets(struct pt_buffer *buf, unsigned long head)
	buf->cur = (struct topa *)((unsigned long)buf->topa_index[pg] & PAGE_MASK);
buf->cur_idx = ((unsigned long)buf->topa_index[pg] -
(unsigned long)buf->cur) / sizeof(struct topa_entry);
-   buf->output_off = head & (sizes(buf->cur->table[buf->cur_idx].size) - 1);
+   buf->output_off = head & (pt_buffer_region_size(buf) - 1);
 
	local64_set(&buf->head, head);
	local_set(&buf->data_size, 0);


[tip: perf/core] perf/x86/intel/pt: Free up space in a ToPA descriptor

2019-08-26 Thread tip-bot2 for Alexander Shishkin
The following commit has been merged into the perf/core branch of tip:

Commit-ID: 91feca5e2ecc9752894d57c9a72c2645471929c3
Gitweb:
https://git.kernel.org/tip/91feca5e2ecc9752894d57c9a72c2645471929c3
Author:Alexander Shishkin 
AuthorDate:Wed, 21 Aug 2019 15:47:26 +03:00
Committer: Ingo Molnar 
CommitterDate: Mon, 26 Aug 2019 12:00:15 +02:00

perf/x86/intel/pt: Free up space in a ToPA descriptor

Currently, we're storing the physical address of a ToPA table in its
descriptor, which is completely unnecessary. Since the descriptor
and the table itself share the same page, reducing the descriptor
size leaves more space for the table.

Signed-off-by: Alexander Shishkin 
Cc: Arnaldo Carvalho de Melo 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Link: 
http://lkml.kernel.org/r/20190821124727.73310-6-alexander.shish...@linux.intel.com
Signed-off-by: Ingo Molnar 
---
 arch/x86/events/intel/pt.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
index 2e3f068..0f38ed3 100644
--- a/arch/x86/events/intel/pt.c
+++ b/arch/x86/events/intel/pt.c
@@ -548,14 +548,12 @@ static void pt_config_buffer(void *buf, unsigned int topa_idx,
 /**
  * struct topa - ToPA metadata
  * @list:  linkage to struct pt_buffer's list of tables
- * @phys:  physical address of this page
  * @offset:offset of the first entry in this table in the buffer
  * @size:  total size of all entries in this table
  * @last:  index of the last initialized entry in this table
  */
 struct topa {
struct list_headlist;
-   u64 phys;
u64 offset;
size_t  size;
int last;
@@ -589,6 +587,11 @@ static inline struct topa_page *topa_entry_to_page(struct topa_entry *te)
return (struct topa_page *)((unsigned long)te & PAGE_MASK);
 }
 
+static inline phys_addr_t topa_pfn(struct topa *topa)
+{
+   return PFN_DOWN(virt_to_phys(topa_to_page(topa)));
+}
+
 /* make -1 stand for the last table entry */
 #define TOPA_ENTRY(t, i)   \
((i) == -1  \
@@ -615,14 +618,13 @@ static struct topa *topa_alloc(int cpu, gfp_t gfp)
 
tp = page_address(p);
tp->topa.last = 0;
-   tp->topa.phys = page_to_phys(p);
 
/*
 * In case of single-entry ToPA, always put the self-referencing END
 * link as the 2nd entry in the table
 */
if (!intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(&tp->topa, 1)->base = tp->topa.phys >> TOPA_SHIFT;
+   TOPA_ENTRY(&tp->topa, 1)->base = page_to_phys(p) >> TOPA_SHIFT;
	TOPA_ENTRY(&tp->topa, 1)->end = 1;
}
 
@@ -666,7 +668,7 @@ static void topa_insert_table(struct pt_buffer *buf, struct topa *topa)
 
BUG_ON(last->last != TENTS_PER_PAGE - 1);
 
-   TOPA_ENTRY(last, -1)->base = topa->phys >> TOPA_SHIFT;
+   TOPA_ENTRY(last, -1)->base = topa_pfn(topa);
TOPA_ENTRY(last, -1)->end = 1;
 }
 
@@ -739,8 +741,8 @@ static void pt_topa_dump(struct pt_buffer *buf)
struct topa_page *tp = topa_to_page(topa);
int i;
 
-   pr_debug("# table @%p (%016Lx), off %llx size %zx\n", tp->table,
-topa->phys, topa->offset, topa->size);
+   pr_debug("# table @%p, off %llx size %zx\n", tp->table,
+topa->offset, topa->size);
for (i = 0; i < TENTS_PER_PAGE; i++) {
pr_debug("# entry @%p (%lx sz %u %c%c%c) raw=%16llx\n",
			 &tp->table[i],
@@ -1111,7 +1113,7 @@ static int pt_buffer_init_topa(struct pt_buffer *buf, int cpu,
 
/* link last table to the first one, unless we're double buffering */
if (intel_pt_validate_hw_cap(PT_CAP_topa_multiple_entries)) {
-   TOPA_ENTRY(buf->last, -1)->base = buf->first->phys >> TOPA_SHIFT;
+   TOPA_ENTRY(buf->last, -1)->base = topa_pfn(buf->first);
TOPA_ENTRY(buf->last, -1)->end = 1;
}