Re: [PATCH 07/25] perf tools: Add build_id__is_defined function

2020-11-26 Thread Arnaldo Carvalho de Melo
On Thu, Nov 26, 2020 at 06:00:08PM +0100, Jiri Olsa wrote:
> Add the build_id__is_defined() helper to check that a build id
> is defined and is not the all-zero build id.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/build-id.c | 6 ++
>  tools/perf/util/build-id.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index 6b410c3d52dc..2aacc8b29f7e 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -37,6 +37,7 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  
>  static bool no_buildid_cache;
> @@ -912,3 +913,8 @@ void build_id__init(struct build_id *bid, const u8 *data, 
> size_t size)
>   memcpy(bid->data, data, size);
>   bid->size = size;
>  }
> +
> +bool build_id__is_defined(const struct build_id *bid)
> +{
> + return bid && bid->size ? !!memchr_inv(bid->data, 0, bid->size) : false;
> +}
> diff --git a/tools/perf/util/build-id.h b/tools/perf/util/build-id.h
> index f293f99d5dba..d53415feaf69 100644
> --- a/tools/perf/util/build-id.h
> +++ b/tools/perf/util/build-id.h
> @@ -21,6 +21,7 @@ struct feat_fd;
>  
>  void build_id__init(struct build_id *bid, const u8 *data, size_t size);
>  int build_id__sprintf(const struct build_id *build_id, char *bf);
> +bool build_id__is_defined(const struct build_id *bid);
>  int sysfs__sprintf_build_id(const char *root_dir, char *sbuild_id);
>  int filename__sprintf_build_id(const char *pathname, char *sbuild_id);
>  char *build_id_cache__kallsyms_path(const char *sbuild_id, char *bf,
> -- 
> 2.26.2
> 

-- 

- Arnaldo
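
For readers following along outside the perf tree, here is a minimal user-space sketch of what the helper checks. Everything prefixed demo_ is a stand-in invented for illustration (including the 20-byte capacity and the byte-wise memchr_inv() substitute); it is not the perf code itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for perf's struct build_id; the 20-byte capacity is an assumption. */
#define DEMO_BUILD_ID_SIZE 20

struct demo_build_id {
	uint8_t data[DEMO_BUILD_ID_SIZE];
	size_t  size;
};

/* Byte-wise stand-in for memchr_inv(): first byte that differs from c, or NULL. */
static const void *demo_memchr_inv(const void *start, int c, size_t bytes)
{
	const uint8_t *p = start;

	for (; bytes; p++, bytes--)
		if (*p != (uint8_t)c)
			return p;
	return NULL;
}

/* Defined means: non-NULL, non-empty, and at least one non-zero byte. */
static bool demo_build_id_is_defined(const struct demo_build_id *bid)
{
	return bid && bid->size ?
	       demo_memchr_inv(bid->data, 0, bid->size) != NULL : false;
}

/* Usage: the three "undefined" cases, then a defined one. */
void demo_build_id_usage(void)
{
	struct demo_build_id bid;

	memset(&bid, 0, sizeof(bid));
	assert(!demo_build_id_is_defined(NULL));   /* no object */
	assert(!demo_build_id_is_defined(&bid));   /* size == 0 */

	bid.size = DEMO_BUILD_ID_SIZE;
	assert(!demo_build_id_is_defined(&bid));   /* all-zero id */

	bid.data[19] = 0x2b;
	assert(demo_build_id_is_defined(&bid));    /* one non-zero byte */
}
```

Only the first bid->size bytes are examined, so a short build id is judged on its own length rather than on the full array.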


Re: [PATCH 06/25] perf tools: Do not swap mmap2 fields in case it contains build id

2020-11-26 Thread Arnaldo Carvalho de Melo
On Thu, Nov 26, 2020 at 06:00:07PM +0100, Jiri Olsa wrote:
> If the PERF_RECORD_MISC_MMAP_BUILD_ID misc bit is set, the
> mmap2 event carries a build id, placed in the following union:
> 
>   union {
>   struct {
>   u32   maj;
>   u32   min;
>   u64   ino;
>   u64   ino_generation;
>   };
>   struct {
>   u8    build_id[20];
>   u8    build_id_size;
>   u8    __reserved_1;
>   u16   __reserved_2;
>   };
>   };

Did you forget to update just this cset comment?

- Arnaldo
 
> In this case we can't swap above fields.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/session.c | 11 +++
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 5cc722b6fe7c..cc1c11ca94fd 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -592,10 +592,13 @@ static void perf_event__mmap2_swap(union perf_event 
> *event,
>   event->mmap2.start = bswap_64(event->mmap2.start);
>   event->mmap2.len   = bswap_64(event->mmap2.len);
>   event->mmap2.pgoff = bswap_64(event->mmap2.pgoff);
> - event->mmap2.maj   = bswap_32(event->mmap2.maj);
> - event->mmap2.min   = bswap_32(event->mmap2.min);
> - event->mmap2.ino   = bswap_64(event->mmap2.ino);
> - event->mmap2.ino_generation = bswap_64(event->mmap2.ino_generation);
> +
> + if (!(event->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID)) {
> + event->mmap2.maj   = bswap_32(event->mmap2.maj);
> + event->mmap2.min   = bswap_32(event->mmap2.min);
> + event->mmap2.ino   = bswap_64(event->mmap2.ino);
> + event->mmap2.ino_generation = 
> bswap_64(event->mmap2.ino_generation);
> + }
>  
>   if (sample_id_all) {
>   void *data = &event->mmap2.filename;
> -- 
> 2.26.2
> 

-- 

- Arnaldo
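
A small stand-alone sketch of the idea in this patch, for illustration only: the integer view of the union is byte-swapped, while the build id view is a byte string and is left alone. The struct, the flag value and every demo_ name are simplified stand-ins and not the perf definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in flag value; the real bit lives in the perf_event uapi header. */
#define DEMO_MISC_MMAP_BUILD_ID (1u << 14)

/* Simplified stand-in for the part of the mmap2 event quoted above. */
struct demo_mmap2 {
	uint16_t misc;
	union {
		struct {
			uint32_t maj;
			uint32_t min;
			uint64_t ino;
			uint64_t ino_generation;
		};
		struct {
			uint8_t  build_id[20];
			uint8_t  build_id_size;
			uint8_t  reserved_1;
			uint16_t reserved_2;
		};
	};
};

static uint32_t demo_bswap32(uint32_t v)
{
	return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
	       ((v << 8) & 0x00ff0000u) | (v << 24);
}

static uint64_t demo_bswap64(uint64_t v)
{
	return ((uint64_t)demo_bswap32((uint32_t)v) << 32) |
	       demo_bswap32((uint32_t)(v >> 32));
}

/* Swap the integer view only; a build id has no endianness to fix. */
static void demo_mmap2_swap(struct demo_mmap2 *ev)
{
	if (ev->misc & DEMO_MISC_MMAP_BUILD_ID)
		return;

	ev->maj = demo_bswap32(ev->maj);
	ev->min = demo_bswap32(ev->min);
	ev->ino = demo_bswap64(ev->ino);
	ev->ino_generation = demo_bswap64(ev->ino_generation);
}

/* Usage: one event per view of the union. */
void demo_mmap2_usage(void)
{
	struct demo_mmap2 ev;

	memset(&ev, 0, sizeof(ev));
	ev.misc = DEMO_MISC_MMAP_BUILD_ID;
	memset(ev.build_id, 0xab, sizeof(ev.build_id));
	ev.build_id[0] = 0x01;
	demo_mmap2_swap(&ev);
	assert(ev.build_id[0] == 0x01 && ev.build_id[19] == 0xab); /* untouched */

	memset(&ev, 0, sizeof(ev));
	ev.maj = 0x11223344u;
	demo_mmap2_swap(&ev);
	assert(ev.maj == 0x44332211u);   /* integer view is swapped */
}
```

Swapping unconditionally would scramble the first 16 bytes of the build id, which is the failure this patch avoids.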


Re: [PATCH 05/25] tools lib: Adopt memchr_inv() from kernel

2020-11-26 Thread Arnaldo Carvalho de Melo
On Thu, Nov 26, 2020 at 06:00:06PM +0100, Jiri Olsa wrote:
> We'll use it to check for undefined/zero data.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/include/linux/string.h |  1 +
>  tools/lib/string.c   | 58 
>  2 files changed, 59 insertions(+)
> 
> diff --git a/tools/include/linux/string.h b/tools/include/linux/string.h
> index 5e9e781905ed..db5c99318c79 100644
> --- a/tools/include/linux/string.h
> +++ b/tools/include/linux/string.h
> @@ -46,4 +46,5 @@ extern char * __must_check skip_spaces(const char *);
>  
>  extern char *strim(char *);
>  
> +extern void *memchr_inv(const void *start, int c, size_t bytes);
>  #endif /* _TOOLS_LINUX_STRING_H_ */
> diff --git a/tools/lib/string.c b/tools/lib/string.c
> index f645343815de..8b6892f959ab 100644
> --- a/tools/lib/string.c
> +++ b/tools/lib/string.c
> @@ -168,3 +168,61 @@ char *strreplace(char *s, char old, char new)
>   *s = new;
>   return s;
>  }
> +
> +static void *check_bytes8(const u8 *start, u8 value, unsigned int bytes)
> +{
> + while (bytes) {
> + if (*start != value)
> + return (void *)start;
> + start++;
> + bytes--;
> + }
> + return NULL;
> +}
> +
> +/**
> + * memchr_inv - Find an unmatching character in an area of memory.
> + * @start: The memory area
> + * @c: Find a character other than c
> + * @bytes: The size of the area.
> + *
> + * returns the address of the first character other than @c, or %NULL
> + * if the whole buffer contains just @c.
> + */
> +void *memchr_inv(const void *start, int c, size_t bytes)
> +{
> + u8 value = c;
> + u64 value64;
> + unsigned int words, prefix;
> +
> + if (bytes <= 16)
> + return check_bytes8(start, value, bytes);
> +
> + value64 = value;
> + value64 |= value64 << 8;
> + value64 |= value64 << 16;
> + value64 |= value64 << 32;
> +
> + prefix = (unsigned long)start % 8;
> + if (prefix) {
> + u8 *r;
> +
> + prefix = 8 - prefix;
> + r = check_bytes8(start, value, prefix);
> + if (r)
> + return r;
> + start += prefix;
> + bytes -= prefix;
> + }
> +
> + words = bytes / 8;
> +
> + while (words) {
> + if (*(u64 *)start != value64)
> + return check_bytes8(start, value, 8);
> + start += 8;
> + words--;
> + }
> +
> + return check_bytes8(start, value, bytes % 8);
> +}
> -- 
> 2.26.2
> 

-- 

- Arnaldo
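
For anyone who wants to experiment with the algorithm outside the tree, below is a portable user-space sketch of the same word-at-a-time scan. It is an illustration under assumptions, not the kernel code: names carry a demo_ prefix, pointer arithmetic is done on a byte pointer, and the 64-bit loads go through memcpy() to stay clear of aliasing concerns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-wise scan: first byte that is not 'value', or NULL. */
static const uint8_t *demo_check_bytes8(const uint8_t *p, uint8_t value,
					size_t bytes)
{
	for (; bytes; p++, bytes--)
		if (*p != value)
			return p;
	return NULL;
}

static const void *demo_memchr_inv(const void *start, int c, size_t bytes)
{
	const uint8_t *p = start;
	uint8_t value = (uint8_t)c;
	uint64_t value64, word;
	size_t prefix;

	if (bytes <= 16)
		return demo_check_bytes8(p, value, bytes);

	/* Replicate the byte into all eight lanes of a 64-bit pattern. */
	value64 = value;
	value64 |= value64 << 8;
	value64 |= value64 << 16;
	value64 |= value64 << 32;

	/* Scan byte-wise up to the next 8-byte boundary. */
	prefix = (uintptr_t)p % 8;
	if (prefix) {
		const uint8_t *r;

		prefix = 8 - prefix;
		r = demo_check_bytes8(p, value, prefix);
		if (r)
			return r;
		p += prefix;
		bytes -= prefix;
	}

	/* Compare eight bytes at a time; narrow down on a mismatch. */
	for (; bytes >= 8; p += 8, bytes -= 8) {
		memcpy(&word, p, sizeof(word));
		if (word != value64)
			return demo_check_bytes8(p, value, 8);
	}

	/* Fewer than eight bytes are left. */
	return demo_check_bytes8(p, value, bytes);
}

/* Usage: an all-zero buffer, then one with a single stray byte. */
void demo_memchr_inv_usage(void)
{
	uint8_t buf[64];

	memset(buf, 0, sizeof(buf));
	assert(demo_memchr_inv(buf, 0, sizeof(buf)) == NULL);

	buf[37] = 0xff;
	assert(demo_memchr_inv(buf, 0, sizeof(buf)) == &buf[37]);
	assert(demo_memchr_inv(buf + 1, 0, 20) == NULL);
}
```

The short-buffer cutoff of 16 bytes follows the quoted patch; buffers at or below it never reach the word loop.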


Re: [PATCH 25/25] perf record: Add --buildid-mmap option to enable mmap's build id

2020-11-26 Thread Arnaldo Carvalho de Melo
On Thu, Nov 26, 2020 at 06:00:26PM +0100, Jiri Olsa wrote:

> Adding --buildid-mmap option to enable build id in mmap2 events.  It
> will only work if there's kernel support for that and it disables
> build id cache (implies --no-buildid).
 
> It's also possible to enable it permanently via a config option in the
> ~/.perfconfig file:
 
>   [record]
>   build-id=mmap
 
> Also added build_id bit in the verbose output for perf_event_attr:
 
>   # perf record --buildid-mmap -vv
>   ...
>   perf_event_attr:
> type 1
> size 120
> ...
> build_id 1
 
> Adding also missing text_poke bit.
 
> Acked-by: Ian Rogers 
> Signed-off-by: Jiri Olsa 



> @@ -2554,6 +2557,8 @@ static struct option __record_options[] = {
>  "file", "vmlinux pathname"),
>   OPT_BOOLEAN(0, "buildid-all", &record.buildid_all,
>   "Record build-id of all DSOs regardless of hits"),
> + OPT_BOOLEAN(0, "buildid-mmap", &record.buildid_mmap,
> + "Record build-id in map events"),

Stephane, do you think it would be a problem to use
perf_can_record_build_id() at tool start and if it says that we can get
those mmaps with build ids use it by default? Older perf tools would get
that new bit and bogus values for that maj/min/ino_generation things,
people noticing such problems would either update their tools or ask for
the use of --no-buildid-mmap in 'perf record' sessions.

The problem I see is that this is important information to have by
default, forcing the user to add more and more command line opt-in
options doesn't seem interesting.

Having it as a .perfconfig variable helps, but then we at some point
need to start shipping some example .perfconfig to enable all these new
features so that the user can, with just one step, have all the modern
goodies.

- Arnaldo
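
To make the default-on idea concrete, here is a rough sketch of how the decision could be resolved. It is only an illustration of the proposal: the tri-state enum and demo_resolve_buildid_mmap() are hypothetical names, and kernel_support stands for whatever perf_can_record_build_id() would report.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical tri-state for a --buildid-mmap / --no-buildid-mmap pair. */
enum demo_opt { DEMO_OPT_UNSET, DEMO_OPT_ON, DEMO_OPT_OFF };

/*
 * Returns 0 and stores the decision in *enable, or -1 when the user
 * explicitly asked for the feature on a kernel that lacks it.
 */
static int demo_resolve_buildid_mmap(enum demo_opt opt, bool kernel_support,
				     bool *enable)
{
	switch (opt) {
	case DEMO_OPT_OFF:
		*enable = false;
		return 0;
	case DEMO_OPT_ON:
		if (!kernel_support)
			return -1;
		*enable = true;
		return 0;
	case DEMO_OPT_UNSET:
	default:
		*enable = kernel_support;   /* default follows the probe */
		return 0;
	}
}

/* Usage: the default case, the opt-out, and the hard failure. */
void demo_resolve_usage(void)
{
	bool enable = false;

	assert(demo_resolve_buildid_mmap(DEMO_OPT_UNSET, true, &enable) == 0);
	assert(enable);                    /* new kernel: on by default */

	assert(demo_resolve_buildid_mmap(DEMO_OPT_UNSET, false, &enable) == 0);
	assert(!enable);                   /* old kernel: silently off */

	assert(demo_resolve_buildid_mmap(DEMO_OPT_OFF, true, &enable) == 0);
	assert(!enable);                   /* explicit opt-out wins */

	assert(demo_resolve_buildid_mmap(DEMO_OPT_ON, false, &enable) == -1);
}
```

Only the explicit request on an unsupported kernel is an error; the default path degrades quietly, which matches the behaviour described above for older setups.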

>   OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
>   "append timestamp to output filename"),
>   OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary,
> @@ -2657,6 +2662,21 @@ int cmd_record(int argc, const char **argv)
>  
>   }
>  
> + if (rec->buildid_mmap) {
> + if (!perf_can_record_build_id()) {
> + pr_err("Failed: no support to record build id in mmap 
> events, update your kernel.\n");

> + err = -EINVAL;
> + goto out_opts;
> + }
> + pr_debug("Enabling build id in mmap2 events.\n");
> + /* Enable mmap build id synthesizing. */
> + symbol_conf.buildid_mmap2 = true;
> + /* Enable perf_event_attr::build_id bit. */
> + rec->opts.build_id = true;
> + /* Disable build id cache. */
> + rec->no_buildid = true;
> + }
> +
>   if (rec->opts.kcore)
>   rec->data.is_dir = true;
>  
> diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
> index 1cad6051d8b0..749d806ee1d1 100644
> --- a/tools/perf/util/evsel.c
> +++ b/tools/perf/util/evsel.c
> @@ -1170,10 +1170,12 @@ void evsel__config(struct evsel *evsel, struct 
> record_opts *opts,
>   if (opts->sample_weight)
>   evsel__set_sample_bit(evsel, WEIGHT);
>  
> - attr->task  = track;
> - attr->mmap  = track;
> - attr->mmap2 = track && !perf_missing_features.mmap2;
> - attr->comm  = track;
> + attr->task = track;
> + attr->mmap = track;
> + attr->mmap2= track && !perf_missing_features.mmap2;
> + attr->comm = track;
> + attr->build_id = track && opts->build_id;
> +
>   /*
>* ksymbol is tracked separately with text poke because it needs to be
>* system wide and enabled immediately.
> diff --git a/tools/perf/util/perf_api_probe.c 
> b/tools/perf/util/perf_api_probe.c
> index 3840d02f0f7b..829af17a0867 100644
> --- a/tools/perf/util/perf_api_probe.c
> +++ b/tools/perf/util/perf_api_probe.c
> @@ -98,6 +98,11 @@ static void perf_probe_text_poke(struct evsel *evsel)
>   evsel->core.attr.text_poke = 1;
>  }
>  
> +static void perf_probe_build_id(struct evsel *evsel)
> +{
> + evsel->core.attr.build_id = 1;
> +}
> +
>  bool perf_can_sample_identifier(void)
>  {
>   return perf_probe_api(perf_probe_sample_identifier);
> @@ -172,3 +177,8 @@ bool perf_can_aux_sample(void)
>  
>   return true;
>  }
> +
> +bool perf_can_record_build_id(void)
> +{
> + return perf_probe_api(perf_probe_build_id);
> +}
> diff --git a/tools/perf/util/perf_api_probe.h 
> b/tools/perf/util/perf_api_probe.h
> index d5506a983a94..f12ca55f509a 100644
> --- a/tools/perf/util/perf_api_probe.h
> +++ b/tools/perf/util/perf_api_probe.h
> @@ -11,5 +11,6 @@ bool perf_can_record_cpu_wide(void);
>  bool perf_can_record_switch_events(void);
>  bool perf_can_record_text_poke_events(void);
>  bool perf_can_sample_identifier(void);
> +bool perf_can_record_build_id(void);
>  
>  #endif // __PERF_API

Re: [BUG] perf probe can't remove probes

2020-11-26 Thread Arnaldo Carvalho de Melo
On Thu, Nov 26, 2020 at 09:21:25AM +0900, Masami Hiramatsu wrote:
> Hi Arnaldo,
> 
> On Wed, 25 Nov 2020 14:27:55 -0300
> Arnaldo Carvalho de Melo  wrote:
> 
> > 
> > Masami, have you stumbled on this already?
> > 
> > [root@seventh ~]# perf probe security_locked_down%return 'ret=$retval'
> > Added new event:
> >   probe:security_locked_down__return (on security_locked_down%return with 
> > ret=$retval)
> > 
> > You can now use it in all perf tools, such as:
> > 
> > perf record -e probe:security_locked_down__return -aR sleep 1
> > 
> > [root@seventh ~]# perf probe security_locked_down what
> > Added new event:
> >   probe:security_locked_down (on security_locked_down with what)
> > 
> > You can now use it in all perf tools, such as:
> > 
> > perf record -e probe:security_locked_down -aR sleep 1
> > 
> > [root@seventh ~]#
> > 
> > 
> > [root@seventh ~]# uname -r
> > 5.10.0-rc3.bpfsign+
> > [root@seventh ~]# perf probe -l
> >   probe:security_locked_down (on 
> > security_locked_down@git/bpf/security/security.c with what)
> >   probe:security_locked_down__return (on 
> > security_locked_down%return@git/bpf/security/security.c with ret)
> > [root@seventh ~]# perf probe -D '*:*'
> > Semantic error :There is non-digit char in line number.
> > 
> >  Usage: perf probe [] 'PROBEDEF' ['PROBEDEF' ...]
> > or: perf probe [] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
> > or: perf probe [] --del '[GROUP:]EVENT' ...
> > or: perf probe --list [GROUP:]EVENT ...
> > or: perf probe [] --line 'LINEDESC'
> > or: perf probe [] --vars 'PROBEPOINT'
> > or: perf probe [] --funcs
> > 
> > -D, --definition 
> > <[EVENT=]FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT [[NAME=]ARG ...]>
> >   Show trace event definition of given traceevent 
> > for k/uprobe_events.
> 
> As you can see, "-D" is showing definition. Not delete. (*)
> Delete is "-d" or "--del".

Yeah, I was in a hurry and looked at just the first line right after the
command, didn't want to forget reporting it so sent the "bug" report,
d0h, sorry about the noise, using -d or --del works.

But having both -d and -D, in retrospect, wasn't such a good idea :-\

- Arnaldo
 
> (*) this option is for different version of kernel, remote-machine
> and boot-time tracing.
> 
> > [root@seventh ~]# perf probe probe:security_locked_down
> > Semantic error :There is non-digit char in line number.
> >   Error: Command Parse Error.
> > [root@seventh ~]# perf probe probe:security_locked_down__return
> > Semantic error :There is non-digit char in line number.
> >   Error: Command Parse Error.
> 
> Since you don't pass any option, both are for adding new probe event.
> 
> What happens if you run
> 
> $ perf probe -d "*:*"
> 
> ?
> 
> Thank you,
> 
> -- 
> Masami Hiramatsu 

-- 

- Arnaldo


Re: [PATCH v9 01/16] perf arm-spe: Refactor printing string to buffer

2020-11-26 Thread Arnaldo Carvalho de Melo
On Thu, Nov 19, 2020 at 11:24:26PM +0800, Leo Yan wrote:
> When outputting strings to the decoding buffer with snprintf(), the
> SPE decoder needs to detect any error returned by snprintf() and, if
> there is one, bail out directly.  If snprintf() succeeds, the decoder
> needs to advance the buffer pointer and reduce the remaining buffer
> length so that the next string lands in the memory that follows.
> 
> This logic is spread throughout arm_spe_pkt_desc(), so there is a lot
> of duplicated code for detecting errors, incrementing the buffer
> pointer and decrementing the buffer size.
> 
> To avoid the duplication, this patch introduces a new helper,
> arm_spe_pkt_out_string(), which wraps up that logic and is used by the
> caller arm_spe_pkt_desc().  This patch also makes 'blen' a
> function-local variable, which allows removing the unnecessary braces
> and improves readability.
> 
> This patch simplifies the return value of arm_spe_pkt_desc(): '0' means
> success and any other value means an error has occurred.  To achieve
> this, it relies on arm_spe_pkt_out_string()'s parameter 'err', which is
> cumulative: it holds the first error seen, whether the helper is called
> once or many times.  As a result the error is handled in one central
> place, at the end of arm_spe_pkt_desc(), rather than by bailing out
> directly in the switch cases.
> 
> This patch changes the caller arm_spe_dump() to respect the updated
> return value semantics of arm_spe_pkt_desc().
> 
> Suggested-by: Dave Martin 
> Signed-off-by: Leo Yan 
> Reviewed-by: Andre Przywara 
> Reviewed-by: Dave Martin 
> ---
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c | 302 +-
>  tools/perf/util/arm-spe.c |   2 +-
>  2 files changed, 151 insertions(+), 153 deletions(-)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c 
> b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index 671a4763fb47..fbededc1bcd4 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "arm-spe-pkt-decoder.h"
>  
> @@ -258,192 +259,189 @@ int arm_spe_get_packet(const unsigned char *buf, 
> size_t len,
>   return ret;
>  }
>  
> +static int arm_spe_pkt_out_string(int *err, char **buf_p, size_t *blen,
> +   const char *fmt, ...)
> +{
> + va_list ap;
> + int ret;


Ok, from a quick look this continues to be confusing, but at least it's not
named scnprintf() and then people won't expect the usual semantics.

Applying.

- Arnaldo

> + /* Bail out if any error occurred */
> + if (err && *err)
> + return *err;
> +
> + va_start(ap, fmt);
> + ret = vsnprintf(*buf_p, *blen, fmt, ap);
> + va_end(ap);
> +
> + if (ret < 0) {
> + if (err && !*err)
> + *err = ret;
> +
> + /*
> +  * A return value of *blen or more means that the output was
> +  * truncated and the buffer is overrun.
> +  */
> + } else if ((size_t)ret >= *blen) {
> + (*buf_p)[*blen - 1] = '\0';
> +
> + /*
> +  * Set *err to 'ret' to avoid overflow if tries to
> +  * fill this buffer sequentially.
> +  */
> + if (err && !*err)
> + *err = ret;
> + } else {
> + *buf_p += ret;
> + *blen -= ret;
> + }
> +
> + return ret;
> +}
> +
>  int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>size_t buf_len)
>  {
> - int ret, ns, el, idx = packet->index;
> + int ns, el, idx = packet->index;
>   unsigned long long payload = packet->payload;
>   const char *name = arm_spe_pkt_name(packet->type);
> + char *buf_orig = buf;
> + size_t blen = buf_len;
> + int err = 0;
>  
>   switch (packet->type) {
>   case ARM_SPE_BAD:
>   case ARM_SPE_PAD:
>   case ARM_SPE_END:
> - return snprintf(buf, buf_len, "%s", name);
> - case ARM_SPE_EVENTS: {
> - size_t blen = buf_len;
> -
> - ret = 0;
> - ret = snprintf(buf, buf_len, "EV");
> - buf += ret;
> - blen -= ret;
> - if (payload & 0x1) {
> - ret = snprintf(buf, buf_len, " EXCEPTION-GEN");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x2) {
> - ret = snprintf(buf, buf_len, " RETIRED");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x4) {
> - ret = snprintf(buf, buf_len, " L1D-ACCESS");
> - buf += ret;
> - blen -= ret;
> - }
> -

Re: [PATCH v9 00/16] perf arm-spe: Refactor decoding & dumping flow

2020-11-26 Thread Arnaldo Carvalho de Melo
On Wed, Nov 25, 2020 at 02:17:56PM +, Will Deacon wrote:
> On Thu, Nov 19, 2020 at 11:24:25PM +0800, Leo Yan wrote:
> > This is patch set v9 for refactoring Arm SPE trace decoding and dumping.
> > 
> > Following comments and suggestions on patch set v8, it squashes two
> > patches into a single one: "perf arm-spe: Refactor printing string to
> > buffer" and "perf arm-spe: Consolidate arm_spe_pkt_desc()'s return
> > value".
> > 
> > Patch 01/16 renames the function arm_spe_pkt_snprintf() to
> > arm_spe_pkt_out_string(): since the function does not have the same
> > semantics as snprintf(), the rename avoids confusion.
> > 
> > This patch set applies cleanly on top of the perf/core branch
> > at commit 29396cd573da ("perf expr: Force encapsulation on
> > expr_id_data").
> > 
> > This patch set has been tested on a Hisilicon D06 platform with the
> > commands "perf report -D" and "perf script"; comparing the decoding
> > results with and without this patch set, the "diff" tool shows the
> > result as expected.
> > 
> > I also manually built the patches for arm/arm64/x86_64 and verified
> > that every single patch builds successfully.
> 
> I'm unable to test this, so I'm pleased that you can! Anyway, it all looks
> fine from a quick look:
> 
> Acked-by: Will Deacon 
> 
> so I think Arnaldo can pick this up when he's ready.

This is all ARM specific stuff, if you are good with it, I'll do just a
cursory look and apply.

- Arnaldo


[BUG] perf probe can't remove probes

2020-11-25 Thread Arnaldo Carvalho de Melo


Masami, have you stumbled on this already?

[root@seventh ~]# perf probe security_locked_down%return 'ret=$retval'
Added new event:
  probe:security_locked_down__return (on security_locked_down%return with 
ret=$retval)

You can now use it in all perf tools, such as:

perf record -e probe:security_locked_down__return -aR sleep 1

[root@seventh ~]# perf probe security_locked_down what
Added new event:
  probe:security_locked_down (on security_locked_down with what)

You can now use it in all perf tools, such as:

perf record -e probe:security_locked_down -aR sleep 1

[root@seventh ~]#


[root@seventh ~]# uname -r
5.10.0-rc3.bpfsign+
[root@seventh ~]# perf probe -l
  probe:security_locked_down (on 
security_locked_down@git/bpf/security/security.c with what)
  probe:security_locked_down__return (on 
security_locked_down%return@git/bpf/security/security.c with ret)
[root@seventh ~]# perf probe -D '*:*'
Semantic error :There is non-digit char in line number.

 Usage: perf probe [] 'PROBEDEF' ['PROBEDEF' ...]
or: perf probe [] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
or: perf probe [] --del '[GROUP:]EVENT' ...
or: perf probe --list [GROUP:]EVENT ...
or: perf probe [] --line 'LINEDESC'
or: perf probe [] --vars 'PROBEPOINT'
or: perf probe [] --funcs

-D, --definition <[EVENT=]FUNC[@SRC][+OFF|%return|:RL|;PT]|SRC:AL|SRC;PT 
[[NAME=]ARG ...]>
  Show trace event definition of given traceevent for 
k/uprobe_events.
[root@seventh ~]# perf probe probe:security_locked_down
Semantic error :There is non-digit char in line number.
  Error: Command Parse Error.
[root@seventh ~]# perf probe probe:security_locked_down__return
Semantic error :There is non-digit char in line number.
  Error: Command Parse Error.
[root@seventh ~]# cat /sys/kernel/debug/kprobes/
blacklist  enabled  list
[root@seventh ~]# cat /sys/kernel/debug/kprobes/list
8248b350  k  security_locked_down+0x0[FTRACE]
8248b350  r  security_locked_down+0x0[FTRACE]
[root@seventh ~]#

[root@seventh ~]# cat /etc/fedora-release
Fedora release 33 (Thirty Three)
[root@seventh ~]# gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/10/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap 
--enable-languages=c,c++,fortran,objc,obj-c++,ada,go,d,lto --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
--enable-threads=posix --enable-checking=release --enable-multilib 
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
--enable-gnu-unique-object --enable-linker-build-id 
--with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin 
--enable-initfini-array --with-isl --enable-offload-targets=nvptx-none 
--without-cuda-driver --enable-gnu-indirect-function --enable-cet 
--with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.2.1 20201016 (Red Hat 10.2.1-6) (GCC)
[root@seventh ~]# rpm -q elfutils
elfutils-0.182-1.fc33.x86_64
[root@seventh ~]#

- Arnaldo


ANNOUNCE: pahole v1.19 (Split BTF for kmodules, DWARF bug workarounds, speedups, --packed)

2020-11-24 Thread Arnaldo Carvalho de Melo
Hi,
 
The v1.19 release of pahole and its friends is out, available at
the usual places:

Main git repo:

   git://git.kernel.org/pub/scm/devel/pahole/pahole.git

Mirror git repo:

   https://github.com/acmel/dwarves.git

tarball + gpg signature:

   https://fedorapeople.org/~acme/dwarves/dwarves-1.19.tar.xz
   https://fedorapeople.org/~acme/dwarves/dwarves-1.19.tar.bz2
   https://fedorapeople.org/~acme/dwarves/dwarves-1.19.tar.sign

Best Regards,

- Arnaldo
 
v1.19:

- Support split BTF, where a main BTF file, vmlinux, can be used to find types
  and then a kernel module, for instance, can have just what is unique to it.

  For instance, looking for a type in the main vmlinux BTF info:

$ pahole wmi_notify_handler
pahole: type 'wmi_notify_handler' not found
$

  If we look at the 'wmi' module BTF info that is in:

$ ls -la /sys/kernel/btf/wmi
-r--r--r--. 1 root root 2866 Nov 18 13:35 /sys/kernel/btf/wmi
$

$ pahole /sys/kernel/btf/wmi -C wmi_notify_handler
typedef void (*wmi_notify_handler)(u32, void *);
$

  '--btf_base=/sys/kernel/btf/vmlinux' was automatically added in this last
  example, an option that was also introduced in this version, so that types
  used in the wmi.ko module but present in vmlinux can be found there and
  types are not duplicated.

- Update libbpf to get the split BTF support and use some of its functions to
  load BTF and speed up DWARF loading and BTF encoding.

- Support cross-compiled ELF binaries with different endianness

- Support showing typedefs for anonymous types, like structs, unions and enums,
  see the "Align enumerators" entry below for an example, another:

$ pahole rwlock_t
typedef struct {
        arch_rwlock_t              raw_lock;             /*     0     8 */

        /* size: 8, cachelines: 1, members: 1 */
        /* last cacheline: 8 bytes */
} rwlock_t;
$

- Align enumerators:

$ pahole ZSTD_strategy
typedef enum {
        ZSTD_fast    = 0,
        ZSTD_dfast   = 1,
        ZSTD_greedy  = 2,
        ZSTD_lazy    = 3,
        ZSTD_lazy2   = 4,
        ZSTD_btlazy2 = 5,
        ZSTD_btopt   = 6,
        ZSTD_btopt2  = 7,
} ZSTD_strategy;
$

- Workaround bugs in the generation of DWARF records for functions in some gcc
  versions that were causing breakage in the encoding of BTF:

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060 "Missing DW_AT_declaration=1 in dwarf data"

- Ignore zero-sized ELF symbols instead of erroring out.

- Handle union forward declaration properly in the BTF loader.

- Introduce --numeric_version for use in scripts and Makefiles:

$ pahole --version
v1.19
$ pahole --numeric_version
119
$

  To avoid things like this in the kernel's scripts/link-vmlinux.sh:

pahole_ver=$(${PAHOLE} --version | sed -E 's/v([0-9]+)\.([0-9]+)/\1\2/')

- Try sole pfunct argument as a function name, just like pahole with type names:

$ pfunct tcp_v4_rcv
int tcp_v4_rcv(struct sk_buff * skb);
$

- Speed up pfunct using some of the load techniques used in pahole.

- Discard CUs after BTF encoding as they're not used anymore, greatly reducing
  memory usage and speeding up vmlinux BTF encoding.

- Revamp how per-CPU variables are encoded in BTF.

- Include BTF info for static functions.

- Use BTF's string APIs for strings management, greatly improving performance
  over the tsearch().

- Increase size of DWARF lookup hash table, shaving off about 1 second out of
  about 20 seconds total for Linux BTF dedup.

- Stop BTF encoding when errors are found in some DWARF CU.

- Implement --packed, to show just packed structures, for instance, here are
  the top 5 packed data structures in the Linux kernel:

  $ pahole --sizes --packed | sort -k2 -nr | head -5
  e820_table        64004   0
  boot_params       4096    0
  efi_variable      2084    0
  snd_soc_tplg_pcm  912     0
  ntb_info_regs     800     0
  $

  And here is one of them:

  $ pahole efi_variable
  struct efi_variable {
        efi_char16_t               VariableName[512];    /*     0  1024 */
        /* --- cacheline 16 boundary (1024 bytes) --- */
        efi_guid_t                 VendorGuid;           /*  1024    16 */
        long unsigned int          DataSize;             /*  1040     8 */
        __u8                       Data[1024];           /*  1048  1024 */
        /* --- cacheline 32 boundary (2048 bytes) was 24 bytes ago --- */
        efi_status_t               Status;               /*  2072     8 */
        __u32                      Attributes;           /*  2080     4 */

        /* size: 2084, cachelines: 33, members: 6 */
        /* last cacheline: 36 bytes */
  } __attribute__((__packed__));
  $

- Fix bug in distros such as OpenSUSE:15.2 where DW_AT_alignment isn't defined.
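
  As a side note on the --packed entry above, the layout pahole reports for
  efi_variable can be reproduced with a small sketch. The types below are
  stand-ins chosen to have the same sizes as the members in that output (all
  demo_ names are invented), and the packed attribute is the GCC/Clang
  extension, so treat this as an illustration rather than the kernel
  definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in types sized like the members in the pahole output. */
typedef uint16_t demo_char16_t;                   /* 2 bytes  */
typedef struct { uint8_t b[16]; } demo_guid_t;    /* 16 bytes */

struct demo_efi_variable_packed {
	demo_char16_t VariableName[512];
	demo_guid_t   VendorGuid;
	uint64_t      DataSize;
	uint8_t       Data[1024];
	uint64_t      Status;
	uint32_t      Attributes;
} __attribute__((__packed__));

/* Same members without the attribute: the compiler may pad the tail. */
struct demo_efi_variable_natural {
	demo_char16_t VariableName[512];
	demo_guid_t   VendorGuid;
	uint64_t      DataSize;
	uint8_t       Data[1024];
	uint64_t      Status;
	uint32_t      Attributes;
};

/* Usage: the offsets and size match the numbers pahole printed. */
void demo_packed_usage(void)
{
	assert(offsetof(struct demo_efi_variable_packed, VendorGuid) == 1024);
	assert(offsetof(struct demo_efi_variable_packed, DataSize) == 1040);
	assert(offsetof(struct demo_efi_variable_packed, Data) == 1048);
	assert(offsetof(struct demo_efi_variable_packed, Status) == 2072);
	assert(sizeof(struct demo_efi_variable_packed) == 2084);

	/* Without packing the size can only stay the same or grow. */
	assert(sizeof(struct demo_efi_variable_natural) >= 2084);
}
```

  On ABIs where 64-bit integers are 8-byte aligned, the unpacked variant
  would be expected to round up to a multiple of 8, which is the tail
  padding the attribute removes.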

Signed-off-by: Arnaldo Carvalho de Melo 


Re: [RFC 0/2] Introduce perf-stat -b for BPF programs

2020-11-19 Thread Arnaldo Carvalho de Melo
On Wed, Nov 18, 2020 at 08:50:44PM -0800, Song Liu wrote:
> This set introduces perf-stat -b option to count events for BPF programs.
> This is similar to bpftool-prog-profile. But perf-stat makes it much more
> flexible.
> 
> Sending as RFC because I would like comments on some key design choices:
>   1. We are using BPF skeletons here, which is by far the easiest way to
>  write and ship BPF programs. However, this requires bpftool, which
>  makes building perf slower.
>   2. A Makefile is added to tools/perf/util/bpf_skel/ to build bpftool,
>  and BPF skeletons. This keeps main perf Makefiles simple. But we may
>  not like it for some reason?

I'll review it in detail, but before that: thanks a lot for working on
this! Looks super cool from a first quick look. :-)

- Arnaldo
 
> Some known limitations (or work to be done):
>   1. Only counting events for one BPF program at a time.
>   2. Need extra logic in target__validate().

 
> Song Liu (2):
>   perf: support build BPF skeletons with perf
>   perf-stat: enable counting events for BPF programs
> 
>  tools/build/Makefile.feature  |   3 +-
>  tools/perf/Makefile.config|   8 +
>  tools/perf/Makefile.perf  |  15 +-
>  tools/perf/builtin-stat.c |  59 -
>  tools/perf/util/Build |   1 +
>  tools/perf/util/bpf_counter.c | 215 ++
>  tools/perf/util/bpf_counter.h |  71 ++
>  tools/perf/util/bpf_skel/.gitignore   |   3 +
>  tools/perf/util/bpf_skel/Makefile |  71 ++
>  .../util/bpf_skel/bpf_prog_profiler.bpf.c |  96 
>  tools/perf/util/bpf_skel/dummy.bpf.c  |  19 ++
>  tools/perf/util/evsel.c   |  10 +
>  tools/perf/util/evsel.h   |   5 +
>  tools/perf/util/target.h  |   6 +
>  14 files changed, 571 insertions(+), 11 deletions(-)
>  create mode 100644 tools/perf/util/bpf_counter.c
>  create mode 100644 tools/perf/util/bpf_counter.h
>  create mode 100644 tools/perf/util/bpf_skel/.gitignore
>  create mode 100644 tools/perf/util/bpf_skel/Makefile
>  create mode 100644 tools/perf/util/bpf_skel/bpf_prog_profiler.bpf.c
>  create mode 100644 tools/perf/util/bpf_skel/dummy.bpf.c
> 
> --
> 2.24.1

-- 

- Arnaldo


Re: [PATCH 06/24] perf tools: Add build_id__is_defined function

2020-11-17 Thread Arnaldo Carvalho de Melo
On Tue, Nov 17, 2020 at 09:53:59PM +0100, Jiri Olsa wrote:
> On Tue, Nov 17, 2020 at 11:00:37AM -0800, Ian Rogers wrote:
> > On Tue, Nov 17, 2020 at 3:01 AM Jiri Olsa  wrote:
> > 
> > > Adding build_id__is_defined helper to check build id
> > > is defined and is != zero build id.
> > >
> > > Signed-off-by: Jiri Olsa 
> > > ---
> > >  tools/perf/util/build-id.c | 7 +++
> > >  tools/perf/util/build-id.h | 1 +
> > >  2 files changed, 8 insertions(+)
> > >
> > > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > > index 6b410c3d52dc..7d9ecc37849c 100644
> > > --- a/tools/perf/util/build-id.c
> > > +++ b/tools/perf/util/build-id.c
> > > @@ -912,3 +912,10 @@ void build_id__init(struct build_id *bid, const u8
> > > *data, size_t size)
> > > memcpy(bid->data, data, size);
> > > bid->size = size;
> > >  }
> > > +
> > > +bool build_id__is_defined(const struct build_id *bid)
> > > +{
> > > +   static u8 zero[BUILD_ID_SIZE];
> > > +
> > > +   return bid && bid->size ? memcmp(bid->data, &zero, bid->size) :
> > > false;

> > Fwiw, I find this method to test for zero a little hard to parse - I'm
> 
> heh, it's controversial one, Namhyung commented
> on this one in previous version, so I changed it ;-)
>   
> https://lore.kernel.org/lkml/cam9d7cjjgjtn8sdglz1poqz-suxwjnvandyove1yhxq46pr...@mail.gmail.com/

So, the kernel has an idiom for this in lib/string.c:

/**
 * memchr_inv - Find an unmatching character in an area of memory.
 * @start: The memory area
 * @c: Find a character other than c
 * @bytes: The size of the area.
 *
 * returns the address of the first character other than @c, or %NULL
 * if the whole buffer contains just @c.
 */
void *memchr_inv(const void *start, int c, size_t bytes)

No need for any array of some particular size :-)

It's been there for a while:

commit 798248206b59acc6e1238c778281419c041891a7
Author: Akinobu Mita 
Date:   Mon Oct 31 17:08:07 2011 -0700

lib/string.c: introduce memchr_inv()

memchr_inv() is mainly used to check whether the whole buffer is filled
with just a specified byte.

- Arnaldo
 
> 
> > failing as a C programmer :-) Nit, should zero be const?
> 
> right, should be const, will change


Re: [PATCH 23/24] perf buildid-list: Add support for mmap2's buildid events

2020-11-17 Thread Arnaldo Carvalho de Melo
On Tue, Nov 17, 2020 at 04:21:40PM +0100, Jiri Olsa wrote:
> On Tue, Nov 17, 2020 at 09:50:40AM -0300, Arnaldo Carvalho de Melo wrote:
> > On Tue, Nov 17, 2020 at 12:00:52PM +0100, Jiri Olsa wrote:
> > > Add buildid-list support for mmap2's build id data, so we can
> > > display build ids for dso objects for data without the build
> > > id cache update.
> > 
> > >   $ perf buildid-list
> > >   1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
> > >   d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so
> > > 
> > > By default only dso objects with hits are shown.
> > 
> > Would be interesting to be able to show all the build ids that are
> > there. a 'perf buildid-list --all' or make this under --force?
> 
> ok, will check.. one other tool I think would be handy is
> to show which debuginfo is not available, because it can
> change the report a lot - missing symbols are not getting
> accounted, and their hits are accounted only as separated
> addresses

Right, as below.

So you suggest something like:

  # perf buildid-cache --fetch-missing-debuginfo

?

- Arnaldo

[root@quaco ~]# rpm -qf `which stress-ng`
stress-ng-0.11.21-1.fc32.x86_64
[root@quaco ~]# rpm -q stress-ng-debuginfo
stress-ng-debuginfo-0.07.29-10.fc31.x86_64
[root@quaco ~]# perf record stress-ng -t 1 -c 1
stress-ng: info:  [656926] dispatching hogs: 1 cpu
stress-ng: info:  [656926] successful run completed in 1.02s
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.204 MB perf.data (4082 samples) ]
[root@quaco ~]# perf report | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles'
# Event count (approx.): 3997318603
#
# Overhead  Command        Shared Object  Symbol
# ........  .............  .............  ..............
#
     7.91%  stress-ng-cpu  stress-ng      [.] 0x00035ed9
     7.15%  stress-ng-cpu  stress-ng      [.] 0x00035ecc
     6.54%  stress-ng-cpu  stress-ng      [.] 0x0003bbf6
     4.39%  stress-ng-cpu  stress-ng      [.] 0x0003a083
     4.15%  stress-ng-cpu  stress-ng      [.] 0x00065ed8
     3.67%  stress-ng-cpu  stress-ng      [.] 0x00065ecf
     3.41%  stress-ng-cpu  stress-ng      [.] 0x00065ee1
     3.11%  stress-ng-cpu  stress-ng      [.] 0x0003bbf2
     2.65%  stress-ng-cpu  stress-ng      [.] 0x0003a07b
[root@quaco ~]#


So the above is with a stress-ng-debuginfo package that doesn't match
the installed binary, so the build-id check fails and symbols can't be
resolved. Then, with a matching debuginfo package:

[root@quaco ~]# rpm -q stress-ng-debuginfo
stress-ng-debuginfo-0.11.21-1.fc32.x86_64
[root@quaco ~]# rpm -q stress-ng
stress-ng-0.11.21-1.fc32.x86_64
[root@quaco ~]# perf report | head -20
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 4K of event 'cycles'
# Event count (approx.): 3997318603
#
# Overhead  Command        Shared Object  Symbol
# ........  .............  .............  .....................................
#
    21.48%  stress-ng-cpu  stress-ng      [.] is_prime
    16.02%  stress-ng-cpu  stress-ng      [.] stress_cpu_sieve
    12.61%  stress-ng-cpu  stress-ng      [.] stress_cpu_cpuid.sse2.2
    11.94%  stress-ng-cpu  stress-ng      [.] ackermann
     8.24%  stress-ng-cpu  stress-ng      [.] stress_cpu_correlate
     3.82%  stress-ng-cpu  stress-ng      [.] queens_try
     2.63%  stress-ng-cpu  stress-ng      [.] stress_cpu_nsqrt.sse2.2
     2.46%  stress-ng-cpu  stress-ng      [.] ccitt_crc16
     2.25%  stress-ng-cpu  stress-ng      [.] stress_cpu_complex_long_double.sse2.2
[root@quaco ~]#

[root@quaco ~]# perf report -v | head -20
build id event received for vmlinux: f72ec65d81949c5ba63ccaa16b59c79d1696bc4d [20]
build id event received for /usr/bin/stress-ng: 82b81bd823dcac393292faaaf40997723ce358a8 [20]
build id event received for [vdso]: a1f89b9b9d2093ae926c550a7de060d435277fbf [20]
build id event received for /usr/lib64/libm-2.31.so: fdf1f1d0761b7392e419d5d72e43d3fd3db6e184 [20]
build id event received for /usr/lib64/libc-2.31.so: d278249792061c6b74d1693ca59513be1def13f2 [20]
Looking at the vmlinux_path (8 entries long)
symsrc__init: build id mismatch for vmlinux.
Using /usr/lib/debug/lib/modules/5.8.14-200.fc32.x86_64/vmlinux for symbols
# To display the perf.data header info, please use --header/--header-only 

Re: [PATCH 13/24] perf tools: Allow mmap2 event to synthesize kernel image

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 04:16:51PM +0100, Jiri Olsa escreveu:
> On Tue, Nov 17, 2020 at 09:44:37AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 17, 2020 at 12:00:42PM +0100, Jiri Olsa escreveu:
> > > Allow mmap2 event to synthesize kernel image,
> > > so we can synthesize kernel build id data in
> > > following changes.
> > > 
> > > It's enabled by new symbol_conf.buildid_mmap2
> > > bool, which will be switched in following
> > > changes.
> > 
> > Why make this an option? MMAP2 goes back years:
> > 
> > 13d7a2410fa637f45 (Stephane Eranian 2013-08-21 12:10:24 +0200  904) 
> >  * The MMAP2 records are an augmented version of MMAP, they add
> > 13d7a2410fa637f45 (Stephane Eranian 2013-08-21 12:10:24 +0200  905) 
> >  * maj, min, ino numbers to be used to uniquely identify each mapping
> > 
> > Also we unconditionally generate MMAP2 events if the kernel supports it,
> > from evsel__config():
> > 
> >   attr->mmap  = track;
> >   attr->mmap2 = track && !perf_missing_features.mmap2;
> > 
> > So perhaps we should reuse that logic? I.e. use mmap2 if the kernel
> > supports it?
> 
> mmap2 itself is not a problem, the problem is the new
> bit (PERF_RECORD_MISC_MMAP_BUILD_ID) that says there's
> build id in mmap2.. older perf tool won't understand
> that and report will crash

Is this theoretical or have you experienced it?

Would be good to tweak the perf.data reader code to not crash on unknown
bits like that :-\

But by looking at machine__process_mmap2_event() I couldn't imagine how
that would crash.

It would get bogus maj, min, ino, ino_generation, but probably that
wouldn't make it crash.
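
To illustrate why I'd expect bogus values rather than a crash, here is a small sketch of the overlay. It assumes the union layout from patch 12/24 in this series; struct mmap2_id_sketch and fill_build_id_sketch() are hypothetical names, not something in the tree.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical copy of just the overlapping part of the mmap2 record:
 * the build id fields share storage with maj/min/ino/ino_generation.
 */
struct mmap2_id_sketch {
	union {
		struct {
			uint32_t maj;
			uint32_t min;
			uint64_t ino;
			uint64_t ino_generation;
		};
		struct {
			uint8_t  build_id_size;
			uint8_t  reserved_1;
			uint16_t reserved_2;
			uint8_t  build_id[20];
		};
	};
};

/* Store a build id the way a new producer would (size clamped to 20). */
static void fill_build_id_sketch(struct mmap2_id_sketch *ev,
				 const uint8_t *id, uint8_t size)
{
	memset(ev, 0, sizeof(*ev));
	if (size > sizeof(ev->build_id))
		size = sizeof(ev->build_id);
	ev->build_id_size = size;
	memcpy(ev->build_id, id, size);
}
```

A reader that ignores the misc bit and looks at min, ino or ino_generation after fill_build_id_sketch() just sees the build id bytes reinterpreted as integers. Both views are 24 bytes, so nothing reads past the record.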

- Arnaldo

int machine__process_mmap2_event(struct machine *machine,
 union perf_event *event,
 struct perf_sample *sample)
{
struct thread *thread;
struct map *map;
struct dso_id dso_id = {
.maj = event->mmap2.maj,
.min = event->mmap2.min,
.ino = event->mmap2.ino,
.ino_generation = event->mmap2.ino_generation,
}; 
int ret = 0;   

if (dump_trace)
perf_event__fprintf_mmap2(event, stdout);
 
if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
sample->cpumode == PERF_RECORD_MISC_KERNEL) {
ret = machine__process_kernel_mmap_event(machine, event);
if (ret < 0)
goto out_problem;
return 0;
}
 
thread = machine__findnew_thread(machine, event->mmap2.pid,
event->mmap2.tid);
if (thread == NULL)
goto out_problem;

map = map__new(machine, event->mmap2.start,
event->mmap2.len, event->mmap2.pgoff,
&dso_id, event->mmap2.prot,
event->mmap2.flags,
event->mmap2.filename, thread);

if (map == NULL)
goto out_problem_map;

ret = thread__insert_map(thread, map);
if (ret)
goto out_problem_insert;

thread__put(thread);
map__put(map);
return 0;

out_problem_insert:
map__put(map);
out_problem_map:
thread__put(thread);
out_problem:
dump_printf("problem processing PERF_RECORD_MMAP2, skipping event.\n");
return 0;
}


[GIT PULL] perf tools fixes for v5.10: 3rd batch

2020-11-17 Thread Arnaldo Carvalho de Melo
Hi Linus,

Please consider pulling,

Best regards,

- Arnaldo

The following changes since commit af5043c89a8ef6b6949a245fff355a552eaed240:

  Merge tag 'acpi-5.10-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm (2020-11-12 11:06:53 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-fixes-for-v5.10-2020-11-17

for you to fetch changes up to 568beb27959b0515d325ea1c6cf211eed2d66740:

  perf test: Avoid an msan warning in a copied stack. (2020-11-16 14:10:58 -0300)


perf tools updates for v5.10: 3rd batch.

- Fix file corruption due to event deletion in 'perf inject'.

- Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem
  memcpy', silencing perf build warning.

- Avoid an msan warning in a copied stack in 'perf test'.

- Correct tracepoint field name "flags" in ARM's CS-ETM hardware tracing
  'perf test' entry.

- Update branch sample pattern for cs-etm to cope with excluding guest
  in userspace counting.

- Don't free "lock_seq_stat" if read_count isn't zero in 'perf lock'.

Signed-off-by: Arnaldo Carvalho de Melo 

Test results at the signed tag at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-fixes-for-v5.10-2020-11-17

--------
Al Grant (1):
  perf inject: Fix file corruption due to event deletion

Arnaldo Carvalho de Melo (1):
  tools arch: Update arch/x86/lib/mem{cpy,set}_64.S copies used in 'perf bench mem memcpy'

Ian Rogers (1):
  perf test: Avoid an msan warning in a copied stack.

Leo Yan (4):
  perf lock: Correct field name "flags"
  perf lock: Don't free "lock_seq_stat" if read_count isn't zero
  perf test: Fix a typo in cs-etm testing
  perf test: Update branch sample pattern for cs-etm

 tools/arch/x86/lib/memcpy_64.S               |  8 +++-
 tools/arch/x86/lib/memset_64.S               | 11 ++-
 tools/perf/arch/x86/tests/dwarf-unwind.c     |  7 +++
 tools/perf/bench/mem-memcpy-x86-64-asm.S     |  3 +++
 tools/perf/bench/mem-memset-x86-64-asm.S     |  3 +++
 tools/perf/builtin-inject.c                  | 12 +---
 tools/perf/builtin-lock.c                    |  4 ++--
 tools/perf/tests/shell/test_arm_coresight.sh |  4 ++--
 tools/perf/util/include/linux/linkage.h      |  7 +++
 9 files changed, 34 insertions(+), 25 deletions(-)


Re: [PATCH 23/24] perf buildid-list: Add support for mmap2's buildid events

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:52PM +0100, Jiri Olsa escreveu:
> Add buildid-list support for mmap2's build id data, so we can
> display build ids for dso objects for data without the build
> id cache update.

>   $ perf buildid-list
>   1805c738c8f3ec0f47b7ea09080c28f34d18a82b /usr/lib64/ld-2.31.so
>   d278249792061c6b74d1693ca59513be1def13f2 /usr/lib64/libc-2.31.so
> 
> By default only dso objects with hits are shown.

Would be interesting to be able to show all the build ids that are
there. Maybe a 'perf buildid-list --all', or make this available under --force?

- Arnaldo
 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/builtin-buildid-list.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
> index e3ef75583514..87f5b1a4a7fa 100644
> --- a/tools/perf/builtin-buildid-list.c
> +++ b/tools/perf/builtin-buildid-list.c
> @@ -77,6 +77,9 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
>   perf_header__has_feat(&session->header, HEADER_AUXTRACE))
>   with_hits = false;
>  
> + if (!perf_header__has_feat(&session->header, HEADER_BUILD_ID))
> + with_hits = true;
> +
>   /*
>* in pipe-mode, the only way to get the buildids is to parse
>* the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID
> -- 
> 2.26.2
> 

-- 

- Arnaldo


Re: [PATCH 21/24] perf buildid-cache: Add support to add build ids from perf data

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:50PM +0100, Jiri Olsa escreveu:
> Adding support to specify a perf data file as the -a option's
> file argument.
> 
> If the file is detected to be perf data file, it is processed
> and all dso objects with sample hit are stored to the build
> id cache.
> 
>   $ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf buildid-cache -a perf.data
>   OK   5dcec522abf136fcfd3128f47e131f2365834dd7 /home/jolsa/.debug/.build-id/5d/cec522abf136fcfd3128f47e131f2365834dd7/elf
>   OK   5784f813b727a50cfd3363234aef9fcbab685cc4 /lib/modules/5.10.0-rc2speed+/kernel/fs/xfs/xfs.ko
> 
> By default we store only dso with hits, but it's possible to
> specify 'all' to store all dso objects, like:
> -a perf.data,all
> 
>   $ DEBUGINFOD_URLS=http://192.168.122.174:8002 perf buildid-cache -a perf.data,all
>   OK   5dcec522abf136fcfd3128f47e131f2365834dd7 /home/jolsa/.debug/.build-id/5d/cec522abf136fcfd3128f47e131f2365834dd7/elf
>   OK   6ce92dc7c31f12fe5b7775a2bb8b14a3546ce2cd /lib/modules/5.10.0-rc2speed+/kernel/drivers/firmware/qemu_fw_cfg.ko
>   OK   bf3f6d32dccc159f841fc3658c241d0e74c61fbb /lib/modules/5.10.0-rc2speed+/kernel/drivers/block/virtio_blk.ko
>   OK   e896b4329cf9f190f1a0fae933f425ff8f71b052 /lib/modules/5.10.0-rc2speed+/kernel/drivers/char/virtio_console.ko
>   OK   5bedc933cb59e053ecb472f327bd73c548364479 /lib/modules/5.10.0-rc2speed+/kernel/drivers/input/serio/serio_raw.ko
>   OK   cecc506368a8b7a473a5f900d26f0d3d914a9c23 /lib/modules/5.10.0-rc2speed+/kernel/arch/x86/crypto/crc32c-intel.ko
>   OK   91076fb3646d061a0a42cf7bddb339a665ee4f80 /lib/modules/5.10.0-rc2speed+/kernel/arch/x86/crypto/ghash-clmulni-intel.ko
>   OK   4e2a304d788bb8e2e950bc82a5944e042afa0bf2 /lib/modules/5.10.0-rc2speed+/kernel/drivers/media/cec/core/cec.ko
>   OK   31ab0da5ad81e6803280177f507a95f3053d585e /lib/modules/5.10.0-rc2speed+/kernel/lib/libcrc32c.ko
>   OK   f6154bca47c149f48c942fcc3d653041dd285c65 /lib/modules/5.10.0-rc2speed+/kernel/drivers/gpu/drm/ttm/ttm.ko
>   OK   723f5852de81590d54b23b38c160d3618b41951b /lib/modules/5.10.0-rc2speed+/kernel/arch/x86/crypto/crct10dif-pclmul.ko
>   OK   06b1eab7f141cbc3e5a5db47909c8ab5cb242e40 /lib/modules/5.10.0-rc2speed+/kernel/drivers/gpu/drm/drm_ttm_helper.ko
>   OK   38292b862cf3ff87489508fdb4895efa45780813 /lib/modules/5.10.0-rc2speed+/kernel/drivers/gpu/drm/qxl/qxl.ko
>   OK   cdf51e58609bf2ce4837a7b195e0ccae0a930907 /lib/modules/5.10.0-rc2speed+/kernel/arch/x86/crypto/crc32-pclmul.ko
>   OK   5ca8958388f6688452ecc2cb83d6031394c659ad /lib/modules/5.10.0-rc2speed+/kernel/drivers/gpu/drm/drm.ko
>   OK   236bc4e4f38bf3559007566cb32b3dcc1bc28d2d /lib/modules/5.10.0-rc2speed+/kernel/drivers/gpu/drm/drm_kms_helper.ko
>   OK   5784f813b727a50cfd3363234aef9fcbab685cc4 /lib/modules/5.10.0-rc2speed+/kernel/fs/xfs/xfs.ko
>   OK   66db2be3efaa43bb5a5c481986e9554e1885cc69 /usr/lib/systemd/systemd
>   OK   7db607d9f2de89860d9639712da64c8bacd31e4b /usr/lib64/libm-2.30.so
>   OK   55b5f9652e1d17c1dd58f62628d5063428e5db91 /usr/lib64/libudev.so.1.6.15
>   OK   63b97070bf097130713bb6c89cf7100b5f3c9b17 /usr/lib64/libunistring.so.2.1.0
>   ...

This is a cool feature! :-)
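
For readers not familiar with the cache layout, the first OK line above suggests the path is formed from the hex build id with its first two characters used as a directory. A rough sketch of that formatting, inferred only from that output line; build_id_cache_path_sketch() is a hypothetical helper, not the function perf uses:

```c
#include <stdio.h>
#include <string.h>

/*
 * Format "<cache_dir>/.build-id/<first 2 hex chars>/<rest>/elf" into buf.
 * Returns 0 on success, -1 if the id is too short or buf is too small.
 */
static int build_id_cache_path_sketch(const char *cache_dir,
				      const char *sbuild_id,
				      char *buf, size_t size)
{
	int n;

	if (strlen(sbuild_id) < 3)
		return -1;

	n = snprintf(buf, size, "%s/.build-id/%.2s/%s/elf",
		     cache_dir, sbuild_id, sbuild_id + 2);
	if (n < 0 || (size_t)n >= size)
		return -1;
	return 0;
}
```

Called with "/home/jolsa/.debug" and the 5dcec5... id, this yields the same string as the path printed on that first OK line.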

- Arnaldo

> 
> Once perf data is specified, no other file can be specified in
> the option, otherwise it causes syntax error.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  .../perf/Documentation/perf-buildid-cache.txt |  12 +-
>  tools/perf/builtin-buildid-cache.c| 215 +-
>  tools/perf/util/probe-event.c |   6 +-
>  3 files changed, 227 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-buildid-cache.txt 
> b/tools/perf/Documentation/perf-buildid-cache.txt
> index f6de0952ff3c..b77da5138bca 100644
> --- a/tools/perf/Documentation/perf-buildid-cache.txt
> +++ b/tools/perf/Documentation/perf-buildid-cache.txt
> @@ -23,7 +23,17 @@ OPTIONS
>  ---
>  -a::
>  --add=::
> -Add specified file to the cache.
> +Add specified file or perf.data binaries to the cache.
> +
> +If the file is detected to be perf data file, it is processed
> +and all dso objects with sample hit are stored to the cache.
> +
> +It's possible to specify 'all' to store all dso objects, like:
> +-a perf.data,all
> +
> +Once perf data is specified, no other file can be specified in
> +the option, otherwise it causes syntax error.
> +
>  -f::
>  --force::
>   Don't complain, do it.
> diff --git a/tools/perf/builtin-buildid-cache.c 
> b/tools/perf/builtin-buildid-cache.c
> index a25411926e48..0bfb54ee1e5e 100644
> --- a/tools/perf/builtin-buildid-cache.c
> +++ b/tools/perf/builtin-buildid-cache.c
> @@ -29,6 +29,11 @@
>  #include "util/probe-file.h"
>  #include 
>  #include 
> +#include 
> +#include 
> +#ifdef HAVE_DEBUGINFOD_SUPPORT
> +#include 
> +#endif
>  
>  static int build_id_cache__kcore_buildid(const char *proc_dir, char 
> *sbuildid)
>  {
> @@ -348,

Re: [PATCH 15/24] perf tools: Synthesize build id for kernel/modules/tasks

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:44PM +0100, Jiri Olsa escreveu:
> Adding build id to synthesized mmap2 events for
> everything - kernel/modules/tasks.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/synthetic-events.c | 32 ++
>  1 file changed, 32 insertions(+)
> 
> diff --git a/tools/perf/util/synthetic-events.c 
> b/tools/perf/util/synthetic-events.c
> index a18ae502d765..91b1962d399c 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -347,6 +347,31 @@ static bool read_proc_maps_line(struct io *io, __u64 
> *start, __u64 *end,
>   }
>  }
>  
> +static void perf_record_mmap2__read_build_id(struct perf_record_mmap2 *event,
> +  bool is_kernel)
> +{
> + struct build_id bid;
> + int rc;
> +
> + if (is_kernel)
> + rc = sysfs__read_build_id("/sys/kernel/notes", &bid);
> + else
> + rc = filename__read_build_id(event->filename, &bid) > 0 ? 0 : 
> -1;
> +
> + if (rc == 0) {
> + memcpy(event->build_id, bid.data, sizeof(bid.data));
> + event->build_id_size = (u8) bid.size;
> + event->header.misc |= PERF_RECORD_MISC_MMAP_BUILD_ID;
> + event->__reserved_1 = 0;
> + event->__reserved_2 = 0;
> + } else {
> + if (event->filename[0] == '/') {
> + pr_debug2("Failed to read build ID for %s\n",
> +   event->filename);
> + }
> + }
> +}
> +
>  int perf_event__synthesize_mmap_events(struct perf_tool *tool,
>  union perf_event *event,
>  pid_t pid, pid_t tgid,
> @@ -453,6 +478,9 @@ int perf_event__synthesize_mmap_events(struct perf_tool 
> *tool,
>   event->mmap2.pid = tgid;
>   event->mmap2.tid = pid;
>  
> + if (symbol_conf.buildid_mmap2)
> + perf_record_mmap2__read_build_id(&event->mmap2, false);

Ditto

>   if (perf_tool__process_synth_event(tool, event, machine, 
> process) != 0) {
>   rc = -1;
>   break;
> @@ -630,6 +658,8 @@ int perf_event__synthesize_modules(struct perf_tool 
> *tool, perf_event__handler_t
>  
>   memcpy(event->mmap2.filename, pos->dso->long_name,
>  pos->dso->long_name_len + 1);
> +
> + perf_record_mmap2__read_build_id(&event->mmap2, false);
>   } else {
>   size = PERF_ALIGN(pos->dso->long_name_len + 1, 
> sizeof(u64));
>   event->mmap.header.type = PERF_RECORD_MMAP;
> @@ -1050,6 +1080,8 @@ static int __perf_event__synthesize_kernel_mmap(struct 
> perf_tool *tool,
>   event->mmap2.start = map->start;
>   event->mmap2.len   = map->end - event->mmap.start;
>   event->mmap2.pid   = machine->pid;
> +
> + perf_record_mmap2__read_build_id(&event->mmap2, true);
>   } else {
>   size = snprintf(event->mmap.filename, 
> sizeof(event->mmap.filename),
>   "%s%s", machine->mmap_name, 
> kmap->ref_reloc_sym->name) + 1;
> -- 
> 2.26.2
> 

-- 

- Arnaldo


Re: [PATCH 22/24] perf buildid-cache: Add --debuginfod option

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:51PM +0100, Jiri Olsa escreveu:
> Adding --debuginfod option to specify debuginfod url and
> support to do that through config file as well.
> 
> Use following in ~/.perfconfig file:
> 
>   [buildid-cache]
>   debuginfod=http://192.168.122.174:8002
 
Ditto, it's cool this is getting nicely integrated :-)
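
The flow, as I read the patch below, is: a config callback picks up buildid-cache.debuginfod, and the value ends up exported as DEBUGINFOD_URLS. A standalone sketch of that flow follows. The function names are hypothetical, the callback borrows the value instead of strdup()ing it, and overwriting an existing environment value is my assumption since the setenv() call is cut off in the quote.

```c
#define _POSIX_C_SOURCE 200112L	/* for setenv() */
#include <stdlib.h>
#include <string.h>

/*
 * Config callback in the style of the patch: remember the URL when the
 * "buildid-cache.debuginfod" variable is seen, ignore everything else.
 */
static int buildid_cache_config_sketch(const char *var, const char *value,
				       void *cb)
{
	const char **debuginfod = cb;

	if (!strcmp(var, "buildid-cache.debuginfod"))
		*debuginfod = value;	/* borrowed pointer, sketch only */

	return 0;
}

/* Export the URL for debuginfod clients; a NULL url means nothing to do. */
static int apply_debuginfod_sketch(const char *url)
{
	if (!url)
		return 0;
	return setenv("DEBUGINFOD_URLS", url, 1);	/* overwrite: assumption */
}
```

Since the patch reads the config before parse_options(), a --debuginfod given on the command line would replace the config value before it is exported.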

- Arnaldo

> Signed-off-by: Jiri Olsa 
> ---
>  .../perf/Documentation/perf-buildid-cache.txt |  6 
>  tools/perf/Documentation/perf-config.txt  |  7 +
>  tools/perf/builtin-buildid-cache.c| 28 +--
>  3 files changed, 38 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-buildid-cache.txt 
> b/tools/perf/Documentation/perf-buildid-cache.txt
> index b77da5138bca..75385f4dc11f 100644
> --- a/tools/perf/Documentation/perf-buildid-cache.txt
> +++ b/tools/perf/Documentation/perf-buildid-cache.txt
> @@ -84,6 +84,12 @@ OPTIONS
>   used when creating a uprobe for a process that resides in a
>   different mount namespace from the perf(1) utility.
>  
> +--debuginfod=URL::
> + Specify debuginfod URL to be used when retrieving perf.data binaries,
> + it follows the same syntax as the DEBUGINFOD_URLS variable, like:
> +
> +   buildid-cache.debuginfod=http://192.168.122.174:8002
> +
>  SEE ALSO
>  
>  linkperf:perf-record[1], linkperf:perf-report[1], 
> linkperf:perf-buildid-list[1]
> diff --git a/tools/perf/Documentation/perf-config.txt 
> b/tools/perf/Documentation/perf-config.txt
> index 31069d8a5304..15fad32b9885 100644
> --- a/tools/perf/Documentation/perf-config.txt
> +++ b/tools/perf/Documentation/perf-config.txt
> @@ -238,6 +238,13 @@ buildid.*::
>   cache location, or to disable it altogether. If you want to 
> disable it,
>   set buildid.dir to /dev/null. The default is $HOME/.debug
>  
> +buildid-cache.*::
> + buildid-cache.debuginfod=URL
> + Specify debuginfod URL to be used when retrieving perf.data 
> binaries,
> + it follows the same syntax as the DEBUGINFOD_URLS variable, 
> like:
> +
> +   buildid-cache.debuginfod=http://192.168.122.174:8002
> +
>  annotate.*::
>   These are in control of addresses, jump function, source code
>   in lines of assembly code from a specific program.
> diff --git a/tools/perf/builtin-buildid-cache.c 
> b/tools/perf/builtin-buildid-cache.c
> index 0bfb54ee1e5e..fc03de7d2a28 100644
> --- a/tools/perf/builtin-buildid-cache.c
> +++ b/tools/perf/builtin-buildid-cache.c
> @@ -27,6 +27,7 @@
>  #include "util/time-utils.h"
>  #include "util/util.h"
>  #include "util/probe-file.h"
> +#include "util/config.h"
>  #include 
>  #include 
>  #include 
> @@ -552,12 +553,21 @@ build_id_cache__add_perf_data(const char *path, bool 
> all)
>   return err;
>  }
>  
> +static int perf_buildid_cache_config(const char *var, const char *value, 
> void *cb)
> +{
> + const char **debuginfod = cb;
> +
> + if (!strcmp(var, "buildid-cache.debuginfod"))
> + *debuginfod = strdup(value);
> +
> + return 0;
> +}
> +
>  int cmd_buildid_cache(int argc, const char **argv)
>  {
>   struct strlist *list;
>   struct str_node *pos;
> - int ret = 0;
> - int ns_id = -1;
> + int ret, ns_id = -1;
>   bool force = false;
>   bool list_files = false;
>   bool opts_flag = false;
> @@ -567,7 +577,8 @@ int cmd_buildid_cache(int argc, const char **argv)
>  *purge_name_list_str = NULL,
>  *missing_filename = NULL,
>  *update_name_list_str = NULL,
> -*kcore_filename = NULL;
> +*kcore_filename = NULL,
> +*debuginfod = NULL;
>   char sbuf[STRERR_BUFSIZE];
>  
>   struct perf_data data = {
> @@ -592,6 +603,8 @@ int cmd_buildid_cache(int argc, const char **argv)
>   OPT_BOOLEAN('f', "force", &force, "don't complain, do it"),
>   OPT_STRING('u', "update", &update_name_list_str, "file list",
>   "file(s) to update"),
> + OPT_STRING(0, "debuginfod", &debuginfod, "debuginfod url",
> + "set debuginfod url"),
>   OPT_INCR('v', "verbose", &verbose, "be more verbose"),
>   OPT_INTEGER(0, "target-ns", &ns_id, "target pid for namespace context"),
>   OPT_END()
> @@ -601,6 +614,10 @@ int cmd_buildid_cache(int argc, const char **argv)
>   NULL
>   };
>  
> + ret = perf_config(perf_buildid_cache_config, &debuginfod);
> + if (ret)
> + return ret;
> +
>   argc = parse_options(argc, argv, buildid_cache_options,
>buildid_cache_usage, 0);
>  
> @@ -612,6 +629,11 @@ int cmd_buildid_cache(int argc, const char **argv)
>   if (argc || !(list_files || opts_flag))
>   usage_with_options(buildid_cache_usage, buildid_cache_options);
>  
> + if (debuginfod) {
> + pr_debug("DEBUGINFOD_URLS=%s\n", debuginfod);
> + setenv("DEBUGINFOD_URLS"

Re: [PATCH 13/24] perf tools: Allow mmap2 event to synthesize kernel image

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:42PM +0100, Jiri Olsa escreveu:
> Allow mmap2 event to synthesize kernel image,
> so we can synthesize kernel build id data in
> following changes.
> 
> It's enabled by new symbol_conf.buildid_mmap2
> bool, which will be switched in following
> changes.

Why make this an option? MMAP2 goes back years:

13d7a2410fa637f45 (Stephane Eranian 2013-08-21 12:10:24 +0200  904) 
 * The MMAP2 records are an augmented version of MMAP, they add
13d7a2410fa637f45 (Stephane Eranian 2013-08-21 12:10:24 +0200  905) 
 * maj, min, ino numbers to be used to uniquely identify each mapping

Also we unconditionally generate MMAP2 events if the kernel supports it,
from evsel__config():

  attr->mmap  = track;
  attr->mmap2 = track && !perf_missing_features.mmap2;

So perhaps we should reuse that logic? I.e. use mmap2 if the kernel
supports it?
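
What I mean by reusing that logic, as a self-contained sketch: request mmap2 unless it is already known to be missing, and if the open fails with EINVAL while mmap2 was requested, mark it missing and retry. All names here are hypothetical stand-ins (there is no real perf_event_open() call in it), and old_kernel_open_sketch() only simulates a kernel without MMAP2.

```c
#include <errno.h>
#include <stdbool.h>

struct attr_sketch { bool mmap, mmap2; };
struct missing_features_sketch { bool mmap2; };

/* Hypothetical stand-in for the syscall: returns an fd or -1 with errno. */
typedef int (*open_fn_sketch)(const struct attr_sketch *attr);

/* Simulates an old kernel that rejects attr.mmap2. */
static int old_kernel_open_sketch(const struct attr_sketch *attr)
{
	if (attr->mmap2) {
		errno = EINVAL;
		return -1;
	}
	return 3;	/* pretend file descriptor */
}

static int open_with_fallback_sketch(struct attr_sketch *attr, bool track,
				     struct missing_features_sketch *missing,
				     open_fn_sketch do_open)
{
	int fd;

retry:
	attr->mmap  = track;
	attr->mmap2 = track && !missing->mmap2;

	fd = do_open(attr);
	if (fd < 0 && errno == EINVAL && attr->mmap2) {
		/* Remember the missing feature and fall back to plain MMAP. */
		missing->mmap2 = true;
		goto retry;
	}
	return fd;
}
```

With something like this done before synthesizing, the synthesis code could test the missing-feature flag instead of a separate symbol_conf bool.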

- Arnaldo
 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/symbol_conf.h  |  3 ++-
>  tools/perf/util/synthetic-events.c | 40 --
>  2 files changed, 29 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/util/symbol_conf.h b/tools/perf/util/symbol_conf.h
> index b916afb95ec5..b18f9c8dbb75 100644
> --- a/tools/perf/util/symbol_conf.h
> +++ b/tools/perf/util/symbol_conf.h
> @@ -42,7 +42,8 @@ struct symbol_conf {
>   report_block,
>   report_individual_block,
>   inline_name,
> - disable_add2line_warn;
> + disable_add2line_warn,
> + buildid_mmap2;
>   const char  *vmlinux_name,
>   *kallsyms_name,
>   *source_prefix,
> diff --git a/tools/perf/util/synthetic-events.c 
> b/tools/perf/util/synthetic-events.c
> index 8a23391558cf..872df6d6dbef 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -988,11 +988,12 @@ static int __perf_event__synthesize_kernel_mmap(struct 
> perf_tool *tool,
>   perf_event__handler_t process,
>   struct machine *machine)
>  {
> - size_t size;
> + union perf_event *event;
> + size_t size = symbol_conf.buildid_mmap2 ?
> + sizeof(event->mmap2) : sizeof(event->mmap);
>   struct map *map = machine__kernel_map(machine);
>   struct kmap *kmap;
>   int err;
> - union perf_event *event;
>  
>   if (map == NULL)
>   return -1;
> @@ -1006,7 +1007,7 @@ static int __perf_event__synthesize_kernel_mmap(struct 
> perf_tool *tool,
>* available use this, and after it is use this as a fallback for older
>* kernels.
>*/
> - event = zalloc((sizeof(event->mmap) + machine->id_hdr_size));
> + event = zalloc(size + machine->id_hdr_size);
>   if (event == NULL) {
>   pr_debug("Not enough memory synthesizing mmap event "
>"for kernel modules\n");
> @@ -1023,16 +1024,29 @@ static int 
> __perf_event__synthesize_kernel_mmap(struct perf_tool *tool,
>   event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
>   }
>  
> - size = snprintf(event->mmap.filename, sizeof(event->mmap.filename),
> - "%s%s", machine->mmap_name, kmap->ref_reloc_sym->name) 
> + 1;
> - size = PERF_ALIGN(size, sizeof(u64));
> - event->mmap.header.type = PERF_RECORD_MMAP;
> - event->mmap.header.size = (sizeof(event->mmap) -
> - (sizeof(event->mmap.filename) - size) + 
> machine->id_hdr_size);
> - event->mmap.pgoff = kmap->ref_reloc_sym->addr;
> - event->mmap.start = map->start;
> - event->mmap.len   = map->end - event->mmap.start;
> - event->mmap.pid   = machine->pid;
> + if (symbol_conf.buildid_mmap2) {
> + size = snprintf(event->mmap2.filename, 
> sizeof(event->mmap2.filename),
> + "%s%s", machine->mmap_name, 
> kmap->ref_reloc_sym->name) + 1;
> + size = PERF_ALIGN(size, sizeof(u64));
> + event->mmap2.header.type = PERF_RECORD_MMAP2;
> + event->mmap2.header.size = (sizeof(event->mmap2) -
> + (sizeof(event->mmap2.filename) - size) + 
> machine->id_hdr_size);
> + event->mmap2.pgoff = kmap->ref_reloc_sym->addr;
> + event->mmap2.start = map->start;
> + event->mmap2.len   = map->end - event->mmap.start;
> + event->mmap2.pid   = machine->pid;
> + } else {
> + size = snprintf(event->mmap.filename, 
> sizeof(event->mmap.filename),
> + "%s%s", machine->mmap_name, 
> kmap->ref_reloc_sym->name) + 1;
> + size = PERF_ALIGN(size, sizeof(u64));
> + event->mmap.header.type = PERF_RECORD_MMAP;
> + event->mmap.header.size = (sizeof(event->mmap) -
> + (sizeof(event->mmap.fil

Re: [PATCH 14/24] perf tools: Allow mmap2 event to synthesize modules

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:43PM +0100, Jiri Olsa escreveu:
> Allow mmap2 event to synthesize kernel modules,
> so we can synthesize module's build id data in
> following changes.
> 
> It's enabled by new symbol_conf.buildid_mmap2
> bool, which will be switched in following
> changes.

Ditto as for the kernel mmap event, don't we do this probing before
generating the synthetic events? If not perhaps we should, to avoid
synthesizing things and then failing on creating the events? If we do it
that way, we can switch from symbol_conf.buildid_mmap2 to
!perf_missing_features.mmap2.

- Arnaldo
 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/synthetic-events.c | 49 +++---
>  1 file changed, 32 insertions(+), 17 deletions(-)
> 
> diff --git a/tools/perf/util/synthetic-events.c 
> b/tools/perf/util/synthetic-events.c
> index 872df6d6dbef..a18ae502d765 100644
> --- a/tools/perf/util/synthetic-events.c
> +++ b/tools/perf/util/synthetic-events.c
> @@ -593,16 +593,17 @@ int perf_event__synthesize_modules(struct perf_tool 
> *tool, perf_event__handler_t
>   int rc = 0;
>   struct map *pos;
>   struct maps *maps = machine__kernel_maps(machine);
> - union perf_event *event = zalloc((sizeof(event->mmap) +
> -   machine->id_hdr_size));
> + union perf_event *event;
> + size_t size = symbol_conf.buildid_mmap2 ?
> + sizeof(event->mmap2) : sizeof(event->mmap);
> +
> + event = zalloc(size + machine->id_hdr_size);
>   if (event == NULL) {
>   pr_debug("Not enough memory synthesizing mmap event "
>"for kernel modules\n");
>   return -1;
>   }
>  
> - event->header.type = PERF_RECORD_MMAP;
> -
>   /*
>* kernel uses 0 for user space maps, see kernel/perf_event.c
>* __perf_event_mmap
> @@ -613,23 +614,37 @@ int perf_event__synthesize_modules(struct perf_tool 
> *tool, perf_event__handler_t
>   event->header.misc = PERF_RECORD_MISC_GUEST_KERNEL;
>  
>   maps__for_each_entry(maps, pos) {
> - size_t size;
> -
>   if (!__map__is_kmodule(pos))
>   continue;
>  
> - size = PERF_ALIGN(pos->dso->long_name_len + 1, sizeof(u64));
> - event->mmap.header.type = PERF_RECORD_MMAP;
> - event->mmap.header.size = (sizeof(event->mmap) -
> - (sizeof(event->mmap.filename) - size));
> - memset(event->mmap.filename + size, 0, machine->id_hdr_size);
> - event->mmap.header.size += machine->id_hdr_size;
> - event->mmap.start = pos->start;
> - event->mmap.len   = pos->end - pos->start;
> - event->mmap.pid   = machine->pid;
> + if (symbol_conf.buildid_mmap2) {
> + size = PERF_ALIGN(pos->dso->long_name_len + 1, 
> sizeof(u64));
> + event->mmap2.header.type = PERF_RECORD_MMAP2;
> + event->mmap2.header.size = (sizeof(event->mmap2) -
> + (sizeof(event->mmap2.filename) 
> - size));
> + memset(event->mmap2.filename + size, 0, 
> machine->id_hdr_size);
> + event->mmap2.header.size += machine->id_hdr_size;
> + event->mmap2.start = pos->start;
> + event->mmap2.len   = pos->end - pos->start;
> + event->mmap2.pid   = machine->pid;
> +
> + memcpy(event->mmap2.filename, pos->dso->long_name,
> +pos->dso->long_name_len + 1);
> + } else {
> + size = PERF_ALIGN(pos->dso->long_name_len + 1, 
> sizeof(u64));
> + event->mmap.header.type = PERF_RECORD_MMAP;
> + event->mmap.header.size = (sizeof(event->mmap) -
> + (sizeof(event->mmap.filename) - 
> size));
> + memset(event->mmap.filename + size, 0, 
> machine->id_hdr_size);
> + event->mmap.header.size += machine->id_hdr_size;
> + event->mmap.start = pos->start;
> + event->mmap.len   = pos->end - pos->start;
> + event->mmap.pid   = machine->pid;
> +
> + memcpy(event->mmap.filename, pos->dso->long_name,
> +pos->dso->long_name_len + 1);
> + }
>  
> - memcpy(event->mmap.filename, pos->dso->long_name,
> -pos->dso->long_name_len + 1);
>   if (perf_tool__process_synth_event(tool, event, machine, 
> process) != 0) {
>   rc = -1;
>   break;
> -- 
> 2.26.2
> 

-- 

- Arnaldo


Re: [PATCH 12/24] perf tools: Store build id from mmap2 events

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:41PM +0100, Jiri Olsa escreveu:
> When processing mmap2 event, check on the build id
> misc bit: PERF_RECORD_MISC_MMAP_BUILD_ID and if it's set,
> store the build id in mmap's dso object.
> 
> Also adding the build id data parts to struct
> perf_record_mmap2 event definition.
> 
> Signed-off-by: Jiri Olsa 
> ---
>  kernel/events/core.c|  4 
>  tools/lib/perf/include/perf/event.h | 18 ++
>  tools/perf/util/machine.c   | 24 +++-
>  tools/perf/util/map.c   |  8 ++--
>  tools/perf/util/map.h   |  3 ++-
>  5 files changed, 45 insertions(+), 12 deletions(-)

You mixed up kernel changes with tools/ changes, can you please split
this up?

Also there is an indentation problem in the kernel changes, which, I
think, is just debugging cruft that you forgot to excise? :-)

- Arnaldo
 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 5841b5bca68d..fa7f392c6c0c 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -8017,6 +8017,10 @@ static void perf_event_mmap_output(struct perf_event 
> *event,
>  
>   __output_copy(&handle, size, 4);
>   __output_copy(&handle, mmap_event->build_id, 
> BUILD_ID_SIZE);
> +
> +if (mmap_event->build_id_size > 20)
> + trace_printk("build_id_size %u %s\n", mmap_event->build_id_size, 
> mmap_event->file_name);
> +
>   } else {
>   perf_output_put(&handle, mmap_event->maj);
>   perf_output_put(&handle, mmap_event->min);
> diff --git a/tools/lib/perf/include/perf/event.h 
> b/tools/lib/perf/include/perf/event.h
> index 988c539bedb6..d82054225fcc 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -23,10 +23,20 @@ struct perf_record_mmap2 {
>   __u64 start;
>   __u64 len;
>   __u64 pgoff;
> - __u32 maj;
> - __u32 min;
> - __u64 ino;
> - __u64 ino_generation;
> + union {
> +     struct {
> +         __u32 maj;
> +         __u32 min;
> +         __u64 ino;
> +         __u64 ino_generation;
> +     };
> +     struct {
> +         __u8  build_id_size;
> +         __u8  __reserved_1;
> +         __u16 __reserved_2;
> +         __u8  build_id[20];
> +     };
> + };
>   __u32 prot;
>   __u32 flags;
>   char filename[PATH_MAX];
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 1ae32a81639c..1edb7d10b042 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1599,7 +1599,8 @@ static int machine__process_extra_kernel_map(struct 
> machine *machine,
>  }
>  
>  static int machine__process_kernel_mmap_event(struct machine *machine,
> -   struct extra_kernel_map *xm)
> +   struct extra_kernel_map *xm,
> +   struct build_id *bid)
>  {
>   struct map *map;
>   enum dso_space_type dso_space;
> @@ -1624,6 +1625,10 @@ static int machine__process_kernel_mmap_event(struct 
> machine *machine,
>   goto out_problem;
>  
>   map->end = map->start + xm->end - xm->start;
> +
> + if (build_id__is_defined(bid))
> + dso__set_build_id(map->dso, bid);
> +
>   } else if (is_kernel_mmap) {
>   const char *symbol_name = (xm->name + 
> strlen(machine->mmap_name));
>   /*
> @@ -1681,6 +1686,9 @@ static int machine__process_kernel_mmap_event(struct 
> machine *machine,
>  
>   machine__update_kernel_mmap(machine, xm->start, xm->end);
>  
> + if (build_id__is_defined(bid))
> + dso__set_build_id(kernel, bid);
> +
>   /*
>* Avoid using a zero address (kptr_restrict) for the ref reloc
>* symbol. Effectively having zero here means that at record
> @@ -1718,11 +1726,17 @@ int machine__process_mmap2_event(struct machine 
> *machine,
>   .ino = event->mmap2.ino,
>   .ino_generation = event->mmap2.ino_generation,
>   };
> + struct build_id __bid, *bid = NULL;
>   int ret = 0;
>  
>   if (dump_trace)
>   perf_event__fprintf_mmap2(event, stdout);
>  
> + if (event->header.misc & PERF_RECORD_MISC_MMAP_BUILD_ID) {
> + bid = &__bid;
> + build_id__init(bid, event->mmap2.build_id, 
> event->mmap2.build_id_size);
> + }
> +
>   if (sample->cpumode == PERF_RECORD_MISC_GUEST_KERNEL ||
>   sample->cpumode == PERF_RECORD_MISC_KERNE
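
For readers following the mmap2 part: the idea is that the build id
bytes are only meaningful when the misc bit is set, and a build id of
all zeroes counts as "not defined". A rough standalone sketch of that
check is below. All demo_* names are hypothetical, the bit value is only
illustrative, and the size check is my own addition rather than
something taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_BUILD_ID_SIZE      20
#define DEMO_MISC_MMAP_BUILD_ID (1u << 14)	/* illustrative value */

struct demo_build_id {
	uint8_t data[DEMO_BUILD_ID_SIZE];
	size_t size;
};

/* True if the build id has a size and at least one non-zero byte. */
bool demo_build_id_is_defined(const struct demo_build_id *bid)
{
	size_t i;

	if (!bid || !bid->size)
		return false;
	for (i = 0; i < bid->size; i++)
		if (bid->data[i])
			return true;
	return false;
}

/*
 * Returns 1 if a usable build id was copied into *out, 0 if the event
 * carries none (bit not set, or all zeroes), -1 if the size is bogus.
 */
int demo_extract_build_id(uint16_t misc, const uint8_t *raw,
			  uint8_t raw_size, struct demo_build_id *out)
{
	if (!(misc & DEMO_MISC_MMAP_BUILD_ID))
		return 0;
	if (raw_size > DEMO_BUILD_ID_SIZE)
		return -1;

	memset(out, 0, sizeof(*out));
	memcpy(out->data, raw, raw_size);
	out->size = raw_size;

	return demo_build_id_is_defined(out) ? 1 : 0;
}
```

Rejecting an oversized build_id_size up front avoids copying past the
20 byte array if the event data is corrupted.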

Re: [PATCH 08/24] perf tools: Add support to read build id from compressed elf

2020-11-17 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 17, 2020 at 12:00:37PM +0100, Jiri Olsa escreveu:
> Adding support to decompress file before reading build id.
> 
> Adding filename__read_build_id and change its current
> versions to read_build_id.
> 
> Shutting down stderr output of perf list in the shell test:
>   82: Check open filename arg using perf trace + vfs_getname  : Ok
> 
> because with decompression code in the place we the
> filename__read_build_id function is more verbose in case
> of error and the test did not account for that.

---
because with decompression code in place filename__read_build_id() is
more verbose in case of an error and the test didn't account for that.
---

I'll fix it up when applying :-)

- Arnaldo
 
> Signed-off-by: Jiri Olsa 
> ---
>  .../tests/shell/trace+probe_vfs_getname.sh|  2 +-
>  tools/perf/util/symbol-elf.c  | 37 ++-
>  2 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/tests/shell/trace+probe_vfs_getname.sh 
> b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> index 11cc2af13f2b..3660fcc02fef 100755
> --- a/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> +++ b/tools/perf/tests/shell/trace+probe_vfs_getname.sh
> @@ -20,7 +20,7 @@ skip_if_no_perf_trace || exit 2
>  file=$(mktemp /tmp/temporary_file.X)
>  
>  trace_open_vfs_getname() {
> - evts=$(echo $(perf list syscalls:sys_enter_open* 2>&1 | egrep 
> 'open(at)? ' | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/')
> + evts=$(echo $(perf list syscalls:sys_enter_open* >&1 2>/dev/null | egrep 
> 'open(at)? ' | sed -r 's/.*sys_enter_([a-z]+) +\[.*$/\1/') | sed 's/ /,/')
>   perf trace -e $evts touch $file 2>&1 | \
>   egrep " +[0-9]+\.[0-9]+ +\( +[0-9]+\.[0-9]+ ms\): +touch\/[0-9]+ 
> open(at)?\((dfd: +CWD, +)?filename: +${file}, +flags: 
> CREAT\|NOCTTY\|NONBLOCK\|WRONLY, +mode: +IRUGO\|IWUGO\) += +[0-9]+$"
>  }
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index 44dd86a4f25f..f3577f7d72fe 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -534,7 +534,7 @@ static int elf_read_build_id(Elf *elf, void *bf, size_t 
> size)
>  
>  #ifdef HAVE_LIBBFD_BUILDID_SUPPORT
>  
> -int filename__read_build_id(const char *filename, struct build_id *bid)
> +static int read_build_id(const char *filename, struct build_id *bid)
>  {
>   size_t size = sizeof(bid->data);
>   int err = -1;
> @@ -563,7 +563,7 @@ int filename__read_build_id(const char *filename, struct 
> build_id *bid)
>  
>  #else // HAVE_LIBBFD_BUILDID_SUPPORT
>  
> -int filename__read_build_id(const char *filename, struct build_id *bid)
> +static int read_build_id(const char *filename, struct build_id *bid)
>  {
>   size_t size = sizeof(bid->data);
>   int fd, err = -1;
> @@ -595,6 +595,39 @@ int filename__read_build_id(const char *filename, struct 
> build_id *bid)
>  
>  #endif // HAVE_LIBBFD_BUILDID_SUPPORT
>  
> +int filename__read_build_id(const char *filename, struct build_id *bid)
> +{
> + struct kmod_path m = { .name = NULL, };
> + char path[PATH_MAX];
> + int err;
> +
> + if (!filename)
> + return -EFAULT;
> +
> + err = kmod_path__parse(&m, filename);
> + if (err)
> + return -1;
> +
> + if (m.comp) {
> + int error = 0, fd;
> +
> + fd = filename__decompress(filename, path, sizeof(path), m.comp, 
> &error);
> + if (fd < 0) {
> + pr_debug("Failed to decompress (error %d) %s\n",
> +  error, filename);
> + return -1;
> + }
> + close(fd);
> + filename = path;
> + }
> +
> + err = read_build_id(filename, bid);
> +
> + if (m.comp)
> + unlink(filename);
> + return err;
> +}
> +
>  int sysfs__read_build_id(const char *filename, struct build_id *bid)
>  {
>   size_t size = sizeof(bid->data);
> -- 
> 2.26.2
> 

-- 

- Arnaldo
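
As a side note on the flow in filename__read_build_id() above: the
first step is deciding from the file name whether decompression is
needed at all. A minimal sketch of such a suffix check is below.
demo_is_compressed_name() and the extension list are assumptions for
illustration only; the patch itself relies on kmod_path__parse() for
this.

```c
#include <stddef.h>
#include <string.h>

/* Assumed list of compression suffixes, for illustration only. */
static const char *const demo_comp_exts[] = { "gz", "xz", "bz2", "zst" };

/* Returns 1 if the file name ends in a known compression suffix. */
int demo_is_compressed_name(const char *filename)
{
	const char *ext;
	size_t i;

	if (!filename)
		return 0;

	ext = strrchr(filename, '.');
	if (!ext || ext[1] == '\0')
		return 0;
	ext++;	/* skip the dot */

	for (i = 0; i < sizeof(demo_comp_exts) / sizeof(demo_comp_exts[0]); i++)
		if (!strcmp(ext, demo_comp_exts[i]))
			return 1;
	return 0;
}
```

Only when this kind of check says "compressed" does the caller need the
temporary decompressed copy, which is also why the unlink() in the patch
is conditional on m.comp.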


Re: [PATCH] perf vendor events: Update Skylake client events to v50

2020-11-16 Thread Arnaldo Carvalho de Melo
Em Mon, Nov 16, 2020 at 11:12:30AM -0800, Ian Rogers escreveu:
> On Mon, Nov 16, 2020 at 9:05 AM Arnaldo Carvalho de Melo 
> wrote:
> 
> > Em Fri, Nov 06, 2020 at 01:21:58PM +0900, Namhyung Kim escreveu:
> > > On Fri, Nov 6, 2020 at 12:12 PM Jin, Yao 
> > wrote:
> > > > >> Signed-off-by: Jin Yao 
> > > > >
> > > > > It seems not applied to acme/perf/core cleanly.
> > > > >
> > > > > Thanks,
> > > > > Namhyung
> > > > >
> > > >
> > > > It seems the patch mail is truncated. :(
> > > >
> > > > I attach the patch file in the mail. Sorry for inconvenience.
> > >
> > > I've checked it fixed the perf test on my laptop (skylake).
> > >
> > > Tested-by: Namhyung Kim 
> >
> > Thanks, applied.
> >
> > - Arnaldo
> >
> 
> Nit, as the code is generated, would it be possible to add the commands to
> regenerate it?
> 
> E.g. Using the code in:
> https://github.com/intel/event-converter-for-linux-perf
> And extracted perfmon_server_events_v1.4.tar from:
> https://download.01.org/perfmon/
> run:
> json-to-perf-json.py --outdir tools/perf/pmu-events/arch/x86/skylakex
> perfmon/SKX/skylakex_core_v1.24.json
 
> Looking at the download.01.org/perfmon json there are files
> like skylakex_fp_arith_inst_v1.24.json, and how these are incorporated into
> these events is less than clear.

You mean change event-converter-for-linux-perf to add this as JSON
comments at the start of the generated files?


- Arnaldo


Re: [PATCH] perf test: Avoid an msan warning in a copied stack.

2020-11-16 Thread Arnaldo Carvalho de Melo
Em Fri, Nov 13, 2020 at 10:20:53AM -0800, Ian Rogers escreveu:
> This fix is for a failure that occurred in the DWARF unwind perf test.
> Stack unwinders may probe memory when looking for frames. Memory
> sanitizer will poison and track uninitialized memory on the stack, and
> on the heap if the value is copied to the heap. This can lead to false
> memory sanitizer failures for the use of an uninitialized value. Avoid
> this problem by removing the poison on the copied stack.
> 
> The full msan failure with track origins looks like:



Thanks, applied.

- Arnaldo



Re: [PATCH v2] perf expr: Force encapsulation on expr_id_data

2020-11-16 Thread Arnaldo Carvalho de Melo
Em Fri, Nov 13, 2020 at 10:38:28AM -0800, Ian Rogers escreveu:
> On Fri, Sep 4, 2020 at 9:29 AM Arnaldo Carvalho de Melo 
> wrote:
> 
> > Em Thu, Sep 03, 2020 at 10:53:16PM -0700, Ian Rogers escreveu:
> > > On Thu, Aug 27, 2020 at 12:00 AM kajoljain  wrote:
> > > >
> > > >
> > > >
> > > > On 8/26/20 9:27 PM, Jiri Olsa wrote:
> > > > > On Wed, Aug 26, 2020 at 08:30:55AM -0700, Ian Rogers wrote:
> > > > >> This patch resolves some undefined behavior where variables in
> > > > >> expr_id_data were accessed (for debugging) without being defined. To
> > > > >> better enforce the tagged union behavior, the struct is moved into
> > > > >> expr.c and accessors provided. Tag values (kinds) are explicitly
> > > > >> identified.
> > > >
> > > > Reviewed-By: Kajol Jain
> > > >
> > > > Thanks,
> > > > Kajol Jain
> > > > >>
> > > > >> Signed-off-by: Ian Rogers 
> > > > >
> > > > > great, thanks for doing this
> > > > >
> > > > > Acked-by: Jiri Olsa 
> > > > >
> > > > > jirka
> > > > >
> > >
> > > Thanks for the reviews! Arnaldo could this get merged? Thanks!
> >
> > I'll get this and the other outstanding patches into perf/core soon as I
> > got urgent stuff already merged by Linus,
> >
> > Thanks!
> >
> > - Arnaldo
> >
> 
> I just spotted this wasn't merged and will conflict with:
> https://lore.kernel.org/lkml/20201113001651.544348-1-irog...@google.com/
> I can fix that patch when this lands.

Thanks, applied, will test build and push publicly today.

- Arnaldo
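
For context, the "tagged union with accessors" approach described in
the quoted changelog can be sketched roughly as follows. The names
(demo_id_data, demo_kind, the accessors) are invented for this example
and are not the ones used in expr.c. In a real tree the struct
definition would live in the .c file with only the accessors exported.

```c
#include <stddef.h>

/* Explicit tag values, so an unset entry is distinguishable. */
enum demo_kind {
	DEMO_KIND_UNSET = 0,
	DEMO_KIND_VALUE,
	DEMO_KIND_REF,
};

struct demo_id_data {
	enum demo_kind kind;
	union {
		double val;		/* valid when kind == DEMO_KIND_VALUE */
		const char *ref_name;	/* valid when kind == DEMO_KIND_REF */
	} u;
};

void demo_id_data_set_value(struct demo_id_data *d, double val)
{
	d->kind = DEMO_KIND_VALUE;
	d->u.val = val;
}

void demo_id_data_set_ref(struct demo_id_data *d, const char *name)
{
	d->kind = DEMO_KIND_REF;
	d->u.ref_name = name;
}

/* Returns 0 and stores the value only if the tag says it is a value. */
int demo_id_data_get_value(const struct demo_id_data *d, double *out)
{
	if (d->kind != DEMO_KIND_VALUE)
		return -1;
	*out = d->u.val;
	return 0;
}
```

Going through the accessor means a debug print can no longer read
u.val from an entry that only ever had ref_name set.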


Re: [PATCH] perf vendor events: Update Skylake client events to v50

2020-11-16 Thread Arnaldo Carvalho de Melo
Em Fri, Nov 06, 2020 at 01:21:58PM +0900, Namhyung Kim escreveu:
> On Fri, Nov 6, 2020 at 12:12 PM Jin, Yao  wrote:
> > >> Signed-off-by: Jin Yao 
> > >
> > > It seems not applied to acme/perf/core cleanly.
> > >
> > > Thanks,
> > > Namhyung
> > >
> >
> > It seems the patch mail is truncated. :(
> >
> > I attach the patch file in the mail. Sorry for inconvenience.
> 
> I've checked it fixed the perf test on my laptop (skylake).
> 
> Tested-by: Namhyung Kim 

Thanks, applied.

- Arnaldo



Re: [PATCH] perf bench: Update arch/x86/lib/mem{cpy,set}_64.S

2020-11-16 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 03, 2020 at 04:56:09PM -0800, Fangrui Song escreveu:
> In memset_64.S, the macros expand to `.weak MEMSET ... .globl MEMSET`
> which will produce a STB_WEAK MEMSET with GNU as but STB_GLOBAL MEMSET
> with LLVM's integrated assembler before LLVM 12. LLVM 12 (since
> https://reviews.llvm.org/D90108) will error on such an overridden symbol
> binding. memcpy_64.S is similar.
> 
> Port http://lore.kernel.org/r/20201103012358.168682-1-mask...@google.com
> ("x86_64: Change .weak to SYM_FUNC_START_WEAK for arch/x86/lib/mem*_64.S")
> to fix the issue. Additionally, port SYM_L_WEAK and SYM_FUNC_START_WEAK
> from include/linux/linkage.h to tools/perf/util/include/linux/linkage.h

Sorry, I noticed this just now and I have done this update already, will
send to Linus soon.

- Arnaldo
 
> Fixes: 7d7d1bf1d1da ("perf bench: Copy kernel files needed to build 
> mem{cpy,set} x86_64 benchmarks")
> Link: https://lore.kernel.org/r/20201103012358.168682-1-mask...@google.com
> Signed-off-by: Fangrui Song 
> ---
>  tools/arch/x86/lib/memcpy_64.S  | 4 +---
>  tools/arch/x86/lib/memset_64.S  | 4 +---
>  tools/perf/util/include/linux/linkage.h | 7 +++
>  3 files changed, 9 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/arch/x86/lib/memcpy_64.S b/tools/arch/x86/lib/memcpy_64.S
> index 0b5b8ae56bd9..9428f251df0f 100644
> --- a/tools/arch/x86/lib/memcpy_64.S
> +++ b/tools/arch/x86/lib/memcpy_64.S
> @@ -16,8 +16,6 @@
>   * to a jmp to memcpy_erms which does the REP; MOVSB mem copy.
>   */
>  
> -.weak memcpy
> -
>  /*
>   * memcpy - Copy a memory block.
>   *
> @@ -30,7 +28,7 @@
>   * rax original destination
>   */
>  SYM_FUNC_START_ALIAS(__memcpy)
> -SYM_FUNC_START_LOCAL(memcpy)
> +SYM_FUNC_START_WEAK(memcpy)
>   ALTERNATIVE_2 "jmp memcpy_orig", "", X86_FEATURE_REP_GOOD, \
> "jmp memcpy_erms", X86_FEATURE_ERMS
>  
> diff --git a/tools/arch/x86/lib/memset_64.S b/tools/arch/x86/lib/memset_64.S
> index fd5d25a474b7..1f9b11f9244d 100644
> --- a/tools/arch/x86/lib/memset_64.S
> +++ b/tools/arch/x86/lib/memset_64.S
> @@ -5,8 +5,6 @@
>  #include 
>  #include 
>  
> -.weak memset
> -
>  /*
>   * ISO C memset - set a memory block to a byte value. This function uses fast
>   * string to get better performance than the original function. The code is
> @@ -18,7 +16,7 @@
>   *
>   * rax   original destination
>   */
> -SYM_FUNC_START_ALIAS(memset)
> +SYM_FUNC_START_WEAK(memset)
>  SYM_FUNC_START(__memset)
>   /*
>* Some CPUs support enhanced REP MOVSB/STOSB feature. It is recommended
> diff --git a/tools/perf/util/include/linux/linkage.h 
> b/tools/perf/util/include/linux/linkage.h
> index b8a5159361b4..0e493bf3151b 100644
> --- a/tools/perf/util/include/linux/linkage.h
> +++ b/tools/perf/util/include/linux/linkage.h
> @@ -25,6 +25,7 @@
>  
>  /* SYM_L_* -- linkage of symbols */
>  #define SYM_L_GLOBAL(name)   .globl name
> +#define SYM_L_WEAK(name) .weak name
>  #define SYM_L_LOCAL(name)/* nothing */
>  
>  #define ALIGN __ALIGN
> @@ -78,6 +79,12 @@
>   SYM_START(name, SYM_L_LOCAL, SYM_A_ALIGN)
>  #endif
>  
> +/* SYM_FUNC_START_WEAK -- use for weak functions */
> +#ifndef SYM_FUNC_START_WEAK
> +#define SYM_FUNC_START_WEAK(name)\
> + SYM_START(name, SYM_L_WEAK, SYM_A_ALIGN)
> +#endif
> +
>  /* SYM_FUNC_END_ALIAS -- the end of LOCAL_ALIASed or ALIASed function */
>  #ifndef SYM_FUNC_END_ALIAS
>  #define SYM_FUNC_END_ALIAS(name) \
> -- 
> 2.29.1.341.ge80a0c044ae-goog
> 

-- 

- Arnaldo


Re: [PATCH v2] perf data: Allow to use stdio functions for pipe mode

2020-11-16 Thread Arnaldo Carvalho de Melo
Em Sat, Nov 14, 2020 at 09:55:34PM +0100, Jiri Olsa escreveu:
> On Fri, Oct 30, 2020 at 02:47:42PM +0900, Namhyung Kim wrote:
> > When perf data is in a pipe, it reads each event separately using
> > read(2) syscall.  This is a huge performance bottleneck when
> > processing large data like in perf inject.  Also perf inject needs to
> > use write(2) syscall for the output.
> > 
> > So convert it to use buffered I/O functions in stdio library for pipe
> > data.  This makes inject-build-id bench time drop from 20ms to 8ms.
> > 
> >   $ perf bench internals inject-build-id
> >   # Running 'internals/inject-build-id' benchmark:
> > Average build-id injection took: 8.074 msec (+- 0.013 msec)
> > Average time per event: 0.792 usec (+- 0.001 usec)
> > Average memory usage: 8328 KB (+- 0 KB)
> > Average build-id-all injection took: 5.490 msec (+- 0.008 msec)
> > Average time per event: 0.538 usec (+- 0.001 usec)
> > Average memory usage: 7563 KB (+- 0 KB)
> > 
> > This patch enables it just for perf inject when used with pipe (it's a
> > default behavior).  Maybe we could do it for perf record and/or report
> > later..
> > 
> > Signed-off-by: Namhyung Kim 
> 
> Acked-by: Jiri Olsa 

Thanks, tested and applied.

- Arnaldo
 
> thanks,
> jirka
> 
> > ---
> > v2: check result of fdopen()
> > 
> >  tools/perf/builtin-inject.c |  2 ++
> >  tools/perf/util/data.c  | 41 ++---
> >  tools/perf/util/data.h  | 11 +-
> >  tools/perf/util/header.c|  8 
> >  tools/perf/util/session.c   |  7 ---
> >  5 files changed, 58 insertions(+), 11 deletions(-)
> > 
> > diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
> > index 452a75fe68e5..14d6c88fed76 100644
> > --- a/tools/perf/builtin-inject.c
> > +++ b/tools/perf/builtin-inject.c
> > @@ -853,10 +853,12 @@ int cmd_inject(int argc, const char **argv)
> > .output = {
> > .path = "-",
> > .mode = PERF_DATA_MODE_WRITE,
> > +   .use_stdio = true,
> > },
> > };
> > struct perf_data data = {
> > .mode = PERF_DATA_MODE_READ,
> > +   .use_stdio = true,
> > };
> > int ret;
> >  
> > diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> > index c47aa34fdc0a..05bbcb663c41 100644
> > --- a/tools/perf/util/data.c
> > +++ b/tools/perf/util/data.c
> > @@ -174,8 +174,21 @@ static bool check_pipe(struct perf_data *data)
> > is_pipe = true;
> > }
> >  
> > -   if (is_pipe)
> > -   data->file.fd = fd;
> > +   if (is_pipe) {
> > +   if (data->use_stdio) {
> > +   const char *mode;
> > +
> > +   mode = perf_data__is_read(data) ? "r" : "w";
> > +   data->file.fptr = fdopen(fd, mode);
> > +
> > +   if (data->file.fptr == NULL) {
> > +   data->file.fd = fd;
> > +   data->use_stdio = false;
> > +   }
> > +   } else {
> > +   data->file.fd = fd;
> > +   }
> > +   }
> >  
> > return data->is_pipe = is_pipe;
> >  }
> > @@ -334,6 +347,9 @@ int perf_data__open(struct perf_data *data)
> > if (check_pipe(data))
> > return 0;
> >  
> > +   /* currently it allows stdio for pipe only */
> > +   data->use_stdio = false;
> > +
> > if (!data->path)
> > data->path = "perf.data";
> >  
> > @@ -353,7 +369,21 @@ void perf_data__close(struct perf_data *data)
> > perf_data__close_dir(data);
> >  
> > zfree(&data->file.path);
> > -   close(data->file.fd);
> > +
> > +   if (data->use_stdio)
> > +   fclose(data->file.fptr);
> > +   else
> > +   close(data->file.fd);
> > +}
> > +
> > +ssize_t perf_data__read(struct perf_data *data, void *buf, size_t size)
> > +{
> > +   if (data->use_stdio) {
> > +   if (fread(buf, size, 1, data->file.fptr) == 1)
> > +   return size;
> > +   return feof(data->file.fptr) ? 0 : -1;
> > +   }
> > +   return readn(data->file.fd, buf, size);
> >  }
> >  
> >  ssize_t perf_data_file__write(struct perf_data_file *file,
> > @@ -365,6 +395,11 @@ ssize_t perf_data_file__write(struct perf_data_file 
> > *file,
> >  ssize_t perf_data__write(struct perf_data *data,
> >   void *buf, size_t size)
> >  {
> > +   if (data->use_stdio) {
> > +   if (fwrite(buf, size, 1, data->file.fptr) == 1)
> > +   return size;
> > +   return -1;
> > +   }
> > return perf_data_file__write(&data->file, buf, size);
> >  }
> >  
> > diff --git a/tools/perf/util/data.h b/tools/perf/util/data.h
> > index 75947ef6bc17..c563fcbb0288 100644
> > --- a/tools/perf/util/data.h
> > +++ b/tools/perf/util/data.h
> > @@ -2,6 +2,7 @@
> >  #ifndef __PERF_DATA_H
> >  #define __PERF_DATA_H
> >  
> > +#include 
> >  #include 
> >  
> >  enum perf_data_mode {
> > @@ -16,7 
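
The return convention used by the stdio path in perf_data__read() and
perf_data__write() above (full size on success, 0 on EOF for reads, -1
on error) can be tried out in isolation. This is a sketch with made-up
demo_* names operating on a plain FILE pointer, not the perf_data API.

```c
#include <stdio.h>
#include <sys/types.h>

/*
 * Read exactly 'size' bytes as one item. Returns size on success,
 * 0 if the stream hit EOF first, -1 on any other failure.
 */
ssize_t demo_stream_read(FILE *fp, void *buf, size_t size)
{
	if (fread(buf, size, 1, fp) == 1)
		return (ssize_t)size;
	return feof(fp) ? 0 : -1;
}

/* Write exactly 'size' bytes as one item. Returns size or -1. */
ssize_t demo_stream_write(FILE *fp, const void *buf, size_t size)
{
	if (fwrite(buf, size, 1, fp) == 1)
		return (ssize_t)size;
	return -1;
}
```

Asking fread()/fwrite() for a single item of 'size' bytes makes the
result all-or-nothing, so the caller does not have to handle short
counts the way it would with raw read(2)/write(2).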

Re: [PATCH 24/24] perf record: Add --buildid-mmap option to enable mmap's build id

2020-11-12 Thread Arnaldo Carvalho de Melo
Em Thu, Nov 12, 2020 at 12:57:10PM +0100, Jiri Olsa escreveu:
> On Wed, Nov 11, 2020 at 09:00:46AM -0800, Andi Kleen wrote:
> > On Mon, Nov 09, 2020 at 10:54:15PM +0100, Jiri Olsa wrote:
> > > Adding --buildid-mmap option to enable build id in mmap2 events.
> > > It will only work if there's kernel support for that and it disables
> > > build id cache (implies --no-buildid).

> > What's the point of the option? Why not enable it by default
> > if the kernel supports it?
 
> > With the option most users won't get the benefit.
 
> > The only reason I can think of for an option would be to disable
> > so that old tools can still process.
 
> yes, that was a request in the rfc post, we want the new default
> perf.data be still readable by older perf tools

We need to change perf so that when it finds some option it doesn't
grok, it just ignores extra things in a record like MMAP2 and just warns
the user that things are being ignored.

So that we can add new stuff by default without requiring an ever longer
command line option, like with --all-cgroups, etc.

And provide the options to avoid using new stuff if we know that the
perf.data file will be processed by someone with an older tool that
can't update.

- Arnaldo
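
One possible shape for that kind of tolerance, sketched under my own
assumptions (this is not how perf's session code is structured, and the
demo_* names are made up): a reader that trusts the size field in each
record header can step over record types it doesn't know and just count
them for a later warning.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record header: every record starts with one. */
struct demo_header {
	uint32_t type;
	uint16_t misc;
	uint16_t size;	/* total record size, header included */
};

#define DEMO_TYPE_KNOWN_MAX 2u	/* types above this are "from the future" */

/*
 * Walk a buffer of records. Returns the number of records with a known
 * type, stores the number of skipped unknown ones in *skipped, and
 * returns -1 if a header is malformed or the buffer ends mid-record.
 */
int demo_walk_records(const uint8_t *buf, size_t len, unsigned int *skipped)
{
	size_t off = 0;
	int handled = 0;

	*skipped = 0;
	while (off + sizeof(struct demo_header) <= len) {
		struct demo_header h;

		memcpy(&h, buf + off, sizeof(h));
		if (h.size < sizeof(h) || h.size > len - off)
			return -1;

		if (h.type <= DEMO_TYPE_KNOWN_MAX)
			handled++;
		else
			(*skipped)++;	/* unknown: ignore the payload */

		off += h.size;
	}
	return off == len ? handled : -1;
}
```

The caller would print a single "N records ignored" warning when
*skipped is non-zero instead of refusing the file.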


Re: [PATCH v8 06/22] perf arm-spe: Refactor printing string to buffer

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 03:02:48PM -0300, Arnaldo Carvalho de Melo escreveu:
> > I'll keep the series up to that point and will run my build tests, then
> > push it publicly to acme/perf/core and you can go from there, ok?

> > I've changed the BIT() to BIT_ULL() as Andre suggested and I'm testing
> > it again.
 
> To make it clear, this is what I have locally:
 
> 0a04244cabc5560c (HEAD -> perf/core) perf arm-spe: Fix packet length handling
> b65577baf4829092 perf arm-spe: Refactor arm_spe_get_events()
> b2ded2e2e2764e50 perf arm-spe: Refactor payload size calculation
> 903b659436b70692 perf arm-spe: Fix a typo in comment
> c185f1cde46653cd perf arm-spe: Include bitops.h for BIT() macro
> 40714c58630aaaf1 perf mem: Support ARM SPE events
> c825f7885178f994 perf c2c: Support AUX trace
> 13e5df1e3f1ba1a9 perf mem: Support AUX trace
> 014a771c7867fda5 perf auxtrace: Add itrace option '-M' for memory events
> 436cce00710a3f23 perf mem: Only initialize memory event for recording
> 8b8173b45a7a9709 perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE
> 4ba2452cd88f39da perf mem: Support new memory event 
> PERF_MEM_EVENTS__LOAD_STORE
> eaf6aaeec5fa301c perf mem: Introduce weak function perf_mem_events__ptr()
> f9f16dfbe76e63ba perf mem: Search event name with more flexible path
> 644bf4b0f7acde64 (tag: perf-tools-tests-v5.11-2020-11-04, acme/perf/core) 
> perf jevents: Add test for arch std events

So with the above it works with at least these:

[perfbuilder@five ~]$ dm android-ndk:r15c-arm ubuntu:18.04-x-arm
   122.37 android-ndk:r15c-arm  : Ok   arm-linux-androideabi-gcc 
(GCC) 4.9.x 20150123 (prerelease)
   228.52 ubuntu:18.04-x-arm: Ok   arm-linux-gnueabihf-gcc 
(Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0
[perfbuilder@five ~]$

previously it was failing in all 32-bit build test containers:

[perfbuilder@five linux-perf-tools-build]$ grep FAIL dm.log/summary 
 android-ndk:r12b-arm: FAIL
 android-ndk:r15c-arm: FAIL
 fedora:24-x-ARC-uClibc: FAIL
 fedora:30-x-ARC-uClibc: FAIL
 ubuntu:16.04-x-arm: FAIL
 ubuntu:16.04-x-powerpc: FAIL
 ubuntu:18.04-x-arm: FAIL
 ubuntu:18.04-x-m68k: FAIL
 ubuntu:18.04-x-powerpc: FAIL
 ubuntu:18.04-x-sh4: FAIL
 ubuntu:19.10-x-hppa: FAIL
[perfbuilder@five linux-perf-tools-build]$

I'll redo the full set of tests and push perf/core publicly.

- Arnaldo


Re: [PATCH v8 06/22] perf arm-spe: Refactor printing string to buffer

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 03:01:27PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 11, 2020 at 05:58:27PM +, Dave Martin escreveu:
> > 
> > On Wed, Nov 11, 2020 at 05:39:22PM +, Arnaldo Carvalho de Melo wrote:
> > > Em Wed, Nov 11, 2020 at 03:45:23PM +, André Przywara escreveu:
> > > > On 11/11/2020 15:35, Arnaldo Carvalho de Melo wrote:
> > > > > Isn't this 'ret +=' ? Otherwise if any of these arm_spe_pkt_snprintf()
> > > > > calls are made the previous 'ret' value is simply discarded. Can you
> > > > > clarify this?
> 
> > > > ret is the same as err. If err is negative (from previous calls), we
> > > > return that straight away, so it does nothing but propagating the error.
> 
> > > Usually the return of a snprintf is used to account for buffer space, ok
> > > I'll have to read it, which I shouldn't as snprintf has a well defined
> > > meaning...
> 
> > > Ok, now that I look at it, I realize it is not a snprintf() routine, but
> > > something with different semantics, that will look at a pointer to an
> > > integer and then do nothing if it comes with some error, etc, confusing
> > > :-/
> 
> > Would you be happier if the function were renamed?
> 
> > Originally we were aiming for snprintf() semantics, but this still
> > spawns a lot of boilerplate code and encourages mistakes in the local
> > caller here -- hence the current sticky error approach.
> 
> > So maybe the name should now be less "snprintf"-like.
> 
> Please, it's important to stick to semantics for such well known type of
> routines, helps reviewing, etc.
> 
> I'll keep the series up to that point and will run my build tests, then
> push it publicly to acme/perf/core and you can go from there, ok?
> 
> I've changed the BIT() to BIT_ULL() as Andre suggested and I'm testing
> it again.

To make it clear, this is what I have locally:

0a04244cabc5560c (HEAD -> perf/core) perf arm-spe: Fix packet length handling
b65577baf4829092 perf arm-spe: Refactor arm_spe_get_events()
b2ded2e2e2764e50 perf arm-spe: Refactor payload size calculation
903b659436b70692 perf arm-spe: Fix a typo in comment
c185f1cde46653cd perf arm-spe: Include bitops.h for BIT() macro
40714c58630aaaf1 perf mem: Support ARM SPE events
c825f7885178f994 perf c2c: Support AUX trace
13e5df1e3f1ba1a9 perf mem: Support AUX trace
014a771c7867fda5 perf auxtrace: Add itrace option '-M' for memory events
436cce00710a3f23 perf mem: Only initialize memory event for recording
8b8173b45a7a9709 perf c2c: Support memory event PERF_MEM_EVENTS__LOAD_STORE
4ba2452cd88f39da perf mem: Support new memory event PERF_MEM_EVENTS__LOAD_STORE
eaf6aaeec5fa301c perf mem: Introduce weak function perf_mem_events__ptr()
f9f16dfbe76e63ba perf mem: Search event name with more flexible path
644bf4b0f7acde64 (tag: perf-tools-tests-v5.11-2020-11-04, acme/perf/core) perf 
jevents: Add test for arch std events

The perf-tools-tests-v5.11-2020-11-04, is in git.kernel.org, as it was
tested, etc, test results are in that signed tag, as usual for some
months.

- Arnaldo


Re: [PATCH v8 06/22] perf arm-spe: Refactor printing string to buffer

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 05:58:27PM +, Dave Martin escreveu:
> 
> On Wed, Nov 11, 2020 at 05:39:22PM +, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 11, 2020 at 03:45:23PM +, André Przywara escreveu:
> > > On 11/11/2020 15:35, Arnaldo Carvalho de Melo wrote:
> > > > Isn't this 'ret +=' ? Otherwise if any of these arm_spe_pkt_snprintf()
> > > > calls are made the previous 'ret' value is simply discarded. Can you
> > > > clarify this?

> > > ret is the same as err. If err is negative (from previous calls), we
> > > return that straight away, so it does nothing but propagating the error.

> > Usually the return of a snprintf is used to account for buffer space, ok
> > I'll have to read it, which I shouldn't as snprintf has a well defined
> > meaning...

> > Ok, now that I look at it, I realize it is not a snprintf() routine, but
> > something with different semantics, that will look at a pointer to an
> > integer and then do nothing if it comes with some error, etc, confusing
> > :-/

> Would you be happier if the function were renamed?

> Originally we were aiming for snprintf() semantics, but this still
> spawns a lot of boilerplate code and encourages mistakes in the local
> caller here -- hence the current sticky error approach.

> So maybe the name should now be less "snprintf"-like.

Please, it's important to stick to semantics for such well known type of
routines, helps reviewing, etc.

I'll keep the series up to that point and will run my build tests, then
push it publicly to acme/perf/core and you can go from there, ok?

I've changed the BIT() to BIT_ULL() as Andre suggested and I'm testing
it again.

- Arnaldo
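
To make the "sticky error" semantics discussed above concrete, here is
a small standalone sketch. It is not the helper from the series:
demo_pkt_snprintf() is a made-up name, and treating truncation as an
error of -1 is my assumption for the example.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Append formatted text at *buf_p, consuming from *blen.
 * If *err is already set, do nothing and return it (sticky error).
 * On success returns the number of characters written and advances
 * *buf_p / shrinks *blen. On failure or truncation sets *err.
 */
int demo_pkt_snprintf(int *err, char **buf_p, size_t *blen,
		      const char *fmt, ...)
{
	va_list ap;
	int ret;

	if (*err)
		return *err;

	va_start(ap, fmt);
	ret = vsnprintf(*buf_p, *blen, fmt, ap);
	va_end(ap);

	if (ret < 0) {
		*err = ret;
		return ret;
	}
	if ((size_t)ret >= *blen) {
		/* output was truncated: remember it, leave cursor alone */
		*err = -1;
		return *err;
	}

	*buf_p += ret;
	*blen -= (size_t)ret;
	return ret;
}
```

A caller can then chain several calls and check err once at the end,
which is the boilerplate reduction Dave mentions. It is also exactly why
the return value no longer means what snprintf()'s does.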


Re: [PATCH v8 00/22] perf arm-spe: Refactor decoding & dumping flow

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 04:20:26PM +, André Przywara escreveu:
> On 11/11/2020 16:15, Arnaldo Carvalho de Melo wrote:
> > Em Wed, Nov 11, 2020 at 01:10:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Wed, Nov 11, 2020 at 03:11:27PM +0800, Leo Yan escreveu:
> >>> This is patch set v8 for refactoring Arm SPE trace decoding and dumping.
> >>>
> >>> This version addresses Andre's comment to pass parameter '&buf_len' at
> >>> the last call arm_spe_pkt_snprintf() in the function arm_spe_pkt_desc().
> >>>
> >>> This patch set is cleanly applied on the top of perf/core branch
> >>> with commit 644bf4b0f7ac ("perf jevents: Add test for arch std events").
> >>>
> >>> I retested this patch set on Hisilicon D06 platform with commands
> >>> "perf report -D" and "perf script", compared the decoding results
> >>> between with this patch set and without this patch set, "diff" tool
> >>> shows the result as expected.
> >>
> >> With the patches I applied I'm getting:
> >>
> >> util/arm-spe-decoder/arm-spe-pkt-decoder.c: In function 'arm_spe_pkt_desc':
> >> util/arm-spe-decoder/arm-spe-pkt-decoder.c:410:3: error: left shift count 
> >> >= width of type [-Werror]
> >>case 1: ns = !!(packet->payload & NS_FLAG);
> >>^
> >> util/arm-spe-decoder/arm-spe-pkt-decoder.c:411:4: error: left shift count 
> >> >= width of type [-Werror]
> >> el = (packet->payload & EL_FLAG) >> 61;
> >> ^
> >> util/arm-spe-decoder/arm-spe-pkt-decoder.c:411:4: error: left shift count 
> >> >= width of type [-Werror]
> >> util/arm-spe-decoder/arm-spe-pkt-decoder.c:416:3: error: left shift count 
> >> >= width of type [-Werror]
> >>case 3: ns = !!(packet->payload & NS_FLAG);
> >>^
> >>   CC   /tmp/build/perf/util/arm-spe-decoder/arm-spe-decoder.o
> >>  
> >>
> >> On:
> >>
> >>   1611.70 android-ndk:r12b-arm  : FAIL 
> >> arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
> >>   1711.32 android-ndk:r15c-arm  : FAIL 
> >> arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
> >>
> >> Those were building ok before, builds still under way, perhaps it's just
> >> on these old systems...
> > 
> > [acme@five perf]$ git bisect good
> > cc6fa07fb1458cca3741919774eb050976471000 is the first bad commit
> > commit cc6fa07fb1458cca3741919774eb050976471000
> > Author: Leo Yan 
> > Date:   Wed Nov 11 15:11:28 2020 +0800
> > 
> > perf arm-spe: Include bitops.h for BIT() macro
> > 
> > Include header linux/bitops.h, directly use its BIT() macro and remove
> > the self defined macros.
> > 
> > Signed-off-by: Leo Yan 
> > Reviewed-by: Andre Przywara 
> > Link: https://lore.kernel.org/r/2020071149.815-2-leo@linaro.org
> > Signed-off-by: Arnaldo Carvalho de Melo 
> > 
> >  tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 5 +
> >  tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 3 +--
> >  2 files changed, 2 insertions(+), 6 deletions(-)
> 
> 
> Ah, thanks! I think I mentioned the missing usage of BIT_ULL() in an
> earlier review, and thought this was fixed. Possibly this gets fixed in
> a later patch in this series, and is a temporary regression?

you mean this on that patch that ditches the local BIT() macro, right?

[acme@five perf]$ vim tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
[acme@five perf]$ git diff
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c 
b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
index 46ddb53a645714bb..5f65a3a70c577207 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
@@ -12,8 +12,8 @@

 #include "arm-spe-pkt-decoder.h"

-#define NS_FLAG		BIT(63)
-#define EL_FLAG		(BIT(62) | BIT(61))
+#define NS_FLAG		BIT_ULL(63)
+#define EL_FLAG		(BIT_ULL(62) | BIT_ULL(61))

 #define SPE_HEADER0_PAD		0x0
 #define SPE_HEADER0_END		0x1
[acme@five perf]$
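
For reference, a minimal sketch of why the BIT() variant can break on
32-bit targets such as the android-ndk arm builds. It assumes the usual
shapes of these macros (BIT() shifting an unsigned long, BIT_ULL()
shifting an unsigned long long); the BIT_ULL() definition below is a local
stand-in, and spe_ns() and spe_el() are hypothetical helper names used only
for illustration:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Local stand-in for the kernel-style macro (assumed shape, not copied
 * from the tree). A BIT() built on "1UL << nr" shifts an unsigned long,
 * which is only 32 bits wide on 32-bit ARM, so BIT(63) there triggers
 * "left shift count >= width of type". Shifting 1ULL avoids that.
 */
#define BIT_ULL(nr)	(1ULL << (nr))

#define NS_FLAG		BIT_ULL(63)
#define EL_FLAG		(BIT_ULL(62) | BIT_ULL(61))

/* unsigned long long holds bit 63 on every conforming target. */
static_assert(sizeof(unsigned long long) * 8 >= 64, "need a 64-bit type");

/* Hypothetical helper: extract the NS bit (bit 63) of a payload. */
static inline int spe_ns(uint64_t payload)
{
	return !!(payload & NS_FLAG);
}

/* Hypothetical helper: extract the exception level (bits 62:61). */
static inline int spe_el(uint64_t payload)
{
	return (int)((payload & EL_FLAG) >> 61);
}
```

With these definitions the mask is computed in 64 bits regardless of the
width of unsigned long, which is what the diff above relies on.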
 
> How do you want to handle this? Shall Leo resend, amending this patch
> (and merging 06 and 07 on the way ;-)?


Re: [PATCH v8 06/22] perf arm-spe: Refactor printing string to buffer

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 03:58:27PM +, Dave Martin escreveu:
> On Wed, Nov 11, 2020 at 03:53:20PM +, Dave Martin wrote:
> > On Wed, Nov 11, 2020 at 07:11:33AM +, Leo Yan wrote:
> > > > When outputting strings to the decoding buffer with snprintf(), the
> > > > SPE decoder needs to detect whether snprintf() returned an error and,
> > > > if so, bail out directly.  If snprintf() succeeds, the decoder needs
> > > > to advance the buffer pointer and reduce the buffer length so that it
> > > > can output the next string into the subsequent memory space.
> > > > 
> > > > This complex logic is spread throughout arm_spe_pkt_desc(), so there
> > > > is a lot of duplicated code for detecting errors, incrementing the
> > > > buffer pointer and decrementing the buffer size.
> > > > 
> > > > To avoid the duplicated code, this patch introduces a new helper
> > > > function, arm_spe_pkt_snprintf(), which wraps up the complex logic and
> > > > is used by the caller arm_spe_pkt_desc().
> > > > 
> > > > This patch also makes the variable 'blen' a function-local variable,
> > > > which allows removing the unnecessary braces and improves
> > > > readability.
> > > 
> > > Suggested-by: Dave Martin 
> > > Signed-off-by: Leo Yan 
> > > Reviewed-by: Andre Przywara 
> > 
> > Mostly looks fine to me now, though there are a few potential
> > issues -- comments below.
> 
> Hmm, looks like patch 7 anticipated some of my comments here.
> 
> Rather than fixing up patch 6, maybe it would be better to squash these
> patches together after all...  sorry!

I'll take a look and probably do that, as it is what Andre suggests.
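
For anyone following along, the cumulative-error idea behind the helper can
be sketched roughly as below. This is not the code from the patch:
buf_appendf() and describe_events() are hypothetical names, and the
truncation check uses the standard C rule for vsnprintf() (a return value
greater than or equal to the buffer size means the output did not fit),
which differs slightly from the condition in the quoted version.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Illustrative helper (hypothetical name): append formatted text at
 * *buf_p and keep a sticky error in *err. Once *err is set, later calls
 * do nothing, so a caller can chain many calls and check *err once.
 */
static int buf_appendf(int *err, char **buf_p, size_t *blen,
		       const char *fmt, ...)
{
	va_list ap;
	int ret;

	assert(err && buf_p && *buf_p && blen);

	/* A previous call already failed: do nothing. */
	if (*err)
		return *err;

	va_start(ap, fmt);
	ret = vsnprintf(*buf_p, *blen, fmt, ap);
	va_end(ap);

	if (ret < 0) {
		/* Output error reported by vsnprintf(). */
		*err = ret;
	} else if ((size_t)ret >= *blen) {
		/* Standard C truncation condition: the output did not fit. */
		*err = -1;
		ret = -1;
	} else {
		/* Success: advance past the text just written. */
		*buf_p += ret;
		*blen -= (size_t)ret;
	}

	return ret;
}

/* Hypothetical caller: describe two event bits in one pass. */
static int describe_events(unsigned long long payload, char *buf, size_t len)
{
	int err = 0;

	buf_appendf(&err, &buf, &len, "EV");
	if (payload & 0x1)
		buf_appendf(&err, &buf, &len, " EXCEPTION-GEN");
	if (payload & 0x2)
		buf_appendf(&err, &buf, &len, " RETIRED");

	/* 0 on success, non-zero if any append failed or was truncated. */
	return err;
}
```

The caller no longer repeats the pointer and length bookkeeping after every
call, which is the duplication the patch description refers to.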

- Arnaldo


Re: [PATCH v8 06/22] perf arm-spe: Refactor printing string to buffer

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 03:45:23PM +, André Przywara escreveu:
> On 11/11/2020 15:35, Arnaldo Carvalho de Melo wrote:
> 
> Hi Arnaldo,
> 
> thanks for taking a look!
> 
> > Em Wed, Nov 11, 2020 at 03:11:33PM +0800, Leo Yan escreveu:
> >> When outputting strings to the decoding buffer with snprintf(), the
> >> SPE decoder needs to detect whether snprintf() returned an error and,
> >> if so, bail out directly.  If snprintf() succeeds, the decoder needs
> >> to advance the buffer pointer and reduce the buffer length so that it
> >> can output the next string into the subsequent memory space.
> >>
> >> This complex logic is spread throughout arm_spe_pkt_desc(), so there
> >> is a lot of duplicated code for detecting errors, incrementing the
> >> buffer pointer and decrementing the buffer size.
> >>
> >> To avoid the duplicated code, this patch introduces a new helper
> >> function, arm_spe_pkt_snprintf(), which wraps up the complex logic and
> >> is used by the caller arm_spe_pkt_desc().
> >>
> >> This patch also makes the variable 'blen' a function-local variable,
> >> which allows removing the unnecessary braces and improves
> >> readability.
> >>
> >> Suggested-by: Dave Martin 
> >> Signed-off-by: Leo Yan 
> >> Reviewed-by: Andre Przywara 
> >> ---
> >>  .../arm-spe-decoder/arm-spe-pkt-decoder.c | 260 +-
> >>  1 file changed, 126 insertions(+), 134 deletions(-)
> >>
> >> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c 
> >> b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> >> index 04fd7fd7c15f..1970686f7020 100644
> >> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> >> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> >> @@ -9,6 +9,7 @@
> >>  #include 
> >>  #include 
> >>  #include 
> >> +#include 
> >>  
> >>  #include "arm-spe-pkt-decoder.h"
> >>  
> >> @@ -258,192 +259,183 @@ int arm_spe_get_packet(const unsigned char *buf, 
> >> size_t len,
> >>return ret;
> >>  }
> >>  
> >> +static int arm_spe_pkt_snprintf(int *err, char **buf_p, size_t *blen,
> >> +  const char *fmt, ...)
> >> +{
> >> +  va_list ap;
> >> +  int ret;
> >> +
> >> +  /* Bail out if any error occurred */
> >> +  if (err && *err)
> >> +  return *err;
> >> +
> >> +  va_start(ap, fmt);
> >> +  ret = vsnprintf(*buf_p, *blen, fmt, ap);
> >> +  va_end(ap);
> >> +
> >> +  if (ret < 0) {
> >> +  if (err && !*err)
> >> +  *err = ret;
> >> +
> >> +  /*
> >> +   * A return value of (*blen - 1) or more means that the
> >> +   * output was truncated and the buffer is overrun.
> >> +   */
> >> +  } else if (ret >= ((int)*blen - 1)) {
> >> +  (*buf_p)[*blen - 1] = '\0';
> >> +
> >> +  /*
> >> +   * Set *err to 'ret' to avoid overflow if tries to
> >> +   * fill this buffer sequentially.
> >> +   */
> >> +  if (err && !*err)
> >> +  *err = ret;
> >> +  } else {
> >> +  *buf_p += ret;
> >> +  *blen -= ret;
> >> +  }
> >> +
> >> +  return ret;
> >> +}
> >> +
> >>  int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
> >> size_t buf_len)
> >>  {
> >>int ret, ns, el, idx = packet->index;
> >>unsigned long long payload = packet->payload;
> >>const char *name = arm_spe_pkt_name(packet->type);
> >> +  size_t blen = buf_len;
> >> +  int err = 0;
> >>  
> >>switch (packet->type) {
> >>case ARM_SPE_BAD:
> >>case ARM_SPE_PAD:
> >>case ARM_SPE_END:
> >> -  return snprintf(buf, buf_len, "%s", name);
> >> -  case ARM_SPE_EVENTS: {
> >> -  size_t blen = buf_len;
> >> -
> >> -  ret = 0;
> >> -  ret = snprintf(buf, buf_len, "EV");
> >> -  buf += ret;
> >> -  blen -= ret;
> >> -  if (payload & 0x1) {
> >> -  ret = snprintf(buf

Re: [PATCH v8 00/22] perf arm-spe: Refactor decoding & dumping flow

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 01:10:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Wed, Nov 11, 2020 at 03:11:27PM +0800, Leo Yan escreveu:
> > This is patch set v8 for refactoring Arm SPE trace decoding and dumping.
> > 
> > This version addresses Andre's comment to pass parameter '&buf_len' at
> > the last call arm_spe_pkt_snprintf() in the function arm_spe_pkt_desc().
> > 
> > This patch set is cleanly applied on the top of perf/core branch
> > with commit 644bf4b0f7ac ("perf jevents: Add test for arch std events").
> > 
> > I retested this patch set on Hisilicon D06 platform with commands
> > "perf report -D" and "perf script", compared the decoding results
> > between with this patch set and without this patch set, "diff" tool
> > shows the result as expected.
> 
> With the patches I applied I'm getting:
> 
> util/arm-spe-decoder/arm-spe-pkt-decoder.c: In function 'arm_spe_pkt_desc':
> util/arm-spe-decoder/arm-spe-pkt-decoder.c:410:3: error: left shift count >= 
> width of type [-Werror]
>case 1: ns = !!(packet->payload & NS_FLAG);
>^
> util/arm-spe-decoder/arm-spe-pkt-decoder.c:411:4: error: left shift count >= 
> width of type [-Werror]
> el = (packet->payload & EL_FLAG) >> 61;
> ^
> util/arm-spe-decoder/arm-spe-pkt-decoder.c:411:4: error: left shift count >= 
> width of type [-Werror]
> util/arm-spe-decoder/arm-spe-pkt-decoder.c:416:3: error: left shift count >= 
> width of type [-Werror]
>case 3: ns = !!(packet->payload & NS_FLAG);
>^
>   CC   /tmp/build/perf/util/arm-spe-decoder/arm-spe-decoder.o
>  
> 
> On:
> 
>   16    11.70 android-ndk:r12b-arm  : FAIL arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
>   17    11.32 android-ndk:r15c-arm  : FAIL arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
> 
> These were building OK before; builds are still under way, perhaps it's
> just on these old systems...

[acme@five perf]$ git bisect good
cc6fa07fb1458cca3741919774eb050976471000 is the first bad commit
commit cc6fa07fb1458cca3741919774eb050976471000
Author: Leo Yan 
Date:   Wed Nov 11 15:11:28 2020 +0800

    perf arm-spe: Include bitops.h for BIT() macro

    Include header linux/bitops.h, directly use its BIT() macro and remove
    the self defined macros.

    Signed-off-by: Leo Yan 
    Reviewed-by: Andre Przywara 
    Link: https://lore.kernel.org/r/2020071149.815-2-leo@linaro.org
    Signed-off-by: Arnaldo Carvalho de Melo 

 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 5 +
 tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c | 3 +--
 2 files changed, 2 insertions(+), 6 deletions(-)
[acme@five perf]$


Re: [PATCH v8 00/22] perf arm-spe: Refactor decoding & dumping flow

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 03:11:27PM +0800, Leo Yan escreveu:
> This is patch set v8 for refactoring Arm SPE trace decoding and dumping.
> 
> This version addresses Andre's comment to pass parameter '&buf_len' at
> the last call arm_spe_pkt_snprintf() in the function arm_spe_pkt_desc().
> 
> This patch set is cleanly applied on the top of perf/core branch
> with commit 644bf4b0f7ac ("perf jevents: Add test for arch std events").
> 
> I retested this patch set on Hisilicon D06 platform with commands
> "perf report -D" and "perf script", compared the decoding results
> between with this patch set and without this patch set, "diff" tool
> shows the result as expected.

With the patches I applied I'm getting:

util/arm-spe-decoder/arm-spe-pkt-decoder.c: In function 'arm_spe_pkt_desc':
util/arm-spe-decoder/arm-spe-pkt-decoder.c:410:3: error: left shift count >= 
width of type [-Werror]
   case 1: ns = !!(packet->payload & NS_FLAG);
   ^
util/arm-spe-decoder/arm-spe-pkt-decoder.c:411:4: error: left shift count >= 
width of type [-Werror]
el = (packet->payload & EL_FLAG) >> 61;
^
util/arm-spe-decoder/arm-spe-pkt-decoder.c:411:4: error: left shift count >= 
width of type [-Werror]
util/arm-spe-decoder/arm-spe-pkt-decoder.c:416:3: error: left shift count >= 
width of type [-Werror]
   case 3: ns = !!(packet->payload & NS_FLAG);
   ^
  CC   /tmp/build/perf/util/arm-spe-decoder/arm-spe-decoder.o
 

On:

  16    11.70 android-ndk:r12b-arm  : FAIL arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)
  17    11.32 android-ndk:r15c-arm  : FAIL arm-linux-androideabi-gcc (GCC) 4.9.x 20150123 (prerelease)

These were building OK before; builds are still under way, perhaps it's
just on these old systems...

- Arnaldo
 
> Changes from v7:
> - Changed to pass '&buf_len' for the last call arm_spe_pkt_snprintf() in
>   the patch 07/22 (Andre).
> 
> Changes from v6:
> - Removed the redundant comma from the string in the patch 21/22 "perf
>   arm_spe: Decode memory tagging properties" (Dave);
> - Refined the return value for arm_spe_pkt_desc(): returns 0 for
>   success, otherwise returns non zero for failures; handle error code at
>   the end of function arm_spe_pkt_desc(); this is accomplished in the
>   new patch 07/22 "perf arm-spe: Consolidate arm_spe_pkt_desc()'s
>   return value" (Dave).
> 
> Changes from v5:
> - Directly bail out arm_spe_pkt_snprintf() if any error occurred
>   (Andre).
> 
> Changes from v4:
> - Implemented a cumulative error for arm_spe_pkt_snprintf() and changed
>   to condense code for printing strings (Dave);
> - Changed to check payload bits [55:52] for parse kernel address
>   (Andre).
> 
> Changes from v3:
> - Refined arm_spe_payload_len() and removed macro SPE_HEADER_SZ()
>   (Andre);
> - Refined packet header index macros (Andre);
> - Added patch "perf arm_spe: Fixup top byte for data virtual address" to
>   fixup the data virtual address for 64KB pages and refined comments for
>   the fixup (Andre);
> - Added Andre's review tag (using "b4 am" command);
> - Changed the macros to SPE_PKT_IS_XXX() format to check operation types
>   (Andre).
> 
> 
> Andre Przywara (1):
>   perf arm_spe: Decode memory tagging properties
> 
> Leo Yan (20):
>   perf arm-spe: Include bitops.h for BIT() macro
>   perf arm-spe: Fix a typo in comment
>   perf arm-spe: Refactor payload size calculation
>   perf arm-spe: Refactor arm_spe_get_events()
>   perf arm-spe: Fix packet length handling
>   perf arm-spe: Refactor printing string to buffer
>   perf arm-spe: Consolidate arm_spe_pkt_desc()'s return value
>   perf arm-spe: Refactor packet header parsing
>   perf arm-spe: Add new function arm_spe_pkt_desc_addr()
>   perf arm-spe: Refactor address packet handling
>   perf arm_spe: Fixup top byte for data virtual address
>   perf arm-spe: Refactor context packet handling
>   perf arm-spe: Add new function arm_spe_pkt_desc_counter()
>   perf arm-spe: Refactor counter packet handling
>   perf arm-spe: Add new function arm_spe_pkt_desc_event()
>   perf arm-spe: Refactor event type handling
>   perf arm-spe: Remove size condition checking for events
>   perf arm-spe: Add new function arm_spe_pkt_desc_op_type()
>   perf arm-spe: Refactor operation packet handling
>   perf arm-spe: Add more sub classes for operation packet
> 
> Wei Li (1):
>   perf arm-spe: Add support for ARMv8.3-SPE
> 
>  .../util/arm-spe-decoder/arm-spe-decoder.c|  59 +-
>  .../util/arm-spe-decoder/arm-spe-decoder.h|  17 -
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c | 601 ++
>  .../arm-spe-decoder/arm-spe-pkt-decoder.h | 122 +++-
>  tools/perf/util/arm-spe.c |   2 +-
>  5 files changed, 479 insertions(+), 322 deletions(-)
> 
> -- 
> 2.17.1
> 

-- 

- Arnaldo


Re: [PATCH v8 06/22] perf arm-spe: Refactor printing string to buffer

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 03:11:33PM +0800, Leo Yan escreveu:
> When outputting strings to the decoding buffer with snprintf(), the
> SPE decoder needs to detect whether snprintf() returned an error and,
> if so, bail out directly.  If snprintf() succeeds, the decoder needs
> to advance the buffer pointer and reduce the buffer length so that it
> can output the next string into the subsequent memory space.
> 
> This complex logic is spread throughout arm_spe_pkt_desc(), so there
> is a lot of duplicated code for detecting errors, incrementing the
> buffer pointer and decrementing the buffer size.
> 
> To avoid the duplicated code, this patch introduces a new helper
> function, arm_spe_pkt_snprintf(), which wraps up the complex logic and
> is used by the caller arm_spe_pkt_desc().
> 
> This patch also makes the variable 'blen' a function-local variable,
> which allows removing the unnecessary braces and improves
> readability.
> 
> Suggested-by: Dave Martin 
> Signed-off-by: Leo Yan 
> Reviewed-by: Andre Przywara 
> ---
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c | 260 +-
>  1 file changed, 126 insertions(+), 134 deletions(-)
> 
> diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c 
> b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> index 04fd7fd7c15f..1970686f7020 100644
> --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> @@ -9,6 +9,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "arm-spe-pkt-decoder.h"
>  
> @@ -258,192 +259,183 @@ int arm_spe_get_packet(const unsigned char *buf, 
> size_t len,
>   return ret;
>  }
>  
> +static int arm_spe_pkt_snprintf(int *err, char **buf_p, size_t *blen,
> + const char *fmt, ...)
> +{
> + va_list ap;
> + int ret;
> +
> + /* Bail out if any error occurred */
> + if (err && *err)
> + return *err;
> +
> + va_start(ap, fmt);
> + ret = vsnprintf(*buf_p, *blen, fmt, ap);
> + va_end(ap);
> +
> + if (ret < 0) {
> + if (err && !*err)
> + *err = ret;
> +
> + /*
> +  * A return value of (*blen - 1) or more means that the
> +  * output was truncated and the buffer is overrun.
> +  */
> + } else if (ret >= ((int)*blen - 1)) {
> + (*buf_p)[*blen - 1] = '\0';
> +
> + /*
> +  * Set *err to 'ret' to avoid overflow if tries to
> +  * fill this buffer sequentially.
> +  */
> + if (err && !*err)
> + *err = ret;
> + } else {
> + *buf_p += ret;
> + *blen -= ret;
> + }
> +
> + return ret;
> +}
> +
>  int arm_spe_pkt_desc(const struct arm_spe_pkt *packet, char *buf,
>size_t buf_len)
>  {
>   int ret, ns, el, idx = packet->index;
>   unsigned long long payload = packet->payload;
>   const char *name = arm_spe_pkt_name(packet->type);
> + size_t blen = buf_len;
> + int err = 0;
>  
>   switch (packet->type) {
>   case ARM_SPE_BAD:
>   case ARM_SPE_PAD:
>   case ARM_SPE_END:
> - return snprintf(buf, buf_len, "%s", name);
> - case ARM_SPE_EVENTS: {
> - size_t blen = buf_len;
> -
> - ret = 0;
> - ret = snprintf(buf, buf_len, "EV");
> - buf += ret;
> - blen -= ret;
> - if (payload & 0x1) {
> - ret = snprintf(buf, buf_len, " EXCEPTION-GEN");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x2) {
> - ret = snprintf(buf, buf_len, " RETIRED");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x4) {
> - ret = snprintf(buf, buf_len, " L1D-ACCESS");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x8) {
> - ret = snprintf(buf, buf_len, " L1D-REFILL");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x10) {
> - ret = snprintf(buf, buf_len, " TLB-ACCESS");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x20) {
> - ret = snprintf(buf, buf_len, " TLB-REFILL");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x40) {
> - ret = snprintf(buf, buf_len, " NOT-TAKEN");
> - buf += ret;
> - blen -= ret;
> - }
> - if (payload & 0x80) {
> - ret = snprintf(buf, buf_len, " MISPRED");
> - buf += 

Re: [PATCH v8 00/22] perf arm-spe: Refactor decoding & dumping flow

2020-11-11 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 11, 2020 at 10:13:52AM +, André Przywara escreveu:
> On 11/11/2020 07:11, Leo Yan wrote:
> 
> Hi Arnaldo, Ingo, Peter, (whoever feels responsible for taking this)
> 
> > This is patch set v8 for refactoring Arm SPE trace decoding and dumping.
> I have reviewed every patch of this in anger, and am now fine with this
> series. Given the bugs fixed, the improvements it brings in terms of
> readability and maintainability, and the low risk it has on breaking
> things, I would be happy to see it merged.

Ok, I'll have it in perf/core for v5.11, thanks!

- Arnaldo
 
> Thanks,
> Andre.
> 
> > This version addresses Andre's comment to pass parameter '&buf_len' at
> > the last call arm_spe_pkt_snprintf() in the function arm_spe_pkt_desc().
> > 
> > This patch set is cleanly applied on the top of perf/core branch
> > with commit 644bf4b0f7ac ("perf jevents: Add test for arch std events").
> > 
> > I retested this patch set on Hisilicon D06 platform with commands
> > "perf report -D" and "perf script", compared the decoding results
> > between with this patch set and without this patch set, "diff" tool
> > shows the result as expected.
> > 
> > Changes from v7:
> > - Changed to pass '&buf_len' for the last call arm_spe_pkt_snprintf() in
> >   the patch 07/22 (Andre).
> > 
> > Changes from v6:
> > - Removed the redundant comma from the string in the patch 21/22 "perf
> >   arm_spe: Decode memory tagging properties" (Dave);
> > - Refined the return value for arm_spe_pkt_desc(): returns 0 for
> >   success, otherwise returns non zero for failures; handle error code at
> >   the end of function arm_spe_pkt_desc(); this is accomplished in the
> >   new patch 07/22 "perf arm-spe: Consolidate arm_spe_pkt_desc()'s
> >   return value" (Dave).
> > 
> > Changes from v5:
> > - Directly bail out arm_spe_pkt_snprintf() if any error occurred
> >   (Andre).
> > 
> > Changes from v4:
> > - Implemented a cumulative error for arm_spe_pkt_snprintf() and changed
> >   to condense code for printing strings (Dave);
> > - Changed to check payload bits [55:52] for parse kernel address
> >   (Andre).
> > 
> > Changes from v3:
> > - Refined arm_spe_payload_len() and removed macro SPE_HEADER_SZ()
> >   (Andre);
> > - Refined packet header index macros (Andre);
> > - Added patch "perf arm_spe: Fixup top byte for data virtual address" to
> >   fixup the data virtual address for 64KB pages and refined comments for
> >   the fixup (Andre);
> > - Added Andre's review tag (using "b4 am" command);
> > - Changed the macros to SPE_PKT_IS_XXX() format to check operation types
> >   (Andre).
> > 
> > 
> > Andre Przywara (1):
> >   perf arm_spe: Decode memory tagging properties
> > 
> > Leo Yan (20):
> >   perf arm-spe: Include bitops.h for BIT() macro
> >   perf arm-spe: Fix a typo in comment
> >   perf arm-spe: Refactor payload size calculation
> >   perf arm-spe: Refactor arm_spe_get_events()
> >   perf arm-spe: Fix packet length handling
> >   perf arm-spe: Refactor printing string to buffer
> >   perf arm-spe: Consolidate arm_spe_pkt_desc()'s return value
> >   perf arm-spe: Refactor packet header parsing
> >   perf arm-spe: Add new function arm_spe_pkt_desc_addr()
> >   perf arm-spe: Refactor address packet handling
> >   perf arm_spe: Fixup top byte for data virtual address
> >   perf arm-spe: Refactor context packet handling
> >   perf arm-spe: Add new function arm_spe_pkt_desc_counter()
> >   perf arm-spe: Refactor counter packet handling
> >   perf arm-spe: Add new function arm_spe_pkt_desc_event()
> >   perf arm-spe: Refactor event type handling
> >   perf arm-spe: Remove size condition checking for events
> >   perf arm-spe: Add new function arm_spe_pkt_desc_op_type()
> >   perf arm-spe: Refactor operation packet handling
> >   perf arm-spe: Add more sub classes for operation packet
> > 
> > Wei Li (1):
> >   perf arm-spe: Add support for ARMv8.3-SPE
> > 
> >  .../util/arm-spe-decoder/arm-spe-decoder.c|  59 +-
> >  .../util/arm-spe-decoder/arm-spe-decoder.h|  17 -
> >  .../arm-spe-decoder/arm-spe-pkt-decoder.c | 601 ++
> >  .../arm-spe-decoder/arm-spe-pkt-decoder.h | 122 +++-
> >  tools/perf/util/arm-spe.c |   2 +-
> >  5 files changed, 479 insertions(+), 322 deletions(-)
> > 
> 

-- 

- Arnaldo


Re: [PATCH 03/24] perf: Add build id data in mmap2 event

2020-11-10 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 10, 2020 at 07:23:34PM +0100, Jiri Olsa escreveu:
> On Tue, Nov 10, 2020 at 11:10:46AM +0100, Jiri Olsa wrote:
> > On Tue, Nov 10, 2020 at 09:28:51AM +0100, Peter Zijlstra wrote:
> > > On Mon, Nov 09, 2020 at 10:53:54PM +0100, Jiri Olsa wrote:
> > > > There's new misc bit for mmap2 to signal there's build
> > > > id data in it:
> > > > 
> > > >   #define PERF_RECORD_MISC_BUILD_ID  (1 << 14)
> > > 
> > > PERF_RECORD_MISC_MMAP_BUILD_ID would be consistent with the existing
> > > PERF_RECORD_MISC_MMAP_DATA naming.

Agreed.

> > ok
 
> > > 
> > > Also, AFAICT there's still a bunch of unused bits in misc.
> > > 
> > > >   012         CDEF
> > > >   |||---------||||
> > > 
> > > Where:
> > >   0-2 CPUMODE_MASK
> > > 
> > >   C   PROC_MAP_PARSE_TIMEOUT
> > >   D   MMAP_DATA / COMM_EXEC / FORK_EXEC / SWITCH_OUT
> > >   E   EXACT_IP / SCHED_OUT_PREEMPT
> > >   F   (reserved)
> > > 
> > > Maybe we should put in a comment to keep track of the hole ?
> > 
> > ook
> 
> how about the change below.. I also switch the build_id with the size,
> but I kept the build_id size 20, because I think there's bigger chance
> we will use those reserved bytes for something, than that we will need
> those extra 3 bytes in build_id array
> 
>   struct {
>   u8  build_id_size;
>   u8  __reserved_1;
>   u16 __reserved_2;
>   u8  build_id[20];
>   };

The "maybe we'll use it for something else" argument doesn't require that the
reserved bytes come before build_id, i.e. to use them for something else the
layout can be as above or

   struct {
   u8  build_id_size;
   u8  build_id[20];
   u8  __reserved_1;
   u16 __reserved_2;
   };

that groups build_id size with it, but nah, this is getting funny by
now.

My suggestion was not about increasing build_id to 23, just to leave the
unused (reserved) bytes after it.
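
To make the size argument concrete, here is a small sketch that checks both
orderings against the 24 bytes freed up by maj/min/ino/ino_generation. The
struct names are made up for illustration and the fixed-width types stand in
for the kernel's u8/u16/u32/u64; this is not the final UAPI definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The fields that the build id data replaces: 4 + 4 + 8 + 8 bytes. */
struct mmap2_ids {
	uint32_t maj;
	uint32_t min;
	uint64_t ino;
	uint64_t ino_generation;
};

/* Layout with the reserved bytes right after the size field. */
struct build_id_reserved_first {
	uint8_t  build_id_size;
	uint8_t  __reserved_1;
	uint16_t __reserved_2;
	uint8_t  build_id[20];
};

/* Layout with the reserved bytes after the build id array. */
struct build_id_reserved_last {
	uint8_t  build_id_size;
	uint8_t  build_id[20];
	uint8_t  __reserved_1;
	uint16_t __reserved_2;
};

/* Both orderings fill exactly the space of the replaced fields. */
static_assert(sizeof(struct mmap2_ids) == 24, "replaced fields: 24 bytes");
static_assert(sizeof(struct build_id_reserved_first) == 24, "fits");
static_assert(sizeof(struct build_id_reserved_last) == 24, "fits");

/* No hidden padding: the u16 lands on an even offset in both cases. */
static_assert(offsetof(struct build_id_reserved_first, build_id) == 4, "");
static_assert(offsetof(struct build_id_reserved_last, __reserved_2) == 22, "");
```

Either way there are 3 spare bytes; the only difference is whether they sit
before or after the build id array.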

- Arnaldo


Re: [PATCH RESEND 2/2] perf test: Update branch sample parttern for cs-etm

2020-11-10 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 10, 2020 at 11:08:29AM -0700, Mathieu Poirier escreveu:
> On Tue, Nov 10, 2020 at 02:34:17PM +0800, Leo Yan wrote:
> > Since the commit 943b69ac1884 ("perf parse-events: Set exclude_guest=1
> > for user-space counting"), 'exclude_guest=1' is set for user-space
> > counting; and the branch sample's modifier has been altered, the sample
> > event name has been changed from "branches:u:" to "branches:uH:", which
> > gives out info for "user-space and host counting".
> > 
> > But the cs-etm testing's regular expression cannot match the updated
> > branch sample event and leads to test failure.
> > 
> > This patch updates the branch sample parttern by using a more flexible
> 
> s/parttern/pattern

I'll fix it and add stable@ to the CC list, thanks
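
As a quick sanity check of the regular expression change, the two patterns
can be compared with POSIX extended regexes, which is the syntax egrep uses.
This is only a sketch: ere_matches() is a hypothetical helper, the shell's
$1 is replaced by "touch", and the sample lines are modelled on the comment
in the test script rather than taken from real perf output.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* Return true if the POSIX extended regex 'pattern' matches 'line'. */
static bool ere_matches(const char *pattern, const char *line)
{
	regex_t re;
	bool matched;

	assert(pattern && line);

	/* REG_EXTENDED gives egrep-style syntax; REG_NOSUB skips captures. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	matched = regexec(&re, line, 0, NULL, 0) == 0;
	regfree(&re);
	return matched;
}

/* Patterns before and after the patch, with $1 replaced by "touch". */
static const char old_pattern[] = " +touch +[0-9]+ .* +branches:([u|k]:)? +";
static const char new_pattern[] = " +touch +[0-9]+ .* +branches:(.*:)? +";

/* Sample lines: the old "u" modifier and the new "uH" modifier. */
static const char line_u[]  = "   touch  6512  1 branches:u:  b22082e0 strcmp+0xa0";
static const char line_uH[] = "   touch  6512  1 branches:uH:  b22082e0 strcmp+0xa0";
```

Calling ere_matches() on the four combinations shows the old pattern
matching only line_u, while the new pattern matches both lines.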
 
> > expression '.*' to match branch sample's modifiers, so that allows the
> > testing to work as expected.
> > 
> > Fixes: 943b69ac1884 ("perf parse-events: Set exclude_guest=1 for user-space 
> > counting")
> > Signed-off-by: Leo Yan 
> > ---
> >  tools/perf/tests/shell/test_arm_coresight.sh | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Here too I would CC stable.  With the above:
> 
> Reviewed-by: Mathieu Poirier 
> 
> > 
> > diff --git a/tools/perf/tests/shell/test_arm_coresight.sh 
> > b/tools/perf/tests/shell/test_arm_coresight.sh
> > index 59d847d4981d..18fde2f179cd 100755
> > --- a/tools/perf/tests/shell/test_arm_coresight.sh
> > +++ b/tools/perf/tests/shell/test_arm_coresight.sh
> > @@ -44,7 +44,7 @@ perf_script_branch_samples() {
> > #   touch  6512  1 branches:u:  b22082e0 
> > strcmp+0xa0 (/lib/aarch64-linux-gnu/ld-2.27.so)
> > #   touch  6512  1 branches:u:  b2208320 
> > strcmp+0xe0 (/lib/aarch64-linux-gnu/ld-2.27.so)
> > perf script -F,-time -i ${perfdata} | \
> > -   egrep " +$1 +[0-9]+ .* +branches:([u|k]:)? +"
> > +   egrep " +$1 +[0-9]+ .* +branches:(.*:)? +"
> >  }
> >  
> >  perf_report_branch_samples() {
> > -- 
> > 2.17.1
> > 

-- 

- Arnaldo


Re: [PATCH 03/24] perf: Add build id data in mmap2 event

2020-11-10 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 10, 2020 at 01:22:32PM +0100, Peter Zijlstra escreveu:
> On Tue, Nov 10, 2020 at 08:54:26AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Nov 10, 2020 at 09:07:16AM +0100, Peter Zijlstra escreveu:
> > > On Mon, Nov 09, 2020 at 10:53:54PM +0100, Jiri Olsa wrote:
> > > > Adding support to carry build id data in mmap2 event.

> > > > The build id data replaces maj/min/ino/ino_generation
> > > > fields, which are also used to identify map's binary,
> > > > so it's ok to replace them with build id data:

> > > >   union {
> > > >   struct {
> > > >   u32   maj;
> > > >   u32   min;
> > > >   u64   ino;
> > > >   u64   ino_generation;
> > > >   };
> > > >   struct {
> > > >   u8    build_id[20];
> > > >   u8    build_id_size;

> > > What's the purpose of a size field for a fixed size array? Also, I'd
> > > flip the order of these fields, first have the size and then the array.

> > There can be different types of build-ids, with different sizes,
> > flipping the order of the fields is indeed sensible, as we could then
> > support even larger build_ids if the need arises :)

> 3 whole bytes.. whooo!

Hey, I agreed with you, flip the order of the fields, right? :-)

- Arnaldo


Re: [PATCH 03/24] perf: Add build id data in mmap2 event

2020-11-10 Thread Arnaldo Carvalho de Melo
Em Tue, Nov 10, 2020 at 09:07:16AM +0100, Peter Zijlstra escreveu:
> On Mon, Nov 09, 2020 at 10:53:54PM +0100, Jiri Olsa wrote:
> > Adding support to carry build id data in mmap2 event.
> > 
> > The build id data replaces maj/min/ino/ino_generation
> > fields, which are also used to identify map's binary,
> > so it's ok to replace them with build id data:
> > 
> >   union {
> >   struct {
> >   u32   maj;
> >   u32   min;
> >   u64   ino;
> >   u64   ino_generation;
> >   };
> >   struct {
> >   u8    build_id[20];
> >   u8    build_id_size;
> 
> What's the purpose of a size field for a fixed size array? Also, I'd
> flip the order of these fields, first have the size and then the array.

There can be different types of build-ids, with different sizes,
flipping the order of the fields is indeed sensible, as we could then
support even larger build_ids if the need arises :)

- Arnaldo
 
> >   u8    __reserved_1;
> >   u16   __reserved_2;
> >   };
> >   };
> > 
> > Replaced maj/min/ino/ino_generation fields give us size
> > of 24 bytes. We use 20 bytes for build id data, 1 byte
> > for size and rest is unused.

-- 

- Arnaldo


Re: 5.10 tree fails to build

2020-11-09 Thread Arnaldo Carvalho de Melo
Em Mon, Nov 09, 2020 at 11:32:13AM +0100, Jiri Olsa escreveu:
> On Mon, Nov 09, 2020 at 05:57:37PM +0800, Ming Lei wrote:
> > On Thu, Nov 5, 2020 at 12:58 PM Amy Parker  wrote:
> > >
> > > On all attempts to build the 5.10 tree (from either release candidate,
> > > Linus's tree, Greg's tree, etc), the build crashes on the BTFID vmlinux
> > > stage. I've tested this on several different devices with completely
> > > different hardware and kernel configs. The symbol for vfs_getattr
> > > appears to be missing. Compiles for all of these work on any compile
> > > on any 5.9 tree. I've tested all 4 5.9 dot-releases as well as the first
> > > two and last two release candidates and Greg's staging tree.

> > > The specific error is:
> > >   BTFIDS  vmlinux
> > > FAILED unresolved symbol vfs_getattr
> > > make: *** [Makefile:1164: vmlinux] Error 255

> > > Any thoughts as to what's causing this? The main machine I'm
> > > compiling with is running kernel 5.8.17 with dwarves 1.17. My
> > > kernel config is attached as `kernel_config`.

> > Turns out the issue is introduced in the following commit:

> > commit 6e22ab9da79343532cd3cde39df25e5a5478c692
> > Author: Jiri Olsa 
> > Date:   Tue Aug 25 21:21:20 2020 +0200

> > bpf: Add d_path helper

> > The issue can be observed reliably when building kernel in Fedora 33 with
> > F33's kernel config.

> > GCC: gcc version 10.2.1 20200826 (Red Hat 10.2.1-3) (GCC)
 
> hi,
> it's gcc dwarf issue tracked in here:
>   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97060
 
> it's introduced by the gcc version 10.2.1 and we
> were told it will take some time to fix
 
> so we took steps to workaround that, the patchset
> just got acked and it's on its way to get merged:
 
>   https://lore.kernel.org/bpf/20201106222512.52454-1-jo...@kernel.org/
 
> it's change for both dwarves/pahole and kernel
 
> the quick workaround is to disable CONFIG_DEBUG_INFO_BTF
> option

I've applied the series and I'm now testing it, will tag v1.19 then.

- Arnaldo


Re: [PATCH v3 1/2] perf lock: Correct field name "flags"

2020-11-04 Thread Arnaldo Carvalho de Melo
Em Wed, Nov 04, 2020 at 05:42:28PM +0800, Leo Yan escreveu:
> The tracepoint "lock:lock_acquire" contains field "flags" but not
> "flag".  Current code wrongly retrieves value from field "flag" and it
> always gets zero for the value, thus "perf lock" doesn't report the
> correct result.
> 
> This patch replaces the field name "flag" with "flags", so can read out
> the correct flags for locking.


Thanks, applied both patches.

- Arnaldo

 
> Fixes: e4cef1f65061 ("perf lock: Fix state machine to recognize lock 
> sequence")
> Signed-off-by: Leo Yan 
> Acked-by: Jiri Olsa 
> ---
>  tools/perf/builtin-lock.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-lock.c b/tools/perf/builtin-lock.c
> index f0a1dbacb46c..5cecc1ad78e1 100644
> --- a/tools/perf/builtin-lock.c
> +++ b/tools/perf/builtin-lock.c
> @@ -406,7 +406,7 @@ static int report_lock_acquire_event(struct evsel *evsel,
>   struct lock_seq_stat *seq;
>   const char *name = evsel__strval(evsel, sample, "name");
>   u64 tmp  = evsel__intval(evsel, sample, "lockdep_addr");
> - int flag = evsel__intval(evsel, sample, "flag");
> + int flag = evsel__intval(evsel, sample, "flags");
>  
>   memcpy(&addr, &tmp, sizeof(void *));
>  
> -- 
> 2.17.1
> 

-- 

- Arnaldo


[GIT PULL] perf tools changes for v5.11

2020-11-03 Thread Arnaldo Carvalho de Melo
Hi Linus,

Please consider pulling: only fixes and a sync of the headers so
that the perf build is silent. Please let me know if I made any other
mistake.

Best regards,

- Arnaldo

The following changes since commit b7cbaf59f62f8ab8f157698f9e31642bff525bd0:

  Merge branch 'akpm' (patches from Andrew) (2020-11-02 14:47:37 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-for-v5.10-2020-11-03

for you to fetch changes up to 5d020cbd86204e51da05628623a6f9729d4b04c8:

  tools feature: Fixup fast path feature detection (2020-11-03 09:24:20 -0300)


perf tools updates for v5.10: 2nd batch.

- Fix visibility attribute in python module init code with newer gcc.

- Fix DRAM_BW_Use 0 issue for CLX/SKX in intel JSON vendor event files.

- Fix the build on new fedora by removing LTO compiler options when
  building perl support.

- Remove broken __no_tail_call attribute.

- Fix segfault when trying to trace events by cgroup.

- Fix crash with non-jited BPF progs.

- Increase buffer size in TUI browser, fixing format truncation.

- Fix printing of build-id for objects lacking one.

- Fix byte swapping for ino_generation field in MMAP2 perf.data records.

- Fix byte swapping for CGROUP perf.data records, for cross arch
  analysis of perf.data files.

- Fix the fast path of feature detection.

- Update kernel header copies.

Signed-off-by: Arnaldo Carvalho de Melo 

Test results in the signed tag at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-for-v5.10-2020-11-03

--------
Arnaldo Carvalho de Melo (14):
  perf tools: Update copy of libbpf's hashmap.c
  tools headers UAPI: Update process_madvise affected files
  perf scripting python: Avoid declaring function pointers with a visibility attribute
  tools headers UAPI: Sync prctl.h with the kernel sources
  tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
  tools headers UAPI: Update fscrypt.h copy
  tools x86 headers: Update cpufeatures.h headers copies
  tools x86 headers: Update required-features.h header from the kernel
  tools arch x86: Sync the msr-index.h copy with the kernel sources
  tools UAPI: Update copy of linux/mman.h from the kernel sources
  tools kvm headers: Update KVM headers from the kernel sources
  tools headers UAPI: Update tools's copy of linux/perf_event.h
  tools include UAPI: Update linux/mount.h copy
  tools feature: Fixup fast path feature detection

Jin Yao (1):
  perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX

Jiri Olsa (2):
  perf tools: Initialize output buffer in build_id__sprintf
  perf tools: Add missing swap for ino_generation

Justin M. Forbes (1):
  perf tools: Remove LTO compiler options when building perl support

Namhyung Kim (1):
  perf tools: Add missing swap for cgroup events

Peter Zijlstra (1):
  perf tools: Remove broken __no_tail_call attribute

Song Liu (1):
  perf hists browser: Increase size of 'buf' in perf_evsel__hists_browse()

Stanislav Ivanichkin (1):
  perf trace: Fix segfault when trying to trace events by cgroup

Tommi Rantala (1):
  perf tools: Fix crash with non-jited bpf progs

 tools/arch/arm64/include/uapi/asm/kvm.h| 25 +
 tools/arch/s390/include/uapi/asm/sie.h |  2 +-
 tools/arch/x86/include/asm/cpufeatures.h   |  6 ++-
 tools/arch/x86/include/asm/disabled-features.h |  9 +++-
 tools/arch/x86/include/asm/msr-index.h | 10 
 tools/arch/x86/include/asm/required-features.h |  2 +-
 tools/arch/x86/include/uapi/asm/kvm.h  | 20 
 tools/arch/x86/include/uapi/asm/svm.h  | 13 +
 tools/build/feature/test-all.c |  1 -
 tools/include/linux/compiler-gcc.h | 12 -
 tools/include/linux/compiler.h |  3 --
 tools/include/uapi/asm-generic/unistd.h|  4 +-
 tools/include/uapi/drm/i915_drm.h  | 59 --
 tools/include/uapi/linux/fscrypt.h |  6 +--
 tools/include/uapi/linux/kvm.h | 19 +++
 tools/include/uapi/linux/mman.h|  1 +
 tools/include/uapi/linux/mount.h   |  1 +
 tools/include/uapi/linux/perf_event.h  |  2 +-
 tools/include/uapi/linux/prctl.h   |  9 
 tools/include/uapi/linux/vhost.h   |  4 ++
 tools/perf/Makefile.config |  1 +
 tools/perf/arch/x86/entry/syscalls/syscall_64.tbl  | 11 ++--
 tools/perf/builtin-trace.c | 15 +++---
 .../arch/x86/cascadelakex/clx-metrics.json |  2 +-
 .../pmu-events/arch/x86/skylakex/skx-metrics.json  |  2 +-
 t

Re: [PATCH] perf tools: Add missing swap for cgroup events

2020-11-03 Thread Arnaldo Carvalho de Melo
Em Mon, Nov 02, 2020 at 06:49:56PM +0100, Jiri Olsa escreveu:
> On Mon, Nov 02, 2020 at 11:02:28PM +0900, Namhyung Kim wrote:
> > > A swap function for PERF_RECORD_CGROUP was missing; add it.
> > 
> > Fixes: ba78c1c5461c ("perf tools: Basic support for CGROUP event")
> 
> Acked-by: Jiri Olsa 


Thanks, applied.

- Arnaldo

 
> thanks,
> jirka
> 
> > Signed-off-by: Namhyung Kim 
> > ---
> >  tools/perf/util/session.c | 13 +
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 7a5f03764702..c44c8e8c09c6 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -710,6 +710,18 @@ static void perf_event__namespaces_swap(union 
> > perf_event *event,
> > swap_sample_id_all(event, &event->namespaces.link_info[i]);
> >  }
> >  
> > > +static void perf_event__cgroup_swap(union perf_event *event, bool sample_id_all)
> > +{
> > +   event->cgroup.id = bswap_64(event->cgroup.id);
> > +
> > +   if (sample_id_all) {
> > +   void *data = &event->cgroup.path;
> > +
> > +   data += PERF_ALIGN(strlen(data) + 1, sizeof(u64));
> > +   swap_sample_id_all(event, data);
> > +   }
> > +}
> > +
> >  static u8 revbyte(u8 b)
> >  {
> > int rev = (b >> 4) | ((b & 0xf) << 4);
> > @@ -952,6 +964,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
> > [PERF_RECORD_SWITCH]  = perf_event__switch_swap,
> > [PERF_RECORD_SWITCH_CPU_WIDE] = perf_event__switch_swap,
> > [PERF_RECORD_NAMESPACES]  = perf_event__namespaces_swap,
> > +   [PERF_RECORD_CGROUP]  = perf_event__cgroup_swap,
> > [PERF_RECORD_TEXT_POKE]   = perf_event__text_poke_swap,
> > [PERF_RECORD_HEADER_ATTR] = perf_event__hdr_attr_swap,
> > [PERF_RECORD_HEADER_EVENT_TYPE]   = perf_event__event_type_swap,
> > -- 
> > 2.29.1.341.ge80a0c044ae-goog
> > 
> 

-- 

- Arnaldo
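
The two steps in the patch above (swap the 64-bit id, then skip the
NUL-terminated path rounded up to a u64 boundary to reach the trailing
sample_id data) can be sketched in isolation. This is a standalone
illustration; swap64(), align_up() and sample_id_offset() are
hypothetical names, not the bswap_64()/PERF_ALIGN() helpers the patch
itself uses:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Reverse the byte order of a 64-bit value (portable, no builtins). */
static uint64_t swap64(uint64_t v)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 8; i++) {
		r = (r << 8) | (v & 0xff);
		v >>= 8;
	}
	return r;
}

/* Round len up to the next multiple of align (align is a power of two). */
static size_t align_up(size_t len, size_t align)
{
	return (len + align - 1) & ~(align - 1);
}

/*
 * Offset from the start of the path string to the data that follows
 * it: the string length plus its NUL, padded to a u64 boundary.
 */
static size_t sample_id_offset(const char *path)
{
	return align_up(strlen(path) + 1, sizeof(uint64_t));
}
```

A seven-character path such as "foo/bar" fills exactly one 8-byte slot
with its terminator, while an eight-character path spills into a second
slot.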


Re: [PATCH 1/2] perf tools: Initialize output buffer in build_id__sprintf

2020-11-03 Thread Arnaldo Carvalho de Melo
Em Mon, Nov 02, 2020 at 10:50:00PM +0900, Namhyung Kim escreveu:
> Hi Jiri,
> 
> On Mon, Nov 2, 2020 at 8:31 AM Jiri Olsa  wrote:
> >
> > We display garbage for undefined build_id objects,
> > because we don't initialize the output buffer.
> >
> > Signed-off-by: Jiri Olsa 
> 
> Acked-by: Namhyung Kim 

Thanks, applied.

- Arnaldo

 
> Thanks
> Namhyung
> 
> 
> > ---
> >  tools/perf/util/build-id.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> > index 8763772f1095..6b410c3d52dc 100644
> > --- a/tools/perf/util/build-id.c
> > +++ b/tools/perf/util/build-id.c
> > @@ -102,6 +102,8 @@ int build_id__sprintf(const struct build_id *build_id, 
> > char *bf)
> > const u8 *raw = build_id->data;
> > size_t i;
> >
> > +   bf[0] = 0x0;
> > +
> > for (i = 0; i < build_id->size; ++i) {
> > sprintf(bid, "%02x", *raw);
> > ++raw;
> > --
> > 2.26.2
> >

-- 

- Arnaldo
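
A small sketch of the problem being fixed, assuming a formatter that
only writes inside its loop: with a zero-size id the loop never runs,
so without the up-front terminator the caller would print whatever was
already in the buffer. hex_sprintf() is a hypothetical stand-in for
build_id__sprintf(), reduced to a raw byte array:

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Format size bytes of data as lowercase hex into bf and return the
 * number of characters written. bf must hold 2 * size + 1 bytes.
 */
static int hex_sprintf(const unsigned char *data, size_t size, char *bf)
{
	char *p = bf;
	size_t i;

	/* An empty id yields "" instead of stale buffer contents. */
	bf[0] = '\0';

	for (i = 0; i < size; i++) {
		sprintf(p, "%02x", data[i]);
		p += 2;
	}
	return (int)(p - bf);
}
```

Calling it with size 0 on a buffer that previously held other text
leaves an empty string behind.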


Re: [PATCH] perf: increase size of buf in perf_evsel__hists_browse()

2020-11-03 Thread Arnaldo Carvalho de Melo
Em Sat, Oct 31, 2020 at 09:29:20PM +0100, Jiri Olsa escreveu:
> On Fri, Oct 30, 2020 at 04:54:31PM -0700, Song Liu wrote:
> > Making perf with gcc-9.1.1 generates the following warning:
> > 
> >   CC   ui/browsers/hists.o
> > ui/browsers/hists.c: In function 'perf_evsel__hists_browse':
> > ui/browsers/hists.c:3078:61: error: '%d' directive output may be \
> > truncated writing between 1 and 11 bytes into a region of size \
> > between 2 and 12 [-Werror=format-truncation=]
> > 
> >  3078 |   "Max event group index to sort is %d (index from 0 to %d)",
> >   | ^~
> > > ui/browsers/hists.c:3078:7: note: directive argument in the range [-2147483648, 8]
> >  3078 |   "Max event group index to sort is %d (index from 0 to %d)",
> >   |   ^~
> > In file included from /usr/include/stdio.h:937,
> >  from ui/browsers/hists.c:5:
> > 
> > IOW, the string in line 3078 might be too long for buf[] of 64 bytes.
> > 
> > Fix this by increasing the size of buf[] to 128.
> > 
> > > Fixes: dbddf1747441 ("perf report/top TUI: Support hotkeys to let user select any event for sorting")
> > Cc: stable  # v5.7+
> > Cc: Jin Yao 
> > Cc: Jiri Olsa 
> > Cc: Arnaldo Carvalho de Melo 
> > Cc: Arnaldo Carvalho de Melo 
> > Signed-off-by: Song Liu 
> 
> Acked-by: Jiri Olsa 



Thanks, applied.

- Arnaldo
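
One way to check the compiler's reasoning is to ask snprintf() for the
worst-case length of that message. This is a sketch under the
assumption of a 32-bit int, where a "%d" can expand to at most 11
characters; worst_case_len() is a hypothetical helper, not code from
the patch:

```c
#include <limits.h>
#include <stdio.h>

/*
 * Length, excluding the terminating NUL, that the message needs when
 * both "%d" arguments take their widest possible form. With a NULL
 * buffer and size 0, snprintf() only computes the length.
 */
static int worst_case_len(void)
{
	return snprintf(NULL, 0,
			"Max event group index to sort is %d (index from 0 to %d)",
			INT_MIN, INT_MIN);
}
```

Adding one byte for the terminator, the result does not fit in 64 bytes
but fits comfortably in 128, which matches the fix.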



[GIT PULL] perf tools changes for v5.10: 2nd batch

2020-10-30 Thread Arnaldo Carvalho de Melo
Hi Linus,

Please consider pulling,

Best regards,

- Arnaldo


The following changes since commit 7cf726a59435301046250c42131554d9ccc566b8:

  Merge tag 'linux-kselftest-kunit-5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest (2020-10-18 14:45:59 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-for-v5.10-2020-10-30

for you to fetch changes up to 7b3bcedf5ee50ca9b0ec74003e14ccbd088408d1:

  perf scripting python: Avoid declaring function pointers with a visibility attribute (2020-10-30 08:35:16 -0300)


perf tools updates for v5.10: 2nd batch.

- Update documentation about CAP_PERFMON.

- Add --quiet option to 'perf stat record'.

- Update kernel header copies.

- Do not compile BPF specific code if libbpf isn't available.

- Fix visibility attribute in python module init code with newer gcc.

- Add perf arch instructions annotate handlers for MIPS.

- Show in 'perf version' if libpfm4 is linked in.

- Fix DRAM_BW_Use 0 issue for CLX/SKX in intel JSON vendor event files.

- Add test for JSON defined arch std events.

- Fix the build on new fedora by removing LTO compiler options when
  building perl support.

- Improve warning if no memory nodes are detected.

- Make 'perf test tsc' present in arm64.

- Support regex pattern in --for-each-cgroup in 'perf stat'.

- Remove broken __no_tail_call attribute.

- Add kvm-stat for arm64.

- Fix segfault when trying to trace events by cgroup.

- Fix crash with non-jited BPF progs.

Signed-off-by: Arnaldo Carvalho de Melo 

Test results on the signed tag at:

https://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git/tag/?h=perf-tools-for-v5.10-2020-10-30


Re: [PATCH v3 2/2] perf stat: Support regex pattern in --for-each-cgroup

2020-10-27 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 27, 2020 at 02:44:21PM +0100, Jiri Olsa escreveu:
> On Tue, Oct 27, 2020 at 04:28:55PM +0900, Namhyung Kim wrote:
> > To make the command line even more compact with cgroups, support regex
> > pattern matching in cgroup names.
> > 
> >   $ perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1
> > 
> > >           3,000.73 msec cpu-clock    foo       #    2.998 CPUs utilized
> > >     12,530,992,699      cycles       foo       #    7.517 GHz            (100.00%)
> > >           1,000.61 msec cpu-clock    foo/bar   #    1.000 CPUs utilized
> > >      4,178,529,579      cycles       foo/bar   #    2.506 GHz            (100.00%)
> > >           1,000.03 msec cpu-clock    foo/baz   #    0.999 CPUs utilized
> > >      4,176,104,315      cycles       foo/baz   #    2.505 GHz            (100.00%)
> > 
> >1.000892614 seconds time elapsed
> > 
> > Signed-off-by: Namhyung Kim 
> 
> Acked-by: Jiri Olsa 

Thanks, applied both patches.

- Arnaldo
 
> thanks,
> jirka
> 
> > ---
> >  tools/perf/Documentation/perf-stat.txt |   5 +-
> >  tools/perf/builtin-stat.c  |   5 +-
> >  tools/perf/util/cgroup.c   | 198 ++---
> >  3 files changed, 182 insertions(+), 26 deletions(-)
> > 
> > > diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> > index 9f9f29025e49..2b44c08b3b23 100644
> > --- a/tools/perf/Documentation/perf-stat.txt
> > +++ b/tools/perf/Documentation/perf-stat.txt
> > @@ -168,8 +168,9 @@ command line can be used: 'perf stat -e cycles -G 
> > cgroup_name -a -e cycles'.
> >  
> >  --for-each-cgroup name::
> >  Expand event list for each cgroup in "name" (allow multiple cgroups 
> > separated
> > -by comma).  This has same effect that repeating -e option and -G option for
> > -each event x name.  This option cannot be used with -G/--cgroup option.
> > +by comma).  It also support regex patterns to match multiple groups.  This 
> > has same
> > +effect that repeating -e option and -G option for each event x name.  This 
> > option
> > +cannot be used with -G/--cgroup option.
> >  
> >  -o file::
> >  --output file::
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index b01af171d94f..6709578128c9 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -2235,8 +2235,11 @@ int cmd_stat(int argc, const char **argv)
> > }
> >  
> > if (evlist__expand_cgroup(evsel_list, stat_config.cgroup_list,
> > - &stat_config.metric_events, true) < 0)
> > > + &stat_config.metric_events, true) < 0) {
> > +   parse_options_usage(stat_usage, stat_options,
> > +   "for-each-cgroup", 0);
> > goto out;
> > +   }
> > }
> >  
> > target__validate(&target);
> > diff --git a/tools/perf/util/cgroup.c b/tools/perf/util/cgroup.c
> > index b81324a13a2b..704333748549 100644
> > --- a/tools/perf/util/cgroup.c
> > +++ b/tools/perf/util/cgroup.c
> > @@ -13,9 +13,19 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  
> >  int nr_cgroups;
> >  
> > +/* used to match cgroup name with patterns */
> > +struct cgroup_name {
> > +   struct list_head list;
> > +   bool used;
> > +   char name[];
> > +};
> > +static LIST_HEAD(cgroup_list);
> > +
> >  static int open_cgroup(const char *name)
> >  {
> > char path[PATH_MAX + 1];
> > @@ -149,6 +159,137 @@ void evlist__set_default_cgroup(struct evlist 
> > *evlist, struct cgroup *cgroup)
> > evsel__set_default_cgroup(evsel, cgroup);
> >  }
> >  
> > +/* helper function for ftw() in match_cgroups and list_cgroups */
> > > +static int add_cgroup_name(const char *fpath, const struct stat *sb __maybe_unused,
> > +  int typeflag)
> > +{
> > +   struct cgroup_name *cn;
> > +
> > +   if (typeflag != FTW_D)
> > +   return 0;
> > +
> > +   cn = malloc(sizeof(*cn) + strlen(fpath) + 1);
> > +   if (cn == NULL)
> > +   return -1;
> > +
> > +   cn->used = false;
> > +   strcpy(cn->name, fpath);
> > +
> > +   list_add_tail(&cn->list, &cgroup_list);
> > +   return 0;
> > +}
> > +
> > +static void release_cgroup_list(void)
> > +{
> > +   struct cgroup_name *cn;
> > +
> > +   while (!list_empty(&cgroup_list)) {
> > +   cn = list_first_entry(&cgroup_list, struct cgroup_name, list);
> > +   list_del(&cn->list);
> > +   free(cn);
> > +   }
> > +}
> > +
> > +/* collect given cgroups only */
> > +static int list_cgroups(const char *str)
> > +{
> > +   const char *p, *e, *eos = str + strlen(str);
> > +   struct cgroup_name *cn;
> > +   char *s;
> > +
> > +   /* use given name as is - for testing purpose */
> > +   for (;;) {
> > +   p = strchr(str, ',');
> > +  

Re: [PATCH 1/2] perf tools: Add --quiet option to perf stat

2020-10-27 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 26, 2020 at 05:27:36PM -0700, Andi Kleen escreveu:
> Add a new --quiet option to perf stat. This is useful with perf stat
> record to write the data only to the perf.data file, which can lower
> measurement overhead because the data doesn't need to be formatted.
> 
> On my 4C desktop:
> 
> % time ./perf stat record  -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5
> ...
> real    0m5.377s
> user    0m0.238s
> sys     0m0.452s
> % time ./perf stat record --quiet -e $(python -c 'print ",".join(["cycles"]*1000)')  -a -I 1000 sleep 5
> 
> real    0m5.452s
> user    0m0.183s
> sys     0m0.423s
> 
> In this example it cuts the user time by 20%. On systems with more cores
> the savings are higher.

Applied 1/2,

Thanks,

- Arnaldo
 
> Signed-off-by: Andi Kleen 
> ---
>  tools/perf/Documentation/perf-stat.txt | 4 
>  tools/perf/builtin-stat.c  | 6 +-
>  tools/perf/util/stat.h | 1 +
>  3 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
> index 9f9f29025e49..b138dd192423 100644
> --- a/tools/perf/Documentation/perf-stat.txt
> +++ b/tools/perf/Documentation/perf-stat.txt
> @@ -316,6 +316,10 @@ small group that need not have multiplexing is lowered. This option
>  forbids the event merging logic from sharing events between groups and
>  may be used to increase accuracy in this case.
>  
> +--quiet::
> +Don't print output. This is useful with perf stat record below to only
> +write data to the perf.data file.
> +
>  STAT RECORD
>  ---
>  Stores stat data into perf data file.
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index b01af171d94f..743fe47e7a88 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -972,6 +972,8 @@ static void print_counters(struct timespec *ts, int argc, const char **argv)
>   /* Do not print anything if we record to the pipe. */
>   if (STAT_RECORD && perf_stat.data.is_pipe)
>   return;
> + if (stat_config.quiet)
> + return;
>  
>   perf_evlist__print_counters(evsel_list, &stat_config, &target,
>   ts, argc, argv);
> @@ -1171,6 +1173,8 @@ static struct option stat_options[] = {
>   "threads of same physical core"),
>   OPT_BOOLEAN(0, "summary", &stat_config.summary,
>  "print summary for interval mode"),
> + OPT_BOOLEAN(0, "quiet", &stat_config.quiet,
> + "don't print output (useful with record)"),
>  #ifdef HAVE_LIBPFM
>   OPT_CALLBACK(0, "pfm-events", &evsel_list, "event",
>   "libpfm4 event selector. use 'perf list' to list available events",
> @@ -2132,7 +2136,7 @@ int cmd_stat(int argc, const char **argv)
>   goto out;
>   }
>  
> - if (!output) {
> + if (!output && !stat_config.quiet) {
>   struct timespec tm;
>   mode = append_file ? "a" : "w";
>  
> diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
> index 487010c624be..05adf8165025 100644
> --- a/tools/perf/util/stat.h
> +++ b/tools/perf/util/stat.h
> @@ -122,6 +122,7 @@ struct perf_stat_config {
>   bool metric_no_group;
>   bool metric_no_merge;
>   bool stop_read_counter;
> + bool quiet;
>   FILE*output;
>   unsigned int interval;
>   unsigned int timeout;
> -- 
> 2.28.0
> 

-- 

- Arnaldo
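
The effect of the two early returns in print_counters() can be reduced
to a single predicate. This is a sketch with hypothetical names
(struct stat_cfg, should_print()), not the perf_stat_config layout; it
only shows that formatting is skipped either when recording to a pipe
or when --quiet was given:

```c
#include <stdbool.h>

/* Hypothetical subset of the settings that decide whether to print. */
struct stat_cfg {
	bool quiet;          /* --quiet was passed */
	bool record_to_pipe; /* 'perf stat record' writing to a pipe */
};

/* Counters are formatted and printed only if neither condition holds. */
static bool should_print(const struct stat_cfg *cfg)
{
	if (cfg->record_to_pipe)
		return false;
	if (cfg->quiet)
		return false;
	return true;
}
```

Skipping the formatting entirely is where the user-time saving in the
timings above comes from: 0.238s versus 0.183s is a reduction of
roughly 23%.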


Re: [PATCH v2 0/2] perf PMU events test: Add scenario for arch std events

2020-10-27 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 22, 2020 at 07:02:25PM +0800, John Garry escreveu:
> The small series covers the following:
> - Tidy error handling in jevents a bit
> - Expands on PMU events test to cover jevents arch std events support
> 
> Differences to v1:
> - Revert to original logic in jevents.c error path

Thanks, applied both together with Kajol's Reviewed-by tags.

- Arnaldo
 
> John Garry (2):
>   perf jevents: Tidy error handling
>   perf jevents: Add test for arch std events
> 
>  .../pmu-events/arch/test/arch-std-events.json |  8 ++
>  .../pmu-events/arch/test/test_cpu/cache.json  |  5 ++
>  tools/perf/pmu-events/jevents.c   | 87 +--
>  tools/perf/tests/pmu-events.c | 14 +++
>  4 files changed, 66 insertions(+), 48 deletions(-)
>  create mode 100644 tools/perf/pmu-events/arch/test/arch-std-events.json
>  create mode 100644 tools/perf/pmu-events/arch/test/test_cpu/cache.json
> 
> -- 
> 2.26.2
> 

-- 

- Arnaldo


Re: [PATCH] perf vendor events: Fix DRAM_BW_Use 0 issue for CLX/SKX

2020-10-27 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 22, 2020 at 06:02:31PM -0700, Ian Rogers escreveu:
> On Thu, Oct 22, 2020 at 5:54 PM Jin Yao  wrote:
> >
> > Ian reports an issue that the metric DRAM_BW_Use often remains 0.
> >
> > The metric expression for DRAM_BW_Use on CLX/SKX:
> >
> > "( 64 * ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) / 
> > 10 ) / duration_time"
> >
> > The counts of uncore_imc/cas_count_read/ and uncore_imc/cas_count_write/
> > are scaled up by 64, that is to turn a count of cache lines into bytes,
> > the count is then divided by 10 to give GB.
> >
> > However, the counts of uncore_imc/cas_count_read/ and
> > uncore_imc/cas_count_write/ have been scaled yet.
> >
> > The scale values are from sysfs, such as
> > /sys/devices/uncore_imc_0/events/cas_count_read.scale.
> > It's 6.103515625e-5 (64 / 1024.0 / 1024.0).
> >
> > So if we use original metric expression, the result is not correct.
> >
> > But the difficulty is, for SKL client, the counts are not scaled.
> >
> > The metric expression for DRAM_BW_Use on SKL:
> >
> > "64 * ( arb@event\\=0x81\\,umask\\=0x1@ + arb@event\\=0x84\\,umask\\=0x1@ ) 
> > / 100 / duration_time / 1000"
> >
> > root@kbl-ppc:~# perf stat -M DRAM_BW_Use -a -- sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> >190  arb/event=0x84,umask=0x1/ # 1.86 DRAM_BW_Use
> > 29,093,178  arb/event=0x81,umask=0x1/
> >  1,000,703,287 ns   duration_time
> >
> >1.000703287 seconds time elapsed
> >
> > The result is expected.
> >
> > > So the easy way is just to change the metric expression for CLX/SKX.
> > > This patch changes the metric expression to:
> > >
> > > "( ( ( uncore_imc@cas_count_read@ + uncore_imc@cas_count_write@ ) * 1048576 ) / 1000000000 ) / duration_time"
> >
> > 1048576 = 1024 * 1024.
> >
> > Before (tested on CLX):
> >
> > root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> > 765.35 MiB  uncore_imc/cas_count_read/ # 0.00 DRAM_BW_Use
> >   5.42 MiB  uncore_imc/cas_count_write/
> > 1001515088 ns   duration_time
> >
> >1.001515088 seconds time elapsed
> >
> > After:
> >
> > root@lkp-csl-2sp5 ~# perf stat -M DRAM_BW_Use -a -- sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> > 767.95 MiB  uncore_imc/cas_count_read/ # 0.80 DRAM_BW_Use
> 
> Nit, using ScaleUnit would allow this to be 0.80GB/s.
> 
> >   5.02 MiB  uncore_imc/cas_count_write/
> > 1001900010 ns   duration_time
> >
> >1.001900010 seconds time elapsed
> >
> > > Fixes: 038d3b53c284 ("perf vendor events intel: Update CascadelakeX events to v1.08")
> > Fixes: b5ff7f2799a4 ("perf vendor events: Update SkylakeX events to v1.21")
> > Signed-off-by: Jin Yao 
> 
> Acked-by: Ian Rogers 

Thanks, applied.

- Arnaldo

 
> Thanks,
> Ian
> 
> > ---
> >  tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json | 2 +-
> >  tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> >
> > > diff --git a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
> > index de3193552277..00f4fcffa815 100644
> > --- a/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
> > +++ b/tools/perf/pmu-events/arch/x86/cascadelakex/clx-metrics.json
> > @@ -329,7 +329,7 @@
> >  },
> >  {
> >  "BriefDescription": "Average external Memory Bandwidth Use for 
> > reads and writes [GB / sec]",
> > -"MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + 
> > uncore_imc@cas_count_write@ ) / 10 ) / duration_time",
> > +"MetricExpr": "( ( ( uncore_imc@cas_count_read@ + 
> > uncore_imc@cas_count_write@ ) * 1048576 ) / 10 ) / duration_time",
> >  "MetricGroup": "Memory_BW;SoC",
> >  "MetricName": "DRAM_BW_Use"
> >  },
> > > diff --git a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
> > index f31794d3b926..0dd8b13b5cfb 100644
> > --- a/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
> > +++ b/tools/perf/pmu-events/arch/x86/skylakex/skx-metrics.json
> > @@ -323,7 +323,7 @@
> >  },
> >  {
> >  "BriefDescription": "Average external Memory Bandwidth Use for 
> > reads and writes [GB / sec]",
> > -"MetricExpr": "( 64 * ( uncore_imc@cas_count_read@ + 
> > uncore_imc@cas_count_write@ ) / 10 ) / duration_time",
> > +"MetricExpr": "( ( ( uncore_imc@cas_count_read@ + 
> > uncore_imc@cas_count_write@ ) * 1048576 ) / 10 ) / duration_time",
> >  "MetricGroup": "Memory_BW;SoC",
> >  "MetricName": "DRAM_BW_Use"
> >  },
> > --
> > 2.17.1
> >

-- 

- Arnaldo
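
The before/after numbers in the commit message can be reproduced with a
short calculation. This is a sketch that assumes a divisor of 10^9 for
turning bytes into GB and takes the reported counts as already being in
MiB (the sysfs scale of 64 / 1024 / 1024 has been applied); the
function names are hypothetical:

```c
/* Bytes per MiB, the factor the new expression multiplies by. */
#define BYTES_PER_MIB 1048576.0

/*
 * Old expression: treats the counts as raw cache lines and multiplies
 * by 64, although the counts were already scaled to MiB.
 */
static double dram_bw_old(double rd, double wr, double secs)
{
	return (64.0 * (rd + wr) / 1e9) / secs;
}

/* New expression: converts the MiB counts to bytes, then to GB/s. */
static double dram_bw_new(double rd_mib, double wr_mib, double secs)
{
	return (((rd_mib + wr_mib) * BYTES_PER_MIB) / 1e9) / secs;
}
```

Fed with the "Before" counts (765.35 and 5.42 MiB over 1.001515088 s)
the old expression gives a value far below 0.005, which prints as 0.00.
Fed with the "After" counts (767.95 and 5.02 MiB over 1.001900010 s)
the new expression lands just above 0.80, in line with the reported
DRAM_BW_Use.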


Re: [PATCH] perf trace beauty: Allow header files in a different path

2020-10-27 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 22, 2020 at 07:09:12PM -0700, Ian Rogers escreveu:
> On Thu, Oct 22, 2020 at 7:06 PM Namhyung Kim  wrote:
> >
> > > The current scripts that generate the mmap flags and prot tables check
> > > for headers from the uapi/asm-generic directory, but they might come
> > > from a different directory in some environments.  So change the
> > > pattern to accept that.
> >
> > Signed-off-by: Namhyung Kim 
> 
> Acked-by: Ian Rogers 

Thanks, applied.

- Arnaldo

 
> Thanks,
> Ian
> 
> > ---
> >  tools/perf/trace/beauty/mmap_flags.sh | 4 ++--
> >  tools/perf/trace/beauty/mmap_prot.sh  | 2 +-
> >  2 files changed, 3 insertions(+), 3 deletions(-)
> >
> > > diff --git a/tools/perf/trace/beauty/mmap_flags.sh b/tools/perf/trace/beauty/mmap_flags.sh
> > index 39eb2595983b..76825710c725 100755
> > --- a/tools/perf/trace/beauty/mmap_flags.sh
> > +++ b/tools/perf/trace/beauty/mmap_flags.sh
> > @@ -28,12 +28,12 @@ egrep -q $regex ${linux_mman} && \
> > egrep -vw 'MAP_(UNINITIALIZED|TYPE|SHARED_VALIDATE)' | \
> > sed -r "s/$regex/\2 \1 \1 \1 \2/g" | \
> > > xargs printf "\t[ilog2(%s) + 1] = \"%s\",\n#ifndef MAP_%s\n#define MAP_%s %s\n#endif\n")
> > > -([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+<uapi/asm-generic/mman.*' ${arch_mman}) &&
> > > +([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+.*uapi/asm-generic/mman.*' ${arch_mman}) &&
> > >  (egrep $regex ${header_dir}/mman-common.h | \
> > > egrep -vw 'MAP_(UNINITIALIZED|TYPE|SHARED_VALIDATE)' | \
> > > sed -r "s/$regex/\2 \1 \1 \1 \2/g"  | \
> > > xargs printf "\t[ilog2(%s) + 1] = \"%s\",\n#ifndef MAP_%s\n#define MAP_%s %s\n#endif\n")
> > > -([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+<uapi/asm-generic/mman.h>.*' ${arch_mman}) &&
> > > +([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+.*uapi/asm-generic/mman.h>.*' ${arch_mman}) &&
> > >  (egrep $regex ${header_dir}/mman.h | \
> > > sed -r "s/$regex/\2 \1 \1 \1 \2/g"  | \
> > > xargs printf "\t[ilog2(%s) + 1] = \"%s\",\n#ifndef MAP_%s\n#define MAP_%s %s\n#endif\n")
> > > diff --git a/tools/perf/trace/beauty/mmap_prot.sh b/tools/perf/trace/beauty/mmap_prot.sh
> > index 28f638f8d216..664d8d534a50 100755
> > --- a/tools/perf/trace/beauty/mmap_prot.sh
> > +++ b/tools/perf/trace/beauty/mmap_prot.sh
> > @@ -17,7 +17,7 @@ prefix="PROT"
> >
> >  printf "static const char *mmap_prot[] = {\n"
> > >  regex=`printf '^[[:space:]]*#[[:space:]]*define[[:space:]]+%s_([[:alnum:]_]+)[[:space:]]+(0x[[:xdigit:]]+)[[:space:]]*.*' ${prefix}`
> > > -([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+<uapi/asm-generic/mman.*' ${arch_mman}) &&
> > > +([ ! -f ${arch_mman} ] || egrep -q '#[[:space:]]*include[[:space:]]+.*uapi/asm-generic/mman.*' ${arch_mman}) &&
> >  (egrep $regex ${common_mman} | \
> > egrep -vw PROT_NONE | \
> > sed -r "s/$regex/\2 \1 \1 \1 \2/g"  | \
> > --
> > 2.29.0.rc1.297.gfa9743e501-goog
> >

-- 

- Arnaldo
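
The relaxed pattern from the '+' lines above can be exercised outside
the shell scripts with the POSIX regex API. This is a sketch;
includes_generic_mman() and the sample include lines are hypothetical,
and the pattern is used here as an extended regular expression, as
egrep does:

```c
#include <regex.h>
#include <stddef.h>

/*
 * Return 1 if line is an #include that pulls in the generic mman
 * header from any directory prefix, 0 if not, -1 on a regex error.
 */
static int includes_generic_mman(const char *line)
{
	static const char pattern[] =
		"#[[:space:]]*include[[:space:]]+.*uapi/asm-generic/mman.*";
	regex_t reg;
	int ret;

	if (regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;

	ret = regexec(&reg, line, 0, NULL, 0) == 0;
	regfree(&reg);
	return ret;
}
```

Because the pattern allows anything between the include keyword and
"uapi/asm-generic/mman", both a plain <uapi/asm-generic/mman.h> and a
header reached through a longer path match, while unrelated includes
do not.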


Re: [PATCHv5] perf kvm: add kvm-stat for arm64

2020-10-27 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 27, 2020 at 03:24:21PM +0900, Sergey Senozhatsky escreveu:
> Add support for perf kvm stat on arm64 platform.
> 
> Example:
>  # perf kvm stat report
> 
> Analyze events for all VMs, all VCPUs:
> 
>              VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time     Avg time
> 
>             DABT_LOW     661867    98.91%    40.45%      2.19us   3364.65us       6.24us ( +-   0.34% )
>                  IRQ       4598     0.69%    57.44%      2.89us   3397.59us    1276.27us ( +-   1.61% )
>                  WFx       1475     0.22%     1.71%      2.22us   3388.63us     118.31us ( +-   8.69% )
>             IABT_LOW       1018     0.15%     0.38%      2.22us   2742.07us      38.29us ( +-  12.55% )
>                SYS64        180     0.03%     0.01%      2.07us    112.91us       6.57us ( +-  14.95% )
>                HVC64         17     0.00%     0.01%      2.19us    322.35us      42.95us ( +-  58.98% )
> 
> Total Samples:669155, Total events handled time:10216387.86us.
> 
> Signed-off-by: Sergey Senozhatsky 
> Reviewed-by: Leo Yan 
> Tested-by: Leo Yan 

Thanks, applied.

- Arnaldo

> ---
> 
> v5: rebased against perf/core (Arnaldo)
> v4: rebased against perf/core (Leo)
> v3: report ARM_EXCEPTION_IL exceptions (Leo)
> v2: reworked the patch after offline discussion with Suleiman
> 
>  tools/perf/arch/arm64/Makefile|  1 +
>  tools/perf/arch/arm64/util/Build  |  1 +
>  .../arch/arm64/util/arm64_exception_types.h   | 92 +++
>  tools/perf/arch/arm64/util/kvm-stat.c | 85 +
>  4 files changed, 179 insertions(+)
>  create mode 100644 tools/perf/arch/arm64/util/arm64_exception_types.h
>  create mode 100644 tools/perf/arch/arm64/util/kvm-stat.c
> 
> diff --git a/tools/perf/arch/arm64/Makefile b/tools/perf/arch/arm64/Makefile
> index dbef716a1913..fab3095fb5d0 100644
> --- a/tools/perf/arch/arm64/Makefile
> +++ b/tools/perf/arch/arm64/Makefile
> @@ -4,6 +4,7 @@ PERF_HAVE_DWARF_REGS := 1
>  endif
>  PERF_HAVE_JITDUMP := 1
>  PERF_HAVE_ARCH_REGS_QUERY_REGISTER_OFFSET := 1
> +HAVE_KVM_STAT_SUPPORT := 1
>  
>  #
>  # Syscall table generation for perf
> diff --git a/tools/perf/arch/arm64/util/Build b/tools/perf/arch/arm64/util/Build
> index b53294d74b01..8d2b9bcfffca 100644
> --- a/tools/perf/arch/arm64/util/Build
> +++ b/tools/perf/arch/arm64/util/Build
> @@ -2,6 +2,7 @@ perf-y += header.o
>  perf-y += machine.o
>  perf-y += perf_regs.o
>  perf-y += tsc.o
> +perf-y += kvm-stat.o
>  perf-$(CONFIG_DWARF) += dwarf-regs.o
>  perf-$(CONFIG_LOCAL_LIBUNWIND) += unwind-libunwind.o
>  perf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
> diff --git a/tools/perf/arch/arm64/util/arm64_exception_types.h b/tools/perf/arch/arm64/util/arm64_exception_types.h
> new file mode 100644
> index ..27c981ebe401
> --- /dev/null
> +++ b/tools/perf/arch/arm64/util/arm64_exception_types.h
> @@ -0,0 +1,92 @@
> +// SPDX-License-Identifier: GPL-2.0
> +#ifndef ARCH_PERF_ARM64_EXCEPTION_TYPES_H
> +#define ARCH_PERF_ARM64_EXCEPTION_TYPES_H
> +
> +/* Per asm/virt.h */
> +#define HVC_STUB_ERR   0xbadca11
> +
> +/* Per asm/kvm_asm.h */
> +#define ARM_EXCEPTION_IRQ0
> +#define ARM_EXCEPTION_EL1_SERROR 1
> +#define ARM_EXCEPTION_TRAP   2
> +#define ARM_EXCEPTION_IL 3
> +/* The hyp-stub will return this for any kvm_call_hyp() call */
> +#define ARM_EXCEPTION_HYP_GONE   HVC_STUB_ERR
> +
> +#define kvm_arm_exception_type   \
> + {ARM_EXCEPTION_IRQ, "IRQ"   },  \
> + {ARM_EXCEPTION_EL1_SERROR,  "SERROR"},  \
> + {ARM_EXCEPTION_TRAP,"TRAP"  },  \
> + {ARM_EXCEPTION_IL,  "ILLEGAL"   },  \
> + {ARM_EXCEPTION_HYP_GONE,"HYP_GONE"  }
> +
> +/* Per asm/esr.h */
> +#define ESR_ELx_EC_UNKNOWN   (0x00)
> +#define ESR_ELx_EC_WFx   (0x01)
> +/* Unallocated EC: 0x02 */
> +#define ESR_ELx_EC_CP15_32   (0x03)
> +#define ESR_ELx_EC_CP15_64   (0x04)
> +#define ESR_ELx_EC_CP14_MR   (0x05)
> +#define ESR_ELx_EC_CP14_LS   (0x06)
> +#define ESR_ELx_EC_FP_ASIMD  (0x07)
> +#define ESR_ELx_EC_CP10_ID   (0x08)  /* EL2 only */
> +#define ESR_ELx_EC_PAC   (0x09)  /* EL2 and above */
> +/* Unallocated EC: 0x0A - 0x0B */
> +#define ESR_ELx_EC_CP14_64   (0x0C)
> +/* Unallocated EC: 0x0d */
> +#define ESR_ELx_EC_ILL   (0x0E)
> +/* Unallocated EC: 0x0F - 0x10 */
> +#define ESR_ELx_EC_SVC32 (0x11)
> +#define ESR_ELx_EC_HVC32 (0x12)  /* EL2 only */
> +#define ESR_ELx_EC_SMC32 (0x13)  /* EL2 and above */
> +/* Unallocated EC: 0x14 */
> +#define ESR_ELx_EC_SVC64 (0x15)
> +#define ESR_ELx_EC_HVC64 (0x16)  /* EL2 and above */
> +#define ESR_ELx_EC_SMC64 (0x17)  /* EL2 and above */
> +#define ESR_ELx_EC_SYS64 (0x18)
> +#define ESR_ELx_EC_SVE   (0x19)
> +#define ESR_ELx_EC_ERET  (0x1a)  /* EL2 only */
> +/* Unallocated EC: 0x1b - 0x1E */
> +#
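
A table like kvm_arm_exception_type above is typically consumed by a
code-to-name lookup when the report is printed. The following is a
sketch of such a lookup using only the values visible in the quoted
header; struct exit_name and exit_reason_name() are hypothetical and
not the helpers perf kvm stat uses:

```c
#include <stddef.h>

/* One entry of a code-to-name table. */
struct exit_name {
	unsigned long code;
	const char *name;
};

/* Values copied from the quoted arm64_exception_types.h. */
static const struct exit_name arm_exception_names[] = {
	{ 0,         "IRQ"      },	/* ARM_EXCEPTION_IRQ */
	{ 1,         "SERROR"   },	/* ARM_EXCEPTION_EL1_SERROR */
	{ 2,         "TRAP"     },	/* ARM_EXCEPTION_TRAP */
	{ 3,         "ILLEGAL"  },	/* ARM_EXCEPTION_IL */
	{ 0xbadca11, "HYP_GONE" },	/* ARM_EXCEPTION_HYP_GONE */
};

/* Return the name for an exit code, or NULL if it is not in the table. */
static const char *exit_reason_name(unsigned long code)
{
	size_t i;
	size_t nr = sizeof(arm_exception_names) / sizeof(arm_exception_names[0]);

	for (i = 0; i < nr; i++) {
		if (arm_exception_names[i].code == code)
			return arm_exception_names[i].name;
	}
	return NULL;
}
```

Returning NULL for unknown codes leaves it to the caller to decide how
an unrecognised exit should be shown.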

Re: [PATCH v2 2/2] perf stat: Support regex pattern in --for-each-cgroup

2020-10-26 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 26, 2020 at 09:32:34PM +0900, Namhyung Kim escreveu:
> Hi Jiri,
> 
> On Mon, Oct 26, 2020 at 8:40 PM Jiri Olsa  wrote:
> >
> > On Sat, Oct 24, 2020 at 11:59:18AM +0900, Namhyung Kim wrote:
> > > To make the command line even more compact with cgroups, support regex
> > > pattern matching in cgroup names.
> > >
> > >   $ perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1
> > >
> > >   3,000.73 msec cpu-clock foo #2.998 CPUs 
> > > utilized
> > > 12,530,992,699  cyclesfoo #7.517 GHz  
> > > (100.00%)
> > >   1,000.61 msec cpu-clock foo/bar #1.000 CPUs 
> > > utilized
> > >  4,178,529,579  cyclesfoo/bar #2.506 GHz  
> > > (100.00%)
> > >   1,000.03 msec cpu-clock foo/baz #0.999 CPUs 
> > > utilized
> > >  4,176,104,315  cyclesfoo/baz #2.505 GHz  
> > > (100.00%)
> >
> > just curious.. there was another foo/XXX group using the
> > rest of the cycles, right?
> 
> No, if so it should be displayed too.  But actually there was a process
> in the foo cgroup itself.
> 
> >
> > also perhaps we want to warn if there's no match found:
> >
> > > $ sudo ./perf stat -a -e cpu-clock,cycles --for-each-cgroup ^foo sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> >
> >1.002375575 seconds time elapsed
> >
> 
> Right, will check this case.

Hum, I thought that could be done on top of this one, but then the
ambiguity between:

1. No samples for cgroups matching that expression

2. No cgroups matching that expression

is real and warrants a warning for the 'no cgroups match the
--for-each-cgroup regexp' case.

So I'll wait for v3 with that warning,

Thanks,

- Arnaldo
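
The distinction drawn above (no cgroup matches the expression versus
matching cgroups that produced no samples) can be sketched in plain C.
This is only an illustration using POSIX regex; count_matching_cgroups()
and the warning text are hypothetical and are not taken from the perf
sources.

```c
#include <regex.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper, not perf code: count how many cgroup names match
 * the POSIX extended regex 'pattern'.  Returns -1 if the pattern does not
 * compile, and warns on stderr when nothing matched, so that "no match"
 * can be told apart from "matched, but no samples".
 */
static int count_matching_cgroups(const char *pattern,
				  const char *const *names, size_t nr)
{
	regex_t reg;
	size_t i;
	int matches = 0;

	if (regcomp(&reg, pattern, REG_EXTENDED | REG_NOSUB))
		return -1;

	for (i = 0; i < nr; i++) {
		if (regexec(&reg, names[i], 0, NULL, 0) == 0)
			matches++;
	}
	regfree(&reg);

	if (matches == 0)
		fprintf(stderr, "no cgroup matched the regex: %s\n", pattern);

	return matches;
}
```

A caller would pass the list of cgroup names it discovered and only set
up events when the return value is greater than zero.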


Re: Segfault in pahole 1.18 when building kernel 5.9.1 for arm64

2020-10-21 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 21, 2020 at 08:53:16AM -0700, Andrii Nakryiko escreveu:
> On Wed, Oct 21, 2020 at 6:48 AM Arnaldo Carvalho de Melo
>  wrote:
> >
> > Em Wed, Oct 21, 2020 at 08:22:40AM +0200, Jiri Slaby escreveu:
> > > On 20. 10. 20, 14:20, Arnaldo Carvalho de Melo wrote:
> > > > > Yeah, I observe the very same. I reported it at:
> > > > > https://bugzilla.suse.com/show_bug.cgi?id=1177921
> >
> > > > Would it be possible to try with
> > > > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=tmp.libbtf_encoder
> > > > ?
> >
> > > Yes, that branch fixes the crashes and the kernel build finishes. The
> > > zero-sized symbol error remains.
> >
> > > > So what should distributions do now -- should we switch to a pahole
> > > > snapshot for a while?
> >
> > That would do the trick, I just completed my testing and pushed to the
> > master branch on kernel.org and github, tests detailed at:
> >
> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=040fd7f585c9b9fcf4475d294b3f5ddf78405297
> >
> > There are some minor bug reports I want to address but my ETA right now
> > is the end of this week to release v1.19.
> 
> I've just sent a patch that skips zero-sized ELF symbols without a
> warning or error.

Got it, applied, Jiri, please consider testing it and providing a
Tested-by, in addition to the Reported-by that I'm adding now.

Thanks!

- Arnaldo


Re: Segfault in pahole 1.18 when building kernel 5.9.1 for arm64

2020-10-21 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 21, 2020 at 08:22:40AM +0200, Jiri Slaby escreveu:
> On 20. 10. 20, 14:20, Arnaldo Carvalho de Melo wrote:
> > > Yeah, I observe the very same. I reported it at:
> > > https://bugzilla.suse.com/show_bug.cgi?id=1177921

> > Would it be possible to try with
> > https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=tmp.libbtf_encoder
> > ?
 
> Yes, that branch fixes the crashes and the kernel build finishes. The
> zero-sized symbol error remains.
 
> So what should distributions do now -- should we switch to a pahole snapshot
> for a while?

That would do the trick, I just completed my testing and pushed to the
master branch on kernel.org and github, tests detailed at:

https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=040fd7f585c9b9fcf4475d294b3f5ddf78405297

There are some minor bug reports I want to address but my ETA right now
is the end of this week to release v1.19.

- Arnaldo


Re: Segfault in pahole 1.18 when building kernel 5.9.1 for arm64

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 20, 2020 at 03:14:59PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Tue, Oct 20, 2020 at 10:10:19AM -0700, Andrii Nakryiko escreveu:
> > On Tue, Oct 20, 2020 at 10:05 AM Hao Luo  wrote:
> > > Thanks for reporting this and cc'ing me. I forgot to update the error
> > > messages when renaming the flags. I will send a patch to fix the error
> > > message.
> 
> > > The commit
> 
> > > commit f3d9054ba8ff1df0fc44e507e3a01c0964cabd42
> > > Author: Hao Luo 
> > > AuthorDate: Wed Jul 8 13:44:10 2020 -0700
> 
> > >  btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.
> 
> > > encodes kernel global variables into BTF so that bpf programs can
> > > directly access them. If there is no need to access kernel global
> > > variables, it's perfectly fine to use '--btf_encode_force' to skip
> > > encoding bad symbols into BTF, or '--skip_encoding_btf_vars' to skip
> > > encoding all global vars all together. I will add these info into the
> > > updated error message.
> 
> > > Also cc bpf folks for attention of this bug.
> 
> > I've already fixed the message as part of
> > 2e719cca6672 ("btf_encoder: revamp how per-CPU variables are encoded")
> 
> > It's currently still in the tmp.libbtf_encoder branch in pahole repo.
> 
> I'm now running:
> 
>   $ grep BTF=y ../build/s390x-v5.9.0+/.config
>   CONFIG_DEBUG_INFO_BTF=y
>   $ make -j24 CROSS_COMPILE=s390x-linux-gnu- ARCH=s390 O=../build/s390x-v5.9.0+/

  $ ls -la /home/acme/git/build/s390x-v5.9.0+/.tmp_vmlinux.btf
  -rwxrwxr-x. 1 acme acme 304592928 Oct 20 15:26 /home/acme/git/build/s390x-v5.9.0+/.tmp_vmlinux.btf
  $ file /home/acme/git/build/s390x-v5.9.0+/.tmp_vmlinux.btf
  /home/acme/git/build/s390x-v5.9.0+/.tmp_vmlinux.btf: ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), statically linked, BuildID[sha1]=ed39402fdbd7108c1055baaa61cfc6b0e431901d, with debug_info, not stripped
  $ pahole -F btf -C list_head /home/acme/git/build/s390x-v5.9.0+/.tmp_vmlinux.btf
  struct list_head {
          struct list_head * next; /* 0 8 */
          struct list_head * prev; /* 8 8 */

          /* size: 16, cachelines: 1, members: 2 */
          /* last cacheline: 16 bytes */
  };
  $
  $ readelf -wi /home/acme/git/build/s390x-v5.9.0+/vmlinux | grep -m2 DW_AT_producer
  <28>   DW_AT_producer: (indirect string, offset: 0x51): GNU AS 2.34
  <3b>   DW_AT_producer: (indirect string, offset: 0xeb46): GNU C89 
9.2.1 20190827 (Red Hat Cross 9.2.1-3) -m64 -mwarn-dynamicstack -mbackchain 
-msoft-float -march=z196 -mtune=z196 -mpacked-stack -mindirect-branch=thunk 
-mfunction-return=thunk -mindirect-branch-table -mrecord-mcount -mnop-mcount 
-mfentry -mzarch -g -O2 -std=gnu90 -p -fno-strict-aliasing -fno-common 
-fshort-wchar -fPIE -fno-asynchronous-unwind-tables 
-fno-delete-null-pointer-checks -fno-reorder-blocks -fno-ipa-cp-clone 
-fno-partial-inlining -fno-stack-protector -fno-var-tracking-assignments 
-fno-inline-functions-called-once -falign-functions=32 -fno-strict-overflow 
-fstack-check=no -fconserve-stack -fno-function-sections -fno-data-sections 
-fsanitize=kernel-address -fasan-shadow-offset=0x18 
-fsanitize=bounds -fsanitize=shift -fsanitize=integer-divide-by-zero 
-fsanitize=unreachable -fsanitize=signed-integer-overflow 
-fsanitize=object-size -fsanitize=bool -fsanitize=enum 
-fsanitize-undefined-trap-on-error -fsanitize-coverage=trace-pc 
-fsanitize-coverage=trace-cmp --param allow-store-data-races=0 --param 
asan-globals=1 --param asan-instrumentation-with-call-threshold=0 --param 
asan-stack=1 --param asan-instrument-allocas=1
  $
  $ file /home/acme/git/build/s390x-v5.9.0+/vmlinux
  /home/acme/git/build/s390x-v5.9.0+/vmlinux: ELF 64-bit MSB executable, IBM S/390, version 1 (SYSV), statically linked, BuildID[sha1]=fbb252d8dccc11d8e66d6f248d06bcdca4e7db7a, with debug_info, not stripped
  $

But I noticed that 'btfdiff' is showing differences from output
generated from DWARF and the one generated from BTF, the first issue
is:

[acme@five pahole]$ btfdiff /home/acme/git/build/v5.9.0+/vmlinux

@@ -115549,7 +120436,7 @@ struct irq_router_handler {
 
/* XXX 6 bytes hole, try to pack */
 
-   int(*probe)(struct irq_router * , struct pci_dev * , u16 ); /* 8 8 */
+   int(*probe)(struct irq_router *, struct pci_dev *, u16); /* 8 8 */
 
/* size: 16, cachelines: 1, members: 2 */
/* sum members: 10, holes: 1, sum holes: 6 */
[acme@five pahole]$

The BTF output (the one starting with '+' in the diff output) is better, just
different than it was before, I'll fix the DWARF

Re: Segfault in pahole 1.18 when building kernel 5.9.1 for arm64

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 20, 2020 at 10:10:19AM -0700, Andrii Nakryiko escreveu:
> On Tue, Oct 20, 2020 at 10:05 AM Hao Luo  wrote:
> > Thanks for reporting this and cc'ing me. I forgot to update the error
> > messages when renaming the flags. I will send a patch to fix the error
> > message.

> > The commit

> > commit f3d9054ba8ff1df0fc44e507e3a01c0964cabd42
> > Author: Hao Luo 
> > AuthorDate: Wed Jul 8 13:44:10 2020 -0700

> >  btf_encoder: Teach pahole to store percpu variables in vmlinux BTF.

> > encodes kernel global variables into BTF so that bpf programs can
> > directly access them. If there is no need to access kernel global
> > variables, it's perfectly fine to use '--btf_encode_force' to skip
> > encoding bad symbols into BTF, or '--skip_encoding_btf_vars' to skip
> > encoding all global vars all together. I will add these info into the
> > updated error message.

> > Also cc bpf folks for attention of this bug.

> I've already fixed the message as part of
> 2e719cca6672 ("btf_encoder: revamp how per-CPU variables are encoded")

> It's currently still in the tmp.libbtf_encoder branch in pahole repo.

I'm now running:

  $ grep BTF=y ../build/s390x-v5.9.0+/.config
  CONFIG_DEBUG_INFO_BTF=y
  $ make -j24 CROSS_COMPILE=s390x-linux-gnu- ARCH=s390 O=../build/s390x-v5.9.0+/

To do the last test I wanted before moving it to master.

- Arnaldo


Re: Segfault in pahole 1.18 when building kernel 5.9.1 for arm64

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 20, 2020 at 11:01:39AM +0200, Jiri Slaby escreveu:
> Hi,
> 
> On 19. 10. 20, 1:18, Érico Rolim wrote:
> > I'm trying to build kernel 5.9.1 for arm64, and my dotconfig has
> > `CONFIG_DEBUG_INFO_BTF=y`, which requires pahole for building. However,
> > pahole version 1.18 segfaults during the build, as can be seen below:
> > 
> > PAHOLE: Error: Found symbol of zero size when encoding btf (sym:
> > '__kvm_nvhe_arm64_ssbd_callback_required', cu:
> > 'arch/arm64/kernel/cpu_errata.c').
> 
> The symbol is an alias coming from arch/arm64/kernel/vmlinux.lds:
> __kvm_nvhe_arm64_ssbd_callback_required = arm64_ssbd_callback_required;;
> 
> > PAHOLE: Error: Use '-j' or '--force' to ignore such symbols and force
> > emit the btf.
> > scripts/link-vmlinux.sh: line 141: 43837 Segmentation fault
> > LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}
> >LD  .tmp_vmlinux.kallsyms1
> >KSYM.tmp_vmlinux.kallsyms1.o
> >LD  .tmp_vmlinux.kallsyms2
> >KSYM.tmp_vmlinux.kallsyms2.o
> >LD  vmlinux
> >BTFIDS  vmlinux
> > FAILED: load BTF from vmlinux: Unknown error -2make: ***
> > [Makefile:1162: vmlinux] Error 255
> > 
> > It is possible to force the build to continue if
> > 
> >LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J ${1}
> > 
> > in scripts/link-vmlinux.sh is changed to
> > 
> >LLVM_OBJCOPY=${OBJCOPY} ${PAHOLE} -J --btf_encode_force ${1}
> > 
> > The suggested `-j` or `--force` flags don't exist, since they were removed
> > in [1]. I believe `--btf_encode_force` should be suggested instead.
> 
> Agreed, '--btf_encode_force' makes pahole proceed without crashing.
> 
> > It should be noted that the same build, but with pahole version 1.17, works
> > without issue, so I think this is either a regression in pahole or the
> > script will need to be changed for newer versions of pahole.
> 
> Yeah, I observe the very same. I reported it at:
> https://bugzilla.suse.com/show_bug.cgi?id=1177921

Would it be possible to try with
https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?h=tmp.libbtf_encoder
?

This switches to using libbpf for the BTF encoder and may have fixed
this problem.

- Arnaldo
 
> The backtrace:
> > (gdb) where
> > #0  __memmove_sse2_unaligned_erms () at
> ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:300
> > #1  0x77f78346 in memcpy (__len=<optimized out>, __src=<optimized out>, __dest=<optimized out>, __dest=<optimized out>, __src=<optimized out>,
> __len=<optimized out>) at /usr/include/bits/string_fortified.h:34
> > #2  gobuffer__add (gb=0x55569aa0, s=0x7fffb50c, len=12) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/gobuffer.c:87
> > #3  0x77f8671f in btf_elf__add_datasec_type
> (btfe=btfe@entry=0x55569a40,
> section_name=section_name@entry=0x77fa43ad ".data..percpu",
> var_secinfo_buf=var_secinfo_buf@entry=0x55569ac0) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/libbtf.c:721
> > #4  0x77f8d766 in btf_elf__encode (flags=0 '\000',
> btfe=0x55569a40) at /usr/src/debug/dwarves-1.18-1.1.x86_64/libbtf.c:857
> > #5  btf_elf__encode (btfe=0x55569a40, flags=) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/libbtf.h:71
> > #6  0x77f7fc70 in btf_encoder__encode () at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/btf_encoder.c:213
> > #7  0x77f80d17 in cu__encode_btf (cu=0x5638d9b0, verbose=0,
> force=false, skip_encoding_vars=false) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/btf_encoder.c:255
> > #8  0xac4d in pahole_stealer (cu=0x5638d9b0,
> conf_load=) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/pahole.c:2366
> > #9  0x77f89dab in finalize_cu (cus=0x555622d0,
> dcu=0x7fffd080, conf=0x555610e0 , cu=0x5638d9b0) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarf_loader.c:2473
> > #10 finalize_cu_immediately (conf=0x555610e0 ,
> dcu=0x7fffd080, cu=0x5638d9b0, cus=0x555622d0) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarf_loader.c:2317
> > #11 cus__load_module (cus=cus@entry=0x555622d0, conf=0x555610e0
> , mod=mod@entry=0x55564760, dw=0x55565960,
> elf=elf@entry=0x55562360, filename=0x7fffe846 "ss") at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarf_loader.c:2473
> > #12 0x77f8a0f1 in cus__process_dwflmod (dwflmod=0x55564760,
> userdata=, name=, base=,
> arg=0x7fffe1b0) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarf_loader.c:2518
> > #13 0x77d4f571 in dwfl_getmodules () from /usr/lib64/libdw.so.1
> > #14 0x77f823ed in cus__process_file (filename=0x7fffe846 "ss",
> fd=3, conf=, cus=0x555622d0) at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarf_loader.c:2571
> > #15 dwarf__load_file (cus=0x555622d0, conf=,
> filename=0x7fffe846 "ss") at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarf_loader.c:2588
> > #16 0x77f76771 in cus__load_file (cus=cus@entry=0x555622d0,
> conf=conf@entry=0x555610e0 , filename=0x7fffe846 "ss") at
> /usr/src/debug/dwarves-1.18-1.1.x86_64/dwarves.c:1958
> > #17 0x77f798a8 in cus__load_files (cu

Re: [PATCH] perf test: Implement skip_reason callback for watchpoint tests

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 20, 2020 at 03:07:15PM +0900, Namhyung Kim escreveu:
> Hello,
> On Fri, Oct 16, 2020 at 10:17 PM Tommi Rantala
>  wrote:
> >
> > Currently reason for skipping the read only watchpoint test is only seen
> > when running in verbose mode:
> >
> >   $ perf test watchpoint
> >   23: Watchpoint:
> >   23.1: Read Only Watchpoint: Skip
> >   23.2: Write Only Watchpoint   : Ok
> >   23.3: Read / Write Watchpoint : Ok
> >   23.4: Modify Watchpoint   : Ok
> >
> >   $ perf test -v watchpoint
> >   23: Watchpoint:
> >   23.1: Read Only Watchpoint:
> >   --- start ---
> >   test child forked, pid 60204
> >   Hardware does not support read only watchpoints.
> >   test child finished with -2
> >
> > Implement skip_reason callback for the watchpoint tests, so that it's
> > easy to see reason why the test is skipped:
> >
> >   $ perf test watchpoint
> >   23: Watchpoint:
> >   23.1: Read Only Watchpoint: Skip (missing hardware support)
> >   23.2: Write Only Watchpoint   : Ok
> >   23.3: Read / Write Watchpoint : Ok
> >   23.4: Modify Watchpoint   : Ok
> >
> > Signed-off-by: Tommi Rantala 
> 
> Acked-by: Namhyung Kim 

Thanks, applied.

- Arnaldo
 
> 
> Thanks
> Namhyung
> 
> 
> > ---
> >  tools/perf/tests/builtin-test.c |  1 +
> >  tools/perf/tests/tests.h|  1 +
> >  tools/perf/tests/wp.c   | 21 +++--
> >  3 files changed, 17 insertions(+), 6 deletions(-)
> >
> > diff --git a/tools/perf/tests/builtin-test.c b/tools/perf/tests/builtin-test.c
> > index d328caaba45d..3bfad4ee31ae 100644
> > --- a/tools/perf/tests/builtin-test.c
> > +++ b/tools/perf/tests/builtin-test.c
> > @@ -142,6 +142,7 @@ static struct test generic_tests[] = {
> > .skip_if_fail   = false,
> > .get_nr = test__wp_subtest_get_nr,
> > .get_desc   = test__wp_subtest_get_desc,
> > +   .skip_reason= test__wp_subtest_skip_reason,
> > },
> > },
> > {
> > diff --git a/tools/perf/tests/tests.h b/tools/perf/tests/tests.h
> > index 4447a516c689..0630301087a6 100644
> > --- a/tools/perf/tests/tests.h
> > +++ b/tools/perf/tests/tests.h
> > @@ -66,6 +66,7 @@ int test__bp_signal_overflow(struct test *test, int subtest);
> >  int test__bp_accounting(struct test *test, int subtest);
> >  int test__wp(struct test *test, int subtest);
> >  const char *test__wp_subtest_get_desc(int subtest);
> > +const char *test__wp_subtest_skip_reason(int subtest);
> >  int test__wp_subtest_get_nr(void);
> >  int test__task_exit(struct test *test, int subtest);
> >  int test__mem(struct test *test, int subtest);
> > diff --git a/tools/perf/tests/wp.c b/tools/perf/tests/wp.c
> > index d262d6639829..9387fa76faa5 100644
> > --- a/tools/perf/tests/wp.c
> > +++ b/tools/perf/tests/wp.c
> > @@ -174,10 +174,12 @@ static bool wp_ro_supported(void)
> >  #endif
> >  }
> >
> > -static void wp_ro_skip_msg(void)
> > +static const char *wp_ro_skip_msg(void)
> >  {
> >  #if defined (__x86_64__) || defined (__i386__)
> > -   pr_debug("Hardware does not support read only watchpoints.\n");
> > +   return "missing hardware support";
> > +#else
> > +   return NULL;
> >  #endif
> >  }
> >
> > @@ -185,7 +187,7 @@ static struct {
> > const char *desc;
> > int (*target_func)(void);
> > bool (*is_supported)(void);
> > -   void (*skip_msg)(void);
> > +   const char *(*skip_msg)(void);
> >  } wp_testcase_table[] = {
> > {
> > .desc = "Read Only Watchpoint",
> > @@ -219,16 +221,23 @@ const char *test__wp_subtest_get_desc(int i)
> > return wp_testcase_table[i].desc;
> >  }
> >
> > +const char *test__wp_subtest_skip_reason(int i)
> > +{
> > +   if (i < 0 || i >= (int)ARRAY_SIZE(wp_testcase_table))
> > +   return NULL;
> > +   if (!wp_testcase_table[i].skip_msg)
> > +   return NULL;
> > +   return wp_testcase_table[i].skip_msg();
> > +}
> > +
> >  int test__wp(struct test *test __maybe_unused, int i)
> >  {
> > if (i < 0 || i >= (int)ARRAY_SIZE(wp_testcase_table))
> > return TEST_FAIL;
> >
> > if (wp_testcase_table[i].is_supported &&
> > -   !wp_testcase_table[i].is_supported()) {
> > -   wp_testcase_table[i].skip_msg();
> > +   !wp_testcase_table[i].is_supported())
> > return TEST_SKIP;
> > -   }
> >
> > return !wp_testcase_table[i].target_func() ? TEST_OK : TEST_FAIL;
> >  }
> > --
> > 2.26.2
> >

-- 

- Arnaldo
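
The callback pattern used by the patch above (an optional per-subtest
function that returns a human-readable skip reason, or NULL) can be
sketched outside of perf as follows. All names here (wp_case_sketch,
skip_reason_sketch, and so on) are hypothetical and exist only for this
illustration; the two table entries merely mimic the watchpoint subtests.

```c
#include <stddef.h>

/* Hypothetical stand-in for one entry of a subtest table. */
struct wp_case_sketch {
	const char *desc;
	int (*is_supported)(void);		/* optional, may be NULL */
	const char *(*skip_msg)(void);		/* optional, may be NULL */
};

static int ro_is_supported(void)
{
	return 0;	/* pretend the hardware lacks read-only watchpoints */
}

static const char *ro_skip_msg(void)
{
	return "missing hardware support";
}

static const struct wp_case_sketch cases_sketch[] = {
	{ "Read Only Watchpoint",  ro_is_supported, ro_skip_msg },
	{ "Write Only Watchpoint", NULL,            NULL        },
};

#define NR_CASES_SKETCH (sizeof(cases_sketch) / sizeof(cases_sketch[0]))

/* Return the skip reason for subtest 'i', or NULL if there is none. */
static const char *skip_reason_sketch(int i)
{
	if (i < 0 || (size_t)i >= NR_CASES_SKETCH)
		return NULL;
	if (!cases_sketch[i].skip_msg)
		return NULL;
	return cases_sketch[i].skip_msg();
}
```

Returning a string instead of printing it lets the caller decide where
the reason is shown, which is what allows it to appear in non-verbose
output.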


Re: [PATCH] perf tools: Fix crash with non-jited bpf progs

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 19, 2020 at 10:53:43PM +0200, Jiri Olsa escreveu:
> On Fri, Oct 16, 2020 at 02:47:18PM +0300, Tommi Rantala wrote:
> > The addr in PERF_RECORD_KSYMBOL events for non-jited bpf progs points to
> > the bpf interpreter, ie. within kernel text section. When processing the
> > unregister event, this causes unexpected removal of vmlinux_map,
> > crashing perf later in cleanup:
> > 
> >   # perf record -- timeout --signal=INT 2s /usr/share/bcc/tools/execsnoop
> >   PCOMMPIDPPID   RET ARGS
> >   [ perf record: Woken up 1 times to write data ]
> >   [ perf record: Captured and wrote 0.208 MB perf.data (5155 samples) ]
> >   perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
> >   Aborted (core dumped)
> > 
> >   # perf script -D|grep KSYM
> >   0 0xa40 [0x48]: PERF_RECORD_KSYMBOL addr a9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
> >   0 0xab0 [0x48]: PERF_RECORD_KSYMBOL addr a9b6b530 len 0 type 1 flags 0x0 name bpf_prog_8c42dee26e8cd4c2
> >   0 0xb20 [0x48]: PERF_RECORD_KSYMBOL addr a9b6b530 len 0 type 1 flags 0x0 name bpf_prog_f958f6eb72ef5af6
> >   108563691893 0x33d98 [0x58]: PERF_RECORD_KSYMBOL addr a9b6b3b0 len 0 type 1 flags 0x0 name bpf_prog_bc5697a410556fc2_syscall__execve
> >   108568518458 0x34098 [0x58]: PERF_RECORD_KSYMBOL addr a9b6b3f0 len 0 type 1 flags 0x0 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
> >   109301967895 0x34830 [0x58]: PERF_RECORD_KSYMBOL addr a9b6b3b0 len 0 type 1 flags 0x1 name bpf_prog_bc5697a410556fc2_syscall__execve
> >   109302007356 0x348b0 [0x58]: PERF_RECORD_KSYMBOL addr a9b6b3f0 len 0 type 1 flags 0x1 name bpf_prog_45e2203c2928704d_do_ret_sys_execve
> >   perf: tools/include/linux/refcount.h:131: refcount_sub_and_test: Assertion `!(new > val)' failed.
> > 
> > Here the addresses match the bpf interpreter:
> > 
> >   # grep -e a9b6b530 -e a9b6b3b0 -e a9b6b3f0 /proc/kallsyms
> >   a9b6b3b0 t __bpf_prog_run224
> >   a9b6b3f0 t __bpf_prog_run192
> >   a9b6b530 t __bpf_prog_run32
> > 
> > Fix by not allowing vmlinux_map to be removed by PERF_RECORD_KSYMBOL
> > unregister event.
> > 
> > Signed-off-by: Tommi Rantala 
> 
> nice, I almost forgot about non jit mode by now ;-)
> 
> Acked/Tested-by: Jiri Olsa 

Thanks, applied.

- Arnaldo

 
> thanks,
> jirka
> 
> > ---
> >  tools/perf/util/machine.c | 11 ++-
> >  tools/perf/util/symbol.c  |  7 +++
> >  tools/perf/util/symbol.h  |  2 ++
> >  3 files changed, 19 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> > index 85587de027a5..d93d35463c61 100644
> > --- a/tools/perf/util/machine.c
> > +++ b/tools/perf/util/machine.c
> > @@ -786,11 +786,20 @@ static int machine__process_ksymbol_unregister(struct machine *machine,
> >union perf_event *event,
> >struct perf_sample *sample __maybe_unused)
> >  {
> > +   struct symbol *sym;
> > struct map *map;
> >  
> > map = maps__find(&machine->kmaps, event->ksymbol.addr);
> > -   if (map)
> > +   if (!map)
> > +   return 0;
> > +
> > +   if (map != machine->vmlinux_map)
> > maps__remove(&machine->kmaps, map);
> > +   else {
> > +   sym = dso__find_symbol(map->dso, map->map_ip(map, map->start));
> > +   if (sym)
> > +   dso__delete_symbol(map->dso, sym);
> > +   }
> >  
> > return 0;
> >  }
> > diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> > index 5151a8c0b791..6bf8e74ea1d1 100644
> > --- a/tools/perf/util/symbol.c
> > +++ b/tools/perf/util/symbol.c
> > @@ -515,6 +515,13 @@ void dso__insert_symbol(struct dso *dso, struct symbol *sym)
> > }
> >  }
> >  
> > +void dso__delete_symbol(struct dso *dso, struct symbol *sym)
> > +{
> > +   rb_erase_cached(&sym->rb_node, &dso->symbols);
> > +   symbol__delete(sym);
> > +   dso__reset_find_symbol_cache(dso);
> > +}
> > +
> >  struct symbol *dso__find_symbol(struct dso *dso, u64 addr)
> >  {
> > > if (dso->last_find_result.addr != addr || dso->last_find_result.symbol == NULL) {
> > diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
> > index 03e264a27cd3..60345691db09 100644
> > --- a/tools/perf/util/symbol.h
> > +++ b/tools/perf/util/symbol.h
> > @@ -130,6 +130,8 @@ int dso__load_kallsyms(struct dso *dso, const char *filename, struct map *map);
> >  
> >  void dso__insert_symbol(struct dso *dso,
> > struct symbol *sym);
> > +void dso__delete_symbol(struct dso *dso,
> > +   struct symbol *sym);
> >  
> >  struct symbol *dso__find_symbol(struct dso *dso, u64 addr);
> >  struct symbol *dso__find_symbol_by_name(struct dso *dso, const char *name);
> > -- 
> > 2.26.2
> > 
> 

-- 

- Arnaldo
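
The core of the fix above is a guard: an unregister event whose address
resolves to the main kernel map must never remove that map. The toy
model below shows only that guard. The types and functions
(map_sketch, machine_sketch, ksymbol_unregister_sketch) are hypothetical
simplifications, not perf's data structures, and the symbol deletion the
real patch performs in the vmlinux case is left out.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for perf's map and machine. */
struct map_sketch {
	unsigned long start, end;	/* covers [start, end) */
};

struct machine_sketch {
	struct map_sketch *maps[8];
	size_t nr;
	struct map_sketch *vmlinux_map;	/* must never be removed */
};

static struct map_sketch *find_map_sketch(struct machine_sketch *m,
					  unsigned long addr)
{
	size_t i;

	for (i = 0; i < m->nr; i++) {
		if (addr >= m->maps[i]->start && addr < m->maps[i]->end)
			return m->maps[i];
	}
	return NULL;
}

/* Returns 1 if a map was removed, 0 if nothing was removed. */
static int ksymbol_unregister_sketch(struct machine_sketch *m,
				     unsigned long addr)
{
	struct map_sketch *map = find_map_sketch(m, addr);
	size_t i;

	/* Unknown address, or it lies in kernel text: keep all maps. */
	if (!map || map == m->vmlinux_map)
		return 0;

	for (i = 0; i < m->nr; i++) {
		if (m->maps[i] == map) {
			/* Unordered removal: move the last entry here. */
			m->maps[i] = m->maps[--m->nr];
			return 1;
		}
	}
	return 0;
}
```

With a non-jited program the event address falls inside the vmlinux map
(the interpreter), so the function returns 0 and the map survives until
cleanup.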


Re: [PATCH v6 0/2] perf: Make tsc testing as a common testing case

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 20, 2020 at 08:11:59AM +0200, Jiri Olsa escreveu:
> On Mon, Oct 19, 2020 at 06:02:34PM +0800, Leo Yan wrote:
> > This patch set is to move tsc testing from x86 specific to common
> > testing case.  Since Arnaldo found the building failure for patch set
> > v4 [1], the first four patches have been merged but the last two patches
> > were left out; this patch set is to resend the last two patches with
> > fixed the building failure (by removing the header "arch-tests.h" from the
> > testing code).
> > 
> > These two patches have been tested on x86_64 and Arm64.  Though I don't
> > test them on archs MIPS, PowerPC, etc, I tried to search every header so
> > ensure included headers are supported for all archs.
> > 
> > These two patches have been rebased on the perf/core branch with its
> > latest commit 744aec4df2c5 ("perf c2c: Update documentation for metrics
> > reorganization").
> > 
> > Changes from v5:
> > * Found a merge conflict on the latest perf/core, so rebased it.
> > 
> > [1] https://lore.kernel.org/patchwork/cover/1305382/#1505752
> > 
> > 
> > Leo Yan (2):
> >   perf tests tsc: Make tsc testing as a common testing
> >   perf tests tsc: Add checking helper is_supported()
> 
> Acked-by: Jiri Olsa 



Thanks, applied.

- Arnaldo



Re: [PATCH] perf mem2node: Improve warning if detected no memory nodes

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 20, 2020 at 08:06:27AM +0200, Jiri Olsa escreveu:
> On Mon, Oct 19, 2020 at 08:36:13AM +0800, Leo Yan wrote:
> > Some archs (e.g. x86 and Arm64) don't enable the configuration
> > CONFIG_MEMORY_HOTPLUG by default. If this configuration is not enabled
> > when building the kernel image, the sysfs entries for memory nodes will
> > be missing. As a result the perf tool has no chance to capture the
> > memory node information; when it reports the result and detects no
> > memory nodes, it outputs "assertion failed at util/mem2node.c:99".
> > 
> > The output log doesn't give out reason for the failure and users have no
> > clue for how to fix it.  This patch changes to use explicit way for
> > warning: it tells user that detected no memory nodes and suggests to
> > enable CONFIG_MEMORY_HOTPLUG for kernel building.
> > 
> > Signed-off-by: Leo Yan 
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo

 
> thanks,
> jirka
> 
> > ---
> >  tools/perf/util/mem2node.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/mem2node.c b/tools/perf/util/mem2node.c
> > index c84f5841c7ab..03a7d7b27737 100644
> > --- a/tools/perf/util/mem2node.c
> > +++ b/tools/perf/util/mem2node.c
> > @@ -96,7 +96,8 @@ int mem2node__init(struct mem2node *map, struct perf_env *env)
> >  
> > /* Cut unused entries, due to merging. */
> > tmp_entries = realloc(entries, sizeof(*entries) * j);
> > -   if (tmp_entries || WARN_ON_ONCE(j == 0))
> > +   if (tmp_entries ||
> > +   WARN_ONCE(j == 0, "No memory nodes, is CONFIG_MEMORY_HOTPLUG enabled?\n"))
> > entries = tmp_entries;
> >  
> > for (i = 0; i < j; i++) {
> > -- 
> > 2.17.1
> > 
> 

-- 

- Arnaldo


Re: [PATCH] perf version: Add a feature for libpfm4.

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 19, 2020 at 04:25:45PM -0700, Ian Rogers escreveu:
> If perf is built with libpfm4 (LIBPFM4=1) then advertise it in perf -vv.

Thanks, applied.

- Arnaldo

 
> Signed-off-by: Ian Rogers 
> ---
>  tools/perf/builtin-version.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
> index d09ec2f03071..9cd074a3d825 100644
> --- a/tools/perf/builtin-version.c
> +++ b/tools/perf/builtin-version.c
> @@ -80,6 +80,7 @@ static void library_status(void)
>   STATUS(HAVE_LIBBPF_SUPPORT, bpf);
>   STATUS(HAVE_AIO_SUPPORT, aio);
>   STATUS(HAVE_ZSTD_SUPPORT, zstd);
> + STATUS(HAVE_LIBPFM, libpfm4);
>  }
>  
>  int cmd_version(int argc, const char **argv)
> -- 
> 2.29.0.rc1.297.gfa9743e501-goog
> 

-- 

- Arnaldo


Re: [PATCH v1 0/2] doc/admin-guide: update perf-security.rst with CAP_PERFMON usage flows

2020-10-20 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 19, 2020 at 08:15:14PM +0300, Alexey Budankov escreveu:
> 
> Assignment of CAP_PERFMON [1] Linux capability to an executable located
> on a file system requires extended attributes (xattrs) [2] to be supported
> by the file system. Even if the file system supports xattrs an fs device
> should be mounted with permission to use xattrs for files located on the
> device (e.g. without nosuid option [3]). No xattrs support and nosuid
> mounts are quite common in HPC and Cloud multiuser environments thus
> applicability of privileged Perf user groups based on file capabilities
> [4] is limited in those environments. An alternative method to confer Linux
> capabilities on a process still exists: creating a
> capabilities-enabled semi-privileged shell environment. Usage of this
> method to extend the privileged Perf user groups approach is documented in
> this patch set as an extension to the perf-security.rst admin guide file.
> 
> [1] https://man7.org/linux/man-pages/man7/capabilities.7.html
> [2] https://man7.org/linux/man-pages/man7/xattr.7.html
> [3] https://man7.org/linux/man-pages/man8/mount.8.html
> [4] https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html#privileged-perf-users-groups



Thanks, applied.

- Arnaldo

 
> ---
> Alexey Budankov (2):
>   doc/admin-guide: note credentials consolidation under CAP_PERFMON
>   doc/admin-guide: document creation of CAP_PERFMON privileged shell
> 
>  Documentation/admin-guide/perf-security.rst | 81 ++---
>  1 file changed, 70 insertions(+), 11 deletions(-)
> 
> -- 
> 2.24.1
> 

-- 

- Arnaldo


[GIT PULL] perf tools changes for v5.10

2020-10-15 Thread Arnaldo Carvalho de Melo
Hi Linus,

Please consider pulling,

Best regards,

- Arnaldo

The following changes since commit fb0155a09b0224a7147cb07a4ce6034c8d29667f:

  Merge tag 'nfs-for-5.9-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs (2020-09-28 11:05:56 -0700)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-tools-for-v5.10-2020-10-15

for you to fetch changes up to 744aec4df2c5b4d12af26a57d8858af2f59ef3d0:

  perf c2c: Update documentation for metrics reorganization (2020-10-15 12:02:12 -0300)


perf tools changes for v5.10: 1st batch

- cgroup improvements for 'perf stat', allowing for compact specification of
  events and cgroups in the command line.

- Support per thread topdown metrics in 'perf stat'.

- Support sample-read topdown metric group in 'perf record'

- Show start of latency in addition to its end in 'perf sched latency'.

- Add min, max to 'perf script' futex-contention output, in addition to avg.

- Allow usage of 'perf_event_attr->exclusive' attribute via the new ':e' event
  modifier.

- Add 'snapshot' command to 'perf record --control', using it with Intel PT.

- Support FIFO file names as alternative options to 'perf record --control'.

- Introduce branch history "streams", to compare 'perf record' runs with
  'perf diff' based on branch records and report hot streams.

- Support PE executable symbol tables using libbfd, to profile, for instance,
  wine binaries.

- Add filter support for option 'perf ftrace -F/--funcs'.

- Allow configuring the 'disassembler_style' 'perf annotate' knob via
  'perf config'.

- Update CascadelakeX and SkylakeX JSON vendor events files.

- Add support for parsing perchip/percore JSON vendor events.

- Add power9 hv_24x7 core level metric events.

- Add L2 prefetch, ITLB instruction fetch hits JSON events for AMD zen1.

- Enable Family 19h users by matching Zen2 AMD vendor events.

- Use debuginfod in 'perf probe' when required debug files not found locally.

- Display negative tid in non-sample events in 'perf script'.

- Make GTK2 support opt-in

- Add build test with GTK+

- Add missing -lzstd to the fast path feature detection

- Add scripts to auto generate 'mmap', 'mremap' string<->id tables for use in
  'perf trace'.

- Show python test script in verbose mode.

- Fix uncore metric expressions

- Msan uninitialized use fixes.

- Use condition variables in 'perf bench numa'

- Autodetect python3 binary in systems without python2.

- Support md5 build ids in addition to sha1.

- Add build id 'perf test' regression test.

- Fix printable strings in python3 scripts.

- Fix off by ones in 'perf trace' in arches using libaudit.

- Fix JSON event code for events referencing std arch events.

- Introduce 'perf test' shell script for Arm CoreSight testing.

- Add rdtsc() for Arm64 for use in the PERF_RECORD_TIME_CONV metadata
  event and in 'perf test tsc'.

- 'perf c2c' improvements: Add "RMT Load Hit" metric, "Total Stores", fixes
  and documentation update.

- Fix usage of reloc_sym in 'perf probe' when using both kallsyms and debuginfo
  files.

- Do not print 'Metric Groups:' unnecessarily in 'perf list'

- Refcounting fixes in the event parsing code.

- Add expand cgroup event 'perf test' entry.

- Fix out of bounds CPU map access when handling armv8_pmu events in
  'perf stat'.

- Add build-id injection 'perf bench' benchmark.

- Enter namespace when reading build-id in 'perf inject'.

- Do not load map/dso when injecting build-id, speeding up the 'perf inject'
  process.

- Add --buildid-all option to avoid processing all samples, just the mmap
  metadata events.

- Add feature test to check if libbfd has buildid support

- Add 'perf test' entry for PE binary format support.

- Fix typos in power8 PMU vendor events JSON files.

- Hide libtraceevent non API functions.

Signed-off-by: Arnaldo Carvalho de Melo 

Test results in the signed tag.

Adrian Hunter (9):
  perf tools: Consolidate --control option parsing into one function
  perf tools: Handle read errors from ctl_fd
  perf tools: Use AsciiDoc formatting for --control option documentation
  perf tools: Add FIFO file names as alternative options to --control
  perf record: Add 'snapshot' control command
  perf intel-pt: Document snapshot control command
  perf tools: Consolidate close_control_option()'s into one function
  perf script: Display negative tid in non-sample events
  perf intel-pt: Fix "cont

Re: perf test 67 dumps core on linux v5.9

2020-10-15 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 15, 2020 at 05:09:17PM +0200, Jiri Olsa escreveu:
> ah when putting it on top of perf/core I found it's already fixed there:
>   a55b7bb1c146 (tag: perf-tools-tests-v5.10-2020-09-28) perf test: Fix msan uninitialized use.
 
> so we should be fine

For 5.10, yes, but probably we need to send this to stable@ since Thomas
reported it failing on v5.9.

Does a55b7bb1c146 have a Fixes: tag?

Yes!

[acme@five perf]$ git show a55b7bb1c146 | grep Fixes:
Fixes: commit f5a56570a3f2 ("perf test: Fix memory leaks in parse-metric test")
[acme@five perf]$ git tag --contains f5a56570a3f2 | grep ^v | head -1
v5.9
[acme@five perf]$

So v5.9.1 will probably get this automagically cherry-picked.

Good.

- Arnaldo


Re: [PATCH v1 0/8] perf c2c: Sort cacheline with LLC load

2020-10-15 Thread Arnaldo Carvalho de Melo
Em Thu, Oct 15, 2020 at 03:50:33PM +0100, Leo Yan escreveu:
> If the memory event doesn't contain the HITM tag (as with Arm SPE), it cannot
> rely on the HITM display to report cache false sharing.  Alternatively, we
> can use the LLC access and multi-thread info to locate the data address of
> the potential false sharing; if we then connect it with the source code,
> analyze the threads' execution timing, and can conclude that they load and
> store the same cache line at the same time, this helps to resolve the
> cache false sharing issue.
> 
> This patch set is to enable the display with sorting on LLC load
> accesses; it adds dimensions for total LLC hit and LLC load accesses,
> and these dimensions are used for shared cache line table and pareto.
> 
> This patch set is dependent on the patch set "perf c2c: Refine the
> organization of metrics" [1].
> 
> [1] https://lore.kernel.org/patchwork/cover/1321499/

Ok, that one is applied and will appear publicly as soon as it goes thru
my usual set of build tests.

- Arnaldo
 
> With this patch set, we can get display 'llc' as follows:
> 
>   # perf c2c report -d llc --coalesce tid,pid,iaddr,dso --stdio
> 
>   [...]
> 
>   =================================================
>              Shared Data Cache Line Table
>   =================================================
>   #
>   #  ------------ Cacheline ------------  LLC Hit  LLC Hit    Total    Total    Total  --- Stores ---  -- Core Load Hit --  - LLC Load Hit -  - RMT Load Hit -  -- Load Dram -
>   #  Index         Address  Node  PA cnt      Pct    Total  records    Loads   Stores   L1Hit  L1Miss     FB     L1     L2   LclHit  LclHitm   RmtHit  RmtHitm     Lcl     Rmt
>   #  .....  ..............  ....  ......  .......  .......  .......  .......  .......  ......  ......  .....  .....  .....  .......  .......  .......  .......  ......  ......
>   #
>          0  0x563b01e83100     0    1401   65.32%      648     7011     3738     3273    2582     691    515   2516     59      143      505        0        0       0       0
>          1  0x563b01e830c0     0       1   26.51%      263      400      400        0       0       0    130      3      4      262        1        0        0       0       0
>          2  0x563b01e83080     0       1    7.76%       77      650      650        0       0       0    180    348     45       14       63        0        0       0       0
>          3  0x88c3d74e82c0     0       1    0.10%        1        1        1        0       0       0      0      0      0        1        0        0        0       0       0
>          4  0xa587c11e38c0   N/A       0    0.10%        1        2        1        1       1       0      0      0      0        1        0        0        0       0       0
>          5      0xbd5e6fc0     0       1    0.10%        1        1        1        0       0       0      0      0      0        0        1        0        0       0       0
>          6  0x7f90a4d6c2c0     0       1    0.10%        1        1        1        0       0       0      0      0      0        1        0        0        0       0       0
> 
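>   =================================================
>         Shared Cache Line Distribution Pareto
>   =================================================
>   #
>   #         ---- LLC LD ----  -- Store Refs --  ------- Data address -------                                        ------- cycles --------    Total    cpu                                                                       Shared
>   #    Num   LclHit  LclHitm   L1 Hit  L1 Miss          Offset  Node  PA cnt    Pid            Tid    Code address  rmt hitm  lcl hitm   load  records    cnt  Symbol               Object             Source:Line                  Node
>   #  .....  .......  .......  .......  .......  ..............  ....  ......  .....  .............  ..............  ........  ........  .....  .......  .....  ...................  .................  ...........................  ......
>   #
>      ------------------------------------------------------------
>          0      143      505     2582      691  0x563b01e83100
>      ------------------------------------------------------------
>              96.50%    7.72%   46.79%    0.00%             0x0     0       1  14100  14102:lock_th  0x563b01c81c16         0      1949   1331     1876      1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:145       0
>               0.00%   35.05%    0.00%    0.00%             0x0     0       1  14100  14102:lock_th  0x563b01c81c1d         0      2651    975      748      1  [.] read_write_func  false_sharing.exe  false_sharing_example.c:146       0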

Re: [PATCH] perf jevents: Fix event code for events referencing std arch events

2020-10-15 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 14, 2020 at 06:46:12PM +0100, John Garry escreveu:
> On 14/10/2020 17:49, Arnaldo Carvalho de Melo wrote:
> > Ok, applied,
> 
> Thanks
> 
> > please consider adding a Fixes tag next time.
> > 
> 
> Can do if it helps, but I only thought it appropriate when fixing something
> merged to mainline.

Please do, I think it appropriate in all cases, people doing backports
may decide to pick something and if it has some subtle issue this can be
automatically checked for by looking at any later Fixes for that
specific cset.

I decided not to do any rebase on perf/core, so those Fixes will remain
valid.

- Arnaldo


Re: [PATCH v2] perf bench: Use condition variables in numa.

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 14, 2020 at 06:14:18PM +0200, Jiri Olsa escreveu:
> On Wed, Oct 14, 2020 at 08:39:51AM -0700, Ian Rogers wrote:
> > The pthread_mutex_lock avoids any race on g->nr_tasks_started and
> > g->p.nr_tasks is set up in init() along with all the global state. I
> > don't think there's any race on g->nr_tasks_started and doing a signal
> > for every thread starting will just cause unnecessary wake-ups for the
> > main thread. I think it is better to keep it. I added loops on all the
> > pthread_cond_waits so the code is robust against spurious wake ups.
> 
> ah, I missed that mutex call
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo
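
The scheme Ian describes above (a counter only touched with the mutex held, one signal when the last task has started, and waiters that loop to tolerate spurious wake-ups) can be sketched as follows. This is a minimal illustration under assumptions, not the 'perf bench numa' code: `struct start_gate`, `gate_task_started()` and `gate_wait_all()` are hypothetical names.

```c
#include <pthread.h>

/* Hypothetical stand-in for the shared state in 'perf bench numa'. */
struct start_gate {
	pthread_mutex_t lock;
	pthread_cond_t  all_started;
	int nr_tasks;          /* set once, before any worker runs */
	int nr_tasks_started;  /* only touched with 'lock' held */
};

/* Called by each worker once it is up and running. */
static void gate_task_started(struct start_gate *g)
{
	pthread_mutex_lock(&g->lock);
	/* Signal only when the last task arrives, avoiding needless wake-ups. */
	if (++g->nr_tasks_started == g->nr_tasks)
		pthread_cond_signal(&g->all_started);
	pthread_mutex_unlock(&g->lock);
}

/* Called by the main thread; returns the number of started tasks. */
static int gate_wait_all(struct start_gate *g)
{
	int started;

	pthread_mutex_lock(&g->lock);
	/* Loop on the predicate: pthread_cond_wait() may return spuriously. */
	while (g->nr_tasks_started != g->nr_tasks)
		pthread_cond_wait(&g->all_started, &g->lock);
	started = g->nr_tasks_started;
	pthread_mutex_unlock(&g->lock);
	return started;
}
```

Because the predicate is re-checked under the mutex, a waiter that arrives after the final signal simply skips the wait instead of blocking forever.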


Re: [PATCH] perf jevents: Fix event code for events referencing std arch events

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 01:24:19PM +0200, Jiri Olsa escreveu:
> On Mon, Oct 12, 2020 at 12:15:04PM +0100, John Garry wrote:
> > On 12/10/2020 11:54, Jiri Olsa wrote:
> > > > diff --git a/tools/perf/pmu-events/jevents.c b/tools/perf/pmu-events/jevents.c
> > > > index 99df41a9543d..e47644cab3fa 100644
> > > > --- a/tools/perf/pmu-events/jevents.c
> > > > +++ b/tools/perf/pmu-events/jevents.c
> > > > @@ -505,20 +505,15 @@ static char *real_event(const char *name, char 
> > > > *event)
> > > >   }
> > > >   static int
> > > > -try_fixup(const char *fn, char *arch_std, unsigned long long eventcode,
> > > > - struct json_event *je)
> > > > +try_fixup(const char *fn, char *arch_std, struct json_event *je, char 
> > > > **event)
> > > >   {
> > > > /* try to find matching event from arch standard values */
> > > > struct event_struct *es;
> > > > list_for_each_entry(es, &arch_std_events, list) {
> > > > if (!strcmp(arch_std, es->name)) {
> > > > -   if (!eventcode && es->event) {
> > > > -   /* allow EventCode to be overridden */
> > > > -   free(je->event);
> > > > -   je->event = NULL;
> > > > -   }
> > > > FOR_ALL_EVENT_STRUCT_FIELDS(TRY_FIXUP_FIELD);
> > > > +   *event = je->event;
> > > I'm bit rusty on this code, but isn't je->event NULL at this point?
> > 
> > je->event should be now assigned from es->event because of
> > FOR_ALL_EVENT_STRUCT_FIELDS(TRY_FIXUP_FIELD):
> > 
> > #define TRY_FIXUP_FIELD(field) do { if (es->field && !*field) {\
> > *field = strdup(es->field); \
> > if (!*field)\
> > return -ENOMEM; \
> > } } while (0)
> > 
> > And es->event should be set.
> 
> right, thanks
> 
> Acked-by: Jiri Olsa 

Ok, applied, please consider adding a Fixes tag next time.

- Arnaldo
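
The fill-only-if-unset behaviour of TRY_FIXUP_FIELD that John points to can be shown in isolation. The sketch below is a simplified assumption, not the jevents.c code: `struct event_fields`, `fixup_from_std()` and `dup_str()` (standing in for strdup()) are hypothetical, and the macro dereferences `je` directly instead of using the original `char **` parameters.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down event record with two string fields. */
struct event_fields {
	char *event;
	char *desc;
};

/* Stand-in for strdup(): heap copy of 's', or NULL on allocation failure. */
static char *dup_str(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Take 'field' from the arch-standard entry only if the JSON entry left it unset. */
#define TRY_FIXUP_FIELD(field) do {			\
	if (es->field && !je->field) {			\
		je->field = dup_str(es->field);		\
		if (!je->field)				\
			return -ENOMEM;			\
	}						\
} while (0)

/* Returns 0 on success, -ENOMEM if a copy could not be allocated. */
static int fixup_from_std(struct event_fields *je, const struct event_fields *es)
{
	TRY_FIXUP_FIELD(event);
	TRY_FIXUP_FIELD(desc);
	return 0;
}
```

A field already set by the JSON entry is left alone, which is why je->event ends up non-NULL after the fixup whenever es->event is set.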


Re: [PATCH] perf: Improve PT documentation slightly

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 13, 2020 at 08:53:46PM -0700, Andi Kleen escreveu:
> Document the higher level --insn-trace etc. perf script options.
> 
> Include the howto on building xed in the manpage

Thanks, applied.
 
> Cc: adrian.hun...@intel.com
> Signed-off-by: Andi Kleen 
> ---
>  tools/perf/Documentation/perf-intel-pt.txt | 30 ++
>  1 file changed, 30 insertions(+)
> 
> diff --git a/tools/perf/Documentation/perf-intel-pt.txt 
> b/tools/perf/Documentation/perf-intel-pt.txt
> index d5a266d7f15b..cc2a8b2be31a 100644
> --- a/tools/perf/Documentation/perf-intel-pt.txt
> +++ b/tools/perf/Documentation/perf-intel-pt.txt
> @@ -112,6 +112,32 @@ The flags are "bcrosyiABEx" which stand for branch, 
> call, return, conditional,
>  system, asynchronous, interrupt, transaction abort, trace begin, trace end, 
> and
>  in transaction, respectively.
>  
> +perf script also supports higher level ways to dump instruction traces:
> +
> + perf script --insn-trace --xed
> +
> +Dump all instructions. This requires installing the xed tool (see XED below)
> +Dumping all instructions in a long trace can be fairly slow. It is usually 
> better
> +to start with higher level decoding, like
> +
> + perf script --call-trace
> +
> +or
> +
> + perf script --call-ret-trace
> +
> +and then select a time range of interest. The time range can then be examined
> +in detail with
> +
> + perf script --time starttime,stoptime --insn-trace --xed
> +
> +While examining the trace it's also useful to filter on specific CPUs using
> +the -C option
> +
> + perf script --time starttime,stoptime --insn-trace --xed -C 1
> +
> +Dump all instructions in time range on CPU 1.
> +
>  Another interesting field that is not printed by default is 'ipc' which can 
> be
>  displayed as follows:
>  
> @@ -1093,6 +1119,10 @@ To display PEBS events from the Intel PT trace, use 
> the itrace 'o' option e.g.
>  
>   perf script --itrace=oe
>  
> +XED
> +---
> +
> +include::build-xed.txt[]
>  
>  SEE ALSO
>  
> -- 
> 2.28.0
> 

-- 

- Arnaldo


Re: [PATCH 7/9] perf tools: Add size to struct perf_record_header_build_id

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 14, 2020 at 03:21:46PM +0200, Jiri Olsa escreveu:
> On Wed, Oct 14, 2020 at 08:59:08AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Oct 13, 2020 at 09:24:39PM +0200, Jiri Olsa escreveu:
> > > We do not store size with build ids in perf data,
> > > but there's enough space to do it. Adding misc bit
> > > PERF_RECORD_MISC_BUILD_ID_SIZE to mark build id event
> > > with size.
> > > 
> > > With this fix the dso with md5 build id will have correct
> > > build id data and will be usable for debuginfod processing
> > > if needed (coming in following patches).
> > > 
> > > Acked-by: Ian Rogers 
> > > Signed-off-by: Jiri Olsa 
> > > ---
> > >  tools/lib/perf/include/perf/event.h | 12 +++-
> > >  tools/perf/util/build-id.c  |  8 +---
> > >  tools/perf/util/header.c| 10 +++---
> > >  3 files changed, 23 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/tools/lib/perf/include/perf/event.h 
> > > b/tools/lib/perf/include/perf/event.h
> > > index a6dbba6b9073..988c539bedb6 100644
> > > --- a/tools/lib/perf/include/perf/event.h
> > > +++ b/tools/lib/perf/include/perf/event.h
> > > @@ -201,10 +201,20 @@ struct perf_record_header_tracing_data {
> > >   __u32size;
> > >  };
> > >  
> > > +#define PERF_RECORD_MISC_BUILD_ID_SIZE (1 << 15)
> > > +
> > >  struct perf_record_header_build_id {
> > >   struct perf_event_header header;
> > >   pid_tpid;
> > > - __u8 build_id[24];
> > > + union {
> > > + __u8 build_id[24];
> > > + struct {
> > > + __u8 data[20];
> > > + __u8 size;
> > > + __u8 reserved1__;
> > > + __u16reserved2__;
> > > + };
> > > + };
> > >   char filename[];
> > >  };

> > Hey, shouldn't we just append the extra info at the end, i.e. keep it
> > like:

> >  struct perf_record_header_build_id {
> > struct perf_event_header header;
> > pid_tpid;
> > __u8 build_id[24];
> > char filename[];
> > __u8 size;
> >  };

> > No need for PERF_RECORD_MISC_BUILD_ID_SIZE, older tools will continue
> > working with new perf data files.
 
> hum, then how would we tell if the last byte (size) is present or not?

It would be different than '\0' ;-)

- Arnaldo
 
> > 
> > OTOH BUILD_ID_SIZE is 20 and the space on this header is 24, so the last
> > 4 bytes were not being used, so older tools don't look into it, they
> > should continue working, have you tested this case? I.e. getting the
> > perf binary in, say, fedora and check that it works with this new
> > perf_record_header_build_id layout?
> 
> yes, that still works (tested), because we copied only 20 bytes
> of the build_id[24] and did not care about the rest

Great that you actually tested it.

- Arnaldo
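
The compatibility argument above rests on the new 'size' byte landing in the four bytes that older tools never read. A minimal sketch of that layout and of a reader honouring the misc bit follows; it is an illustration under assumptions, with `struct build_id_record`, `MISC_BUILD_ID_SIZE` and `record_build_id_size()` as hypothetical simplifications of the real perf definitions (the event header is reduced to the misc field).

```c
#include <stddef.h>
#include <stdint.h>

#define BUILD_ID_SIZE 20               /* sha1-sized build id */
#define MISC_BUILD_ID_SIZE (1 << 15)   /* mirrors PERF_RECORD_MISC_BUILD_ID_SIZE */

/* Simplified record: only the fields relevant to the size discussion. */
struct build_id_record {
	uint16_t misc;
	union {
		uint8_t build_id[24];          /* old view: 20 bytes used, 4 spare */
		struct {
			uint8_t  data[20];
			uint8_t  size;         /* lives in the formerly spare bytes */
			uint8_t  reserved1;
			uint16_t reserved2;
		};
	};
};

/* The new members must overlay the old 24-byte array exactly. */
_Static_assert(offsetof(struct build_id_record, size) ==
	       offsetof(struct build_id_record, build_id) + BUILD_ID_SIZE,
	       "size must sit right after the 20 data bytes");
_Static_assert(offsetof(struct build_id_record, reserved2) + sizeof(uint16_t) ==
	       offsetof(struct build_id_record, build_id) + 24,
	       "new layout must not grow past the old array");

/* Records written without the misc bit are assumed to carry 20 bytes. */
static size_t record_build_id_size(const struct build_id_record *rec)
{
	if (rec->misc & MISC_BUILD_ID_SIZE)
		return rec->size;
	return BUILD_ID_SIZE;
}
```

An old reader copying only the first 20 bytes of build_id[] sees the same data either way, which matches the test result Jiri reports.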


Re: [PATCH v2] perf: Add support for exclusive groups/events

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 14, 2020 at 07:42:55AM -0700, Andi Kleen escreveu:
> Peter suggested that using the exclusive mode in perf could
> avoid some problems with bad scheduling of groups. Exclusive
> is implemented in the kernel, but wasn't exposed by the perf tool,
> so hard to use without custom low level API users.
> 
> Add support for marking groups or events with :e for exclusive
> in the perf tool.  The implementation is basically the same as the
> existing pinned attribute.
> 
> Cc: pet...@infradead.org
> Signed-off-by: Andi Kleen 

Jiri, I'm taking your "I'm ok" with this as an Acked-by, thanks

- Arnaldo
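
For readers unfamiliar with the attribute behind the ':e' modifier, here is a minimal sketch of filling a perf_event_attr with the 'exclusive' bit set, roughly what an event such as 'cycles:e' asks the kernel for. It assumes a Linux system with the uapi header available; `exclusive_cycles_attr()` is a hypothetical helper, not part of the perf tool.

```c
#include <linux/perf_event.h>
#include <string.h>

/* Build an attr for a hardware cycles event that wants the PMU to itself. */
static struct perf_event_attr exclusive_cycles_attr(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.exclusive = 1;	/* do not share the PMU with other events */
	return attr;
}
```

Going by the test added in the patch, for a group only the leader is expected to carry the bit; the siblings are scheduled along with it.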

 
> --
> 
> v2: Update check_modifier too (Jiri)
> ---
>  tools/perf/Documentation/perf-list.txt |  1 +
>  tools/perf/tests/parse-events.c| 58 +-
>  tools/perf/util/parse-events.c | 11 -
>  tools/perf/util/parse-events.l |  2 +-
>  4 files changed, 68 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/Documentation/perf-list.txt 
> b/tools/perf/Documentation/perf-list.txt
> index 10ed539a8859..4c7db1da8fcc 100644
> --- a/tools/perf/Documentation/perf-list.txt
> +++ b/tools/perf/Documentation/perf-list.txt
> @@ -58,6 +58,7 @@ counted. The following modifiers exist:
>   S - read sample value (PERF_SAMPLE_READ)
>   D - pin the event to the PMU
>   W - group is weak and will fallback to non-group if not schedulable,
> + e - group or event are exclusive and do not share the PMU
>  
>  The 'p' modifier can be used for specifying how precise the instruction
>  address should be. The 'p' modifier can be specified multiple times:
> diff --git a/tools/perf/tests/parse-events.c b/tools/perf/tests/parse-events.c
> index 7f9f87a470c3..7411dd4d76cf 100644
> --- a/tools/perf/tests/parse-events.c
> +++ b/tools/perf/tests/parse-events.c
> @@ -557,6 +557,7 @@ static int test__checkevent_pmu_events(struct evlist 
> *evlist)
>   TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
>   TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
>   TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);
> + TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);
>  
>   return 0;
>  }
> @@ -575,6 +576,7 @@ static int test__checkevent_pmu_events_mix(struct evlist 
> *evlist)
>   TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
>   TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
>   TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);
> + TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);
>  
>   /* cpu/pmu-event/u*/
>   evsel = evsel__next(evsel);
> @@ -587,6 +589,7 @@ static int test__checkevent_pmu_events_mix(struct evlist 
> *evlist)
>   TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
>   TEST_ASSERT_VAL("wrong precise_ip", !evsel->core.attr.precise_ip);
>   TEST_ASSERT_VAL("wrong pinned", !evsel->core.attr.pinned);
> + TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);
>  
>   return 0;
>  }
> @@ -1277,6 +1280,49 @@ static int test__pinned_group(struct evlist *evlist)
>   return 0;
>  }
>  
> +static int test__checkevent_exclusive_modifier(struct evlist *evlist)
> +{
> + struct evsel *evsel = evlist__first(evlist);
> +
> + TEST_ASSERT_VAL("wrong exclude_user", !evsel->core.attr.exclude_user);
> + TEST_ASSERT_VAL("wrong exclude_kernel", 
> evsel->core.attr.exclude_kernel);
> + TEST_ASSERT_VAL("wrong exclude_hv", evsel->core.attr.exclude_hv);
> + TEST_ASSERT_VAL("wrong precise_ip", evsel->core.attr.precise_ip);
> + TEST_ASSERT_VAL("wrong exclusive", evsel->core.attr.exclusive);
> +
> + return test__checkevent_symbolic_name(evlist);
> +}
> +
> +static int test__exclusive_group(struct evlist *evlist)
> +{
> + struct evsel *evsel, *leader;
> +
> + TEST_ASSERT_VAL("wrong number of entries", 3 == 
> evlist->core.nr_entries);
> +
> + /* cycles - group leader */
> + evsel = leader = evlist__first(evlist);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == 
> evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config",
> + PERF_COUNT_HW_CPU_CYCLES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong group name", !evsel->group_name);
> + TEST_ASSERT_VAL("wrong leader", evsel->leader == leader);
> + TEST_ASSERT_VAL("wrong exclusive", evsel->core.attr.exclusive);
> +
> + /* cache-misses - can not be pinned, but will go on with the leader */
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong type", PERF_TYPE_HARDWARE == 
> evsel->core.attr.type);
> + TEST_ASSERT_VAL("wrong config",
> + PERF_COUNT_HW_CACHE_MISSES == evsel->core.attr.config);
> + TEST_ASSERT_VAL("wrong exclusive", !evsel->core.attr.exclusive);
> +
> + /* branch-misses - ditto */
> + evsel = evsel__next(evsel);
> + TEST_ASSERT_VAL("wrong config",
> + PERF_COUNT_HW_BR

Re: [PATCH] perf: sched: Show start of latency as well

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 14, 2020 at 11:05:17AM -0400, j...@joelfernandes.org escreveu:
> On Tue, Oct 13, 2020 at 09:37:48AM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Sat, Sep 26, 2020 at 11:45:39AM -0400, Joel Fernandes escreveu:
> > > On Sat, Sep 26, 2020 at 10:10 AM Namhyung Kim  wrote:
> > > [...]
> > > > On Sat, Sep 26, 2020 at 8:56 AM Joel Fernandes (Google)
> > > > Then the remaining concern is the screen
> > > > width (of 114 or 115?) but I think it should be fine for most of us.

> > > It is 114 without the patch and 140 with it. I tried my best to trim
> > > it a little. It fits fine on my screen with the patch. So I think we
> > > should be good!

> > So, what do you think of removing all the redundant info so that we can
> > get it in a more compact way, i.e.:
 
> Doing it this way looks good to me too!

Ingo, do you have a problem with that? I see that if you have it as it
is now one can just copy a line out of the output and have the relevant
column tags in each line, like with cyclictest, so there is value in
keeping it as is.

- Arnaldo
 
> >  -------------------------------------------------------------------------------------------------------
> >   Task                  |   Runtime   | Switches |    Avg    |    Max    |  Max start   |   Max end    |
> >  -------------------------------------------------------------------------------------------------------
> >   MediaScannerSer:11936 |  651.296 ms |    67978 |  0.113 ms | 77.250 ms | 477.691360 s | 477.768610 s
> >   audio@2.0-servi:(3)   |    0.000 ms |     3440 |  0.034 ms | 72.267 ms | 477.697051 s | 477.769318 s
> >   AudioOut_1D:8112      |    0.000 ms |     2588 |  0.083 ms | 64.020 ms | 477.710740 s | 477.774760 s
> >   Time-limited te:14973 | 7966.090 ms |    24807 |  0.073 ms | 15.563 ms | 477.162746 s | 477.178309 s
> >   surfaceflinger:8049   |    9.680 ms |      603 |  0.063 ms | 13.275 ms | 476.931791 s | 476.945067 s
> >   HeapTaskDaemon:(3)    | 1588.830 ms |     7040 |  0.065 ms |  6.880 ms | 473.666043 s | 473.672922 s
> >   mount-passthrou:(3)   | 1370.809 ms |    68904 |  0.011 ms |  6.524 ms | 478.090630 s | 478.097154 s
> >   ReferenceQueueD:(3)   |   11.794 ms |     1725 |  0.014 ms |  6.521 ms | 476.119782 s | 476.126303 s
> >   writer:14077          |   18.410 ms |     1427 |  0.036 ms |  6.131 ms | 474.169675 s | 474.175805 s
> >  
> > > > Acked-by: Namhyung Kim 
> > > 
> > > Thanks, Namhyung!
> > > 
> > >  - Joel
> > > 
> > > > > Signed-off-by: Joel Fernandes (Google) 
> > > > >
> > > > >
> > > > > ---
> > > > > A sample output can be seen after applying patch:
> > > > > https://hastebin.com/raw/ivinimaler
> > > > >
> > > > >  tools/perf/builtin-sched.c | 24 ++--
> > > > >  1 file changed, 14 insertions(+), 10 deletions(-)
> > > > >
> > > > > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > > > > index 459e4229945e..2791da1fe5f7 100644
> > > > > --- a/tools/perf/builtin-sched.c
> > > > > +++ b/tools/perf/builtin-sched.c
> > > > > @@ -130,7 +130,8 @@ struct work_atoms {
> > > > > struct thread   *thread;
> > > > > struct rb_node  node;
> > > > > u64 max_lat;
> > > > > -   u64 max_lat_at;
> > > > > +   u64 max_lat_start;
> > > > > +   u64 max_lat_end;
> > > > > u64 total_lat;
> > > > > u64 nb_atoms;
> > > > > u64 total_runtime;
> > > > > @@ -1096,7 +1097,8 @@ add_sched_in_event(struct work_atoms *atoms, 
> > > > > u64 timestamp)
> > > > > atoms->total_lat += delta;
> > > > > if (delta > atoms->max_lat) {
> > > > > atoms->max_lat = delta;
> > > > > -   atoms->max_lat_at = timestamp;
> > > > > +   atoms->max_lat_start = atom->wake_up_time;
> > > > > +   atoms->max_lat_end = timestamp;
> > > > > }
> > > > > atoms->nb_atoms++;
> > > > >  }
> > > > > @@ -1322,7 +1324,

Re: [PATCH 7/9] perf tools: Add size to struct perf_record_header_build_id

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 13, 2020 at 09:24:39PM +0200, Jiri Olsa escreveu:
> We do not store size with build ids in perf data,
> but there's enough space to do it. Adding misc bit
> PERF_RECORD_MISC_BUILD_ID_SIZE to mark build id event
> with size.
> 
> With this fix the dso with md5 build id will have correct
> build id data and will be usable for debuginfod processing
> if needed (coming in following patches).
> 
> Acked-by: Ian Rogers 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/lib/perf/include/perf/event.h | 12 +++-
>  tools/perf/util/build-id.c  |  8 +---
>  tools/perf/util/header.c| 10 +++---
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/lib/perf/include/perf/event.h 
> b/tools/lib/perf/include/perf/event.h
> index a6dbba6b9073..988c539bedb6 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -201,10 +201,20 @@ struct perf_record_header_tracing_data {
>   __u32size;
>  };
>  
> +#define PERF_RECORD_MISC_BUILD_ID_SIZE (1 << 15)
> +
>  struct perf_record_header_build_id {
>   struct perf_event_header header;
>   pid_tpid;
> - __u8 build_id[24];
> + union {
> + __u8 build_id[24];
> + struct {
> + __u8 data[20];
> + __u8 size;
> + __u8 reserved1__;
> + __u16reserved2__;
> + };
> + };
>   char filename[];
>  };
>  
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index b5648735f01f..8763772f1095 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -296,7 +296,7 @@ char *dso__build_id_filename(const struct dso *dso, char 
> *bf, size_t size,
>   continue;   \
>   else
>  
> -static int write_buildid(const char *name, size_t name_len, u8 *build_id,
> +static int write_buildid(const char *name, size_t name_len, struct build_id 
> *bid,
>pid_t pid, u16 misc, struct feat_fd *fd)
>  {
>   int err;
> @@ -307,7 +307,9 @@ static int write_buildid(const char *name, size_t 
> name_len, u8 *build_id,
>   len = PERF_ALIGN(len, NAME_ALIGN);
>  
>   memset(&b, 0, sizeof(b));
> - memcpy(&b.build_id, build_id, BUILD_ID_SIZE);
> + memcpy(&b.data, bid->data, bid->size);
> + b.size = (u8) bid->size;
> + misc |= PERF_RECORD_MISC_BUILD_ID_SIZE;
>   b.pid = pid;
>   b.header.misc = misc;
>   b.header.size = sizeof(b) + len;
> @@ -354,7 +356,7 @@ static int machine__write_buildid_table(struct machine 
> *machine,
>   in_kernel = pos->kernel ||
>   is_kernel_module(name,
>   PERF_RECORD_MISC_CPUMODE_UNKNOWN);
> - err = write_buildid(name, name_len, pos->bid.data, machine->pid,
> + err = write_buildid(name, name_len, &pos->bid, machine->pid,
>   in_kernel ? kmisc : umisc, fd);
>   if (err)
>   break;
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 21243adbb9fd..8da3886f10a8 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2083,8 +2083,12 @@ static int __event_process_build_id(struct 
> perf_record_header_build_id *bev,
>   if (dso != NULL) {
>   char sbuild_id[SBUILD_ID_SIZE];
>   struct build_id bid;
> + size_t size = BUILD_ID_SIZE;
>  
> - build_id__init(&bid, bev->build_id, BUILD_ID_SIZE);
> + if (bev->header.misc & PERF_RECORD_MISC_BUILD_ID_SIZE)
> + size = bev->size;
> +
> + build_id__init(&bid, bev->data, size);
>   dso__set_build_id(dso, &bid);
>  
>   if (dso_space != DSO_SPACE__USER) {
> @@ -2098,8 +2102,8 @@ static int __event_process_build_id(struct 
> perf_record_header_build_id *bev,
>   }
>  
>   build_id__sprintf(&dso->bid, sbuild_id);
> - pr_debug("build id event received for %s: %s\n",
> -  dso->long_name, sbuild_id);
> + pr_debug("build id event received for %s: %s [%lu]\n",
> +  dso->long_name, sbuild_id, size);


util/header.c: In function '__event_process_build_id':
util/header.c:2105:3: error: format '%lu' expects argument of type 'long 
unsigned int', but argument 6 has type 'size_t' [-Werror=format=]
   pr_debug("build id event received for %s: %s [%lu]\n",
   ^

Fixing this to '%zd'

- Arnaldo
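
The build error above comes from '%lu' assuming that size_t is 'unsigned long', which does not hold on every target. As a small sketch of the portable form: in ISO C the 'z' length modifier selects size_t width, and since size_t is unsigned, '%zu' is the conversion that matches it exactly ('%zd' is its signed counterpart). `format_build_id_size()` is a hypothetical helper used only for this illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a size_t portably; returns the number of characters written. */
static int format_build_id_size(char *buf, size_t buflen, size_t size)
{
	/* 'z' is the length modifier for size_t; 'u' because it is unsigned. */
	return snprintf(buf, buflen, "[%zu]", size);
}
```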

>   dso__put(dso);
>   }
>  
> -- 
> 2.26.2
> 

-- 

- Arnaldo


Re: [PATCH 7/9] perf tools: Add size to struct perf_record_header_build_id

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 13, 2020 at 09:24:39PM +0200, Jiri Olsa escreveu:
> We do not store size with build ids in perf data,
> but there's enough space to do it. Adding misc bit
> PERF_RECORD_MISC_BUILD_ID_SIZE to mark build id event
> with size.
> 
> With this fix the dso with md5 build id will have correct
> build id data and will be usable for debuginfod processing
> if needed (coming in following patches).
> 
> Acked-by: Ian Rogers 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/lib/perf/include/perf/event.h | 12 +++-
>  tools/perf/util/build-id.c  |  8 +---
>  tools/perf/util/header.c| 10 +++---
>  3 files changed, 23 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/lib/perf/include/perf/event.h 
> b/tools/lib/perf/include/perf/event.h
> index a6dbba6b9073..988c539bedb6 100644
> --- a/tools/lib/perf/include/perf/event.h
> +++ b/tools/lib/perf/include/perf/event.h
> @@ -201,10 +201,20 @@ struct perf_record_header_tracing_data {
>   __u32size;
>  };
>  
> +#define PERF_RECORD_MISC_BUILD_ID_SIZE (1 << 15)
> +
>  struct perf_record_header_build_id {
>   struct perf_event_header header;
>   pid_tpid;
> - __u8 build_id[24];
> + union {
> + __u8 build_id[24];
> + struct {
> + __u8 data[20];
> + __u8 size;
> + __u8 reserved1__;
> + __u16reserved2__;
> + };
> + };
>   char filename[];
>  };

Hey, shouldn't we just append the extra info at the end, i.e. keep it
like:

 struct perf_record_header_build_id {
struct perf_event_header header;
pid_tpid;
__u8 build_id[24];
char filename[];
__u8 size;
 };


No need for PERF_RECORD_MISC_BUILD_ID_SIZE, older tools will continue
working with new perf data files.

OTOH BUILD_ID_SIZE is 20 and the space on this header is 24, so the last
4 bytes were not being used, so older tools don't look into it, they
should continue working, have you tested this case? I.e. getting the
perf binary in, say, fedora and check that it works with this new
perf_record_header_build_id layout?

- Arnaldo
  
> diff --git a/tools/perf/util/build-id.c b/tools/perf/util/build-id.c
> index b5648735f01f..8763772f1095 100644
> --- a/tools/perf/util/build-id.c
> +++ b/tools/perf/util/build-id.c
> @@ -296,7 +296,7 @@ char *dso__build_id_filename(const struct dso *dso, char 
> *bf, size_t size,
>   continue;   \
>   else
>  
> -static int write_buildid(const char *name, size_t name_len, u8 *build_id,
> +static int write_buildid(const char *name, size_t name_len, struct build_id 
> *bid,
>pid_t pid, u16 misc, struct feat_fd *fd)
>  {
>   int err;
> @@ -307,7 +307,9 @@ static int write_buildid(const char *name, size_t 
> name_len, u8 *build_id,
>   len = PERF_ALIGN(len, NAME_ALIGN);
>  
>   memset(&b, 0, sizeof(b));
> - memcpy(&b.build_id, build_id, BUILD_ID_SIZE);
> + memcpy(&b.data, bid->data, bid->size);
> + b.size = (u8) bid->size;
> + misc |= PERF_RECORD_MISC_BUILD_ID_SIZE;
>   b.pid = pid;
>   b.header.misc = misc;
>   b.header.size = sizeof(b) + len;
> @@ -354,7 +356,7 @@ static int machine__write_buildid_table(struct machine 
> *machine,
>   in_kernel = pos->kernel ||
>   is_kernel_module(name,
>   PERF_RECORD_MISC_CPUMODE_UNKNOWN);
> - err = write_buildid(name, name_len, pos->bid.data, machine->pid,
> + err = write_buildid(name, name_len, &pos->bid, machine->pid,
>   in_kernel ? kmisc : umisc, fd);
>   if (err)
>   break;
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index 21243adbb9fd..8da3886f10a8 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2083,8 +2083,12 @@ static int __event_process_build_id(struct 
> perf_record_header_build_id *bev,
>   if (dso != NULL) {
>   char sbuild_id[SBUILD_ID_SIZE];
>   struct build_id bid;
> + size_t size = BUILD_ID_SIZE;
>  
> - build_id__init(&bid, bev->build_id, BUILD_ID_SIZE);
> + if (bev->header.misc & PERF_RECORD_MISC_BUILD_ID_SIZE)
> + size = bev->size;
> +
> + build_id__init(&bid, bev->data, size);
>   dso__set_build_id(dso, &bid);
>  
>   if (dso_space != DSO_SPACE__USER) {
> @@ -2098,8 +2102,8 @@ static int __event_process_build_id(struct 
> perf_record_header_build_id *bev,
>   }
>  
>   build_id__sprintf(&dso->bid, sbuild_id);
> - pr_debug("build id event received for %s: %s\n",
> - 

Re: [PATCH 5/9] perf tools: Pass build_id object to dso__set_build_id

2020-10-14 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 13, 2020 at 09:24:37PM +0200, Jiri Olsa escreveu:
> Passing build_id object to dso__set_build_id, so it's easier
> to initialize dso's build id object.
> 
> Acked-by: Ian Rogers 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/dso.c| 4 ++--
>  tools/perf/util/dso.h| 2 +-
>  tools/perf/util/header.c | 4 +++-
>  tools/perf/util/symbol-minimal.c | 2 +-
>  tools/perf/util/symbol.c | 2 +-
>  5 files changed, 8 insertions(+), 6 deletions(-)
> 
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index 2f7f01ead9a1..4415ce83150b 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -1326,9 +1326,9 @@ void dso__put(struct dso *dso)
>   dso__delete(dso);
>  }
>  
> -void dso__set_build_id(struct dso *dso, void *build_id)
> +void dso__set_build_id(struct dso *dso, struct build_id *bid)
>  {
> - memcpy(dso->bid.data, build_id, sizeof(dso->bid.data));
> + dso->bid = *bid;

Can't we use bid->size here?

dso->bid.size = bid->size;
memcpy(dso->bid.data, bid->data, bid->size);

?

Not worth it? Probably :-)

- Arnaldo
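
For reference, the two copy variants discussed above can be sketched as
follows. This is only an illustration: the struct below is a simplified
stand-in for perf's struct build_id, BUILD_ID_SIZE is assumed to be 20,
and the bid_copy_*() names are made up for this example.

```c
#include <stddef.h>
#include <string.h>

#define BUILD_ID_SIZE 20 /* assumption: SHA-1 sized build id */

/* Simplified stand-in for perf's struct build_id (illustrative only). */
struct build_id {
	unsigned char data[BUILD_ID_SIZE];
	size_t size;
};

/* Variant used in the patch: copy the whole object, unused bytes included. */
static void bid_copy_struct(struct build_id *dst, const struct build_id *src)
{
	*dst = *src;
}

/* Variant suggested in the reply: copy only the bytes that are in use. */
static void bid_copy_sized(struct build_id *dst, const struct build_id *src)
{
	/* Clamp defensively so a bogus size cannot overflow dst->data. */
	size_t n = src->size < sizeof(dst->data) ? src->size : sizeof(dst->data);

	dst->size = n;
	memcpy(dst->data, src->data, n);
}
```

With the sized variant the bytes of dst->data past size keep whatever they
held before, so any later comparison has to honour size as well.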

>   dso->has_build_id = 1;
>  }
>  
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index eac004210b47..5a5678dbdaa5 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -260,7 +260,7 @@ bool dso__sorted_by_name(const struct dso *dso);
>  void dso__set_sorted_by_name(struct dso *dso);
>  void dso__sort_by_name(struct dso *dso);
>  
> -void dso__set_build_id(struct dso *dso, void *build_id);
> +void dso__set_build_id(struct dso *dso, struct build_id *bid);
>  bool dso__build_id_equal(const struct dso *dso, u8 *build_id);
>  void dso__read_running_kernel_build_id(struct dso *dso,
>  struct machine *machine);
> diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
> index fe220f01fc94..21243adbb9fd 100644
> --- a/tools/perf/util/header.c
> +++ b/tools/perf/util/header.c
> @@ -2082,8 +2082,10 @@ static int __event_process_build_id(struct 
> perf_record_header_build_id *bev,
>   dso = machine__findnew_dso(machine, filename);
>   if (dso != NULL) {
>   char sbuild_id[SBUILD_ID_SIZE];
> + struct build_id bid;
>  
> - dso__set_build_id(dso, &bev->build_id);
> + build_id__init(&bid, bev->build_id, BUILD_ID_SIZE);
> + dso__set_build_id(dso, &bid);
>  
>   if (dso_space != DSO_SPACE__USER) {
>   struct kmod_path m = { .name = NULL, };
> diff --git a/tools/perf/util/symbol-minimal.c 
> b/tools/perf/util/symbol-minimal.c
> index dba6b9e5d64e..f9eb0bee7f15 100644
> --- a/tools/perf/util/symbol-minimal.c
> +++ b/tools/perf/util/symbol-minimal.c
> @@ -349,7 +349,7 @@ int dso__load_sym(struct dso *dso, struct map *map 
> __maybe_unused,
>   dso->is_64_bit = ret;
>  
>   if (filename__read_build_id(ss->name, &bid) > 0)
> - dso__set_build_id(dso, bid.data);
> + dso__set_build_id(dso, &bid);
>   return 0;
>  }
>  
> diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
> index 369cbad09f0d..976632d0baa0 100644
> --- a/tools/perf/util/symbol.c
> +++ b/tools/perf/util/symbol.c
> @@ -1818,7 +1818,7 @@ int dso__load(struct dso *dso, struct map *map)
>   is_regular_file(dso->long_name)) {
>   __symbol__join_symfs(name, PATH_MAX, dso->long_name);
>   if (filename__read_build_id(name, &bid) > 0)
> - dso__set_build_id(dso, bid.data);
> + dso__set_build_id(dso, &bid);
>   }
>  
>   /*
> -- 
> 2.26.2
> 

-- 

- Arnaldo


Re: [PATCH RESEND 1/1] perf build: Allow nested externs to enable BUILD_BUG() usage

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 08:59:36AM +1100, Stephen Rothwell escreveu:
> Hi all,
> 
> On Fri, 9 Oct 2020 14:41:11 +0200 Jiri Olsa  wrote:
> >
> > On Fri, Oct 09, 2020 at 02:25:23PM +0200, Vasily Gorbik wrote:
> > > > Currently BUILD_BUG() macro is expanded to something like the following:
> > >do {
> > >extern void __compiletime_assert_0(void)
> > >__attribute__((error("BUILD_BUG failed")));
> > >if (!(!(1)))
> > >__compiletime_assert_0();
> > >} while (0);
> > > 
> > > If used in a function body this obviously would produce build errors
> > > with -Wnested-externs and -Werror.
> > > 
> > > To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf
> > > includes in intel-pt-decoder, build perf without -Wnested-externs.
> > > 
> > > Reported-by: Stephen Rothwell 
> > > Signed-off-by: Vasily Gorbik   
> > 
> > that one applied nicely ;-) thanks
> > 
> > Acked-by: Jiri Olsa 
> 
> I will apply that patch to the merge of the tip tree today (instead of
> reverting the series I reverted on Friday) (unless I get an update of
> the tip tree containing it, of course).

Applied to perf/core that will go to Linus this week, maybe even today.

- Arnaldo
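
As a reminder of what -Wnested-externs objects to, here is a minimal
sketch of a block-scope extern declaration, which is the same construct
the BUILD_BUG() expansion quoted above places inside a function body.
The demo_helper() and demo_caller() names are made up for this example.

```c
/*
 * Minimal sketch of a block-scope ("nested") extern declaration.
 * demo_helper() and demo_caller() are hypothetical names.
 */
int demo_helper(int v)
{
	return v + 1;
}

int demo_caller(int x)
{
	/* Declared inside the function body: -Wnested-externs flags this line. */
	extern int demo_helper(int v);

	return demo_helper(x);
}
```

The code is valid C and builds without the warning enabled; it is only the
combination of -Wnested-externs and -Werror that turns it into a build
failure.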


Re: [PATCH RESEND 1/1] perf build: Allow nested externs to enable BUILD_BUG() usage

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Fri, Oct 09, 2020 at 02:41:11PM +0200, Jiri Olsa escreveu:
> On Fri, Oct 09, 2020 at 02:25:23PM +0200, Vasily Gorbik wrote:
> > > Currently BUILD_BUG() macro is expanded to something like the following:
> >do {
> >extern void __compiletime_assert_0(void)
> >__attribute__((error("BUILD_BUG failed")));
> >if (!(!(1)))
> >__compiletime_assert_0();
> >} while (0);
> > 
> > If used in a function body this obviously would produce build errors
> > with -Wnested-externs and -Werror.
> > 
> > To enable BUILD_BUG() usage in tools/arch/x86/lib/insn.c which perf
> > includes in intel-pt-decoder, build perf without -Wnested-externs.
> > 
> > Reported-by: Stephen Rothwell 
> > Signed-off-by: Vasily Gorbik 
> 
> that one applied nicely ;-) thanks
> 
> Acked-by: Jiri Olsa 



Thanks, applied.

- Arnaldo



Re: [PATCH v1 03/15] perf data: open data directory in read access mode

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 07:52:33PM +0300, Alexey Budankov escreveu:
> 
> On 12.10.2020 19:03, Andi Kleen wrote:
> > On Mon, Oct 12, 2020 at 11:55:07AM +0300, Alexey Budankov wrote:
> >>
> >> Open files located at data directory in case of read access mode.
> > 
> > Need some rationale. Is this a bug fix? Did the case not matter
> > before?
> 
> This is not a bug fix. The case didn't matter before.

In such cases can you please write a paragraph explaining why it didn't
matter before and why it is required now?

- Arnaldo
 
 > 
> >>
> >> Signed-off-by: Alexey Budankov 
> >> ---
> >>  tools/perf/util/data.c | 4 
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/tools/perf/util/data.c b/tools/perf/util/data.c
> >> index c47aa34fdc0a..6ad61ac6ba67 100644
> >> --- a/tools/perf/util/data.c
> >> +++ b/tools/perf/util/data.c
> >> @@ -321,6 +321,10 @@ static int open_dir(struct perf_data *data)
> >>return -1;
> >>  
> >>ret = open_file(data);
> >> +  if (!ret && perf_data__is_dir(data)) {
> >> +  if (perf_data__is_read(data))
> >> +  ret = perf_data__open_dir(data);
> >> +  }
> >>  
> >>/* Cleanup whatever we managed to create so far. */
> >>if (ret && perf_data__is_write(data))
> >> -- 
> >> 2.24.1
> 
> Alexei
> 

-- 

- Arnaldo


Re: [PATCH v1 00/15] Introduce threaded trace streaming for basic perf record operation

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 11:50:29AM +0300, Alexey Budankov escreveu:
> 
> Patch set provides threaded trace streaming for base perf record
> operation. Provided streaming mode (--threads) mitigates profiling
> data losses and resolves scalability issues of serial and asynchronous
> (--aio) trace streaming modes on multicore server systems. The patch
> set is based on the prototype [1], [2] and the most closely relates
> to mode 3) "mode that creates thread for every monitored memory map".
> 
> The threaded mode executes one-to-one mapping of trace streaming threads
> to mapped data buffers and streaming into per-CPU trace files located
> at data directory. The data buffers and threads are affined to NUMA
> nodes and monitored CPUs according to system topology. --cpu option
> can be used to specify exact CPUs to be monitored.
> 
> Basic analysis of data directories is provided for perf report mode.
> Raw dump (-D) and aggregated reports are available for data directories,
> still with no memory consumption optimizations. However data directories
> collected with the --compression-level option enabled can be analyzed with
> a little less memory because trace files are unmapped from the tool process
> memory after the collected data is loaded.
> 
> Provided streaming mode is available with Zstd compression/decompression
> (--compression-level) and handling of external commands (--control).
> AUX area tracing, related and derived modes like --snapshot or
> --aux-sample are not enabled. --switch-output, --switch-output-event, 
> --switch-max-files and --timestamp-filename options are not enabled.

It would be interesting to spell out the difficulties in getting those
options working with this threaded mode, as I expect that once this is
all reviewed and tested we should switch to it by default, no?

- Arnaldo

> Threaded trace streaming is not enabled for pipe mode. Asynchronous
> (--aio) trace streaming and affinity (--affinity) modes are mutually
> exclusive to threaded streaming mode.
> 
> See testing results and validation examples below.
> 
> [1] git clone https://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git 
> -b perf/record_threads
> [2] https://lore.kernel.org/lkml/20180913125450.21342-1-jo...@kernel.org/
> 
> ---
> Alexey Budankov (15):
>   perf session: introduce trace file path to be shown in raw trace dump
>   perf report: output trace file name in raw trace dump
>   perf data: open data directory in read access mode
>   perf session: move reader object definition to header file
>   perf session: introduce decompressor into trace reader object
>   perf session: load data directory into tool process memory
>   perf record: introduce trace file, compressor and stats in mmap object
>   perf record: write trace data into mmap trace files
>   perf record: introduce thread specific objects for trace streaming
>   perf record: manage thread specific data array
>   perf evlist: introduce evlist__ctlfd_update() to update ctl fd status
>   perf record: introduce thread local variable for trace streaming
>   perf record: stop threads in the end of trace streaming
>   perf record: start threads in the beginning of trace streaming
>   perf record: introduce --threads command line option
> 
>  tools/perf/Documentation/perf-record.txt |   7 +
>  tools/perf/builtin-record.c  | 514 +--
>  tools/perf/util/data.c   |   4 +
>  tools/perf/util/evlist.c |  16 +
>  tools/perf/util/evlist.h |   1 +
>  tools/perf/util/mmap.c   |   6 +
>  tools/perf/util/mmap.h   |   5 +
>  tools/perf/util/ordered-events.h |   1 +
>  tools/perf/util/record.h |   1 +
>  tools/perf/util/session.c| 150 ---
>  tools/perf/util/session.h|  28 ++
>  tools/perf/util/tool.h   |   3 +-
>  12 files changed, 635 insertions(+), 101 deletions(-)
> 
> ---
> Testing results:
> 
>  $ perf test
>  1: vmlinux symtab matches kallsyms : Skip
>  2: Detect openat syscall event : Ok
>  3: Detect openat syscall event on all cpus : Ok
>  4: Read samples using the mmap interface   : Ok
>  5: Test data source output : Ok
>  6: Parse event definition strings  : Ok
>  7: Simple expression parser: Ok
>  8: PERF_RECORD_* events & perf_sample fields   : Ok
>  9: Parse perf pmu format   : Ok
> 10: PMU events  :
> 10.1: PMU event table sanity: Ok
> 10.2: PMU event map aliases : Ok
> 10.3: Parsing of PMU event table metrics: Skip 
> (some metrics failed)
> 1

Re: [PATCH] perf c2c: Update usage for showing memory events

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 08:25:42AM -0700, Ian Rogers escreveu:
> On Mon, Oct 12, 2020 at 2:13 AM Jiri Olsa  wrote:
> >
> > On Sun, Oct 11, 2020 at 08:10:22PM +0800, Leo Yan wrote:
> > > Since commit b027cc6fdf1b ("perf c2c: Fix 'perf c2c record -e list' to
> > > show the default events used"), the "perf c2c" tool can show the memory
> > > events properly, so there is no reason to still suggest that users run
> > > "perf mem record -e list" to show events.
> > >
> > > This patch updates the usage for showing memory events with command
> > > "perf c2c record -e list".
> > >
> > > Signed-off-by: Leo Yan 
> >
> > Acked-by: Jiri Olsa 
> >
> > thanks,
> > jirka
> 
> Acked-by: Ian Rogers 


Thanks, applied.

- Arnaldo



Re: [PATCH 1/9] perf tools: Add build id shell test

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 30, 2020 at 07:15:04PM +0200, Jiri Olsa escreveu:
> Adding test for build id cache that adds binary
> with sha1 and md5 build ids and verifies it's
> added properly.
> 
> The test updates build id cache with perf record
> and perf buildid-cache -a.


[root@five ~]# perf test "build id"
82: build id cache operations   : Skip
[root@five ~]# set -o vi
[root@five ~]# perf test -v "build id"
82: build id cache operations   :
--- start ---
test child forked, pid 88384
failed: no test binaries
test child finished with -2
---- end ----
build id cache operations: Skip
[root@five ~]#

Also the other patches clashed with Namhyung's patch series, can you
please check?

I've just pushed what I have to acme/perf/core

- Arnaldo
 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/Makefile.perf  | 14 +
>  tools/perf/tests/shell/buildid.sh | 90 +++
>  2 files changed, 104 insertions(+)
>  create mode 100755 tools/perf/tests/shell/buildid.sh
> 
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index 920d8afb9238..b2aeefa64e92 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -126,6 +126,8 @@ include ../scripts/utilities.mak
>  #
>  # Define NO_LIBDEBUGINFOD if you do not want support debuginfod
>  #
> +# Define NO_BUILDID_EX if you do not want buildid-ex-* binaries
> +#
>  
>  # As per kernel Makefile, avoid funny character set dependencies
>  unexport LC_ALL
> @@ -349,6 +351,11 @@ ifndef NO_PERF_READ_VDSOX32
>  PROGRAMS += $(OUTPUT)perf-read-vdsox32
>  endif
>  
> +ifndef NO_BUILDID_EX
> +PROGRAMS += $(OUTPUT)buildid-ex-sha1
> +PROGRAMS += $(OUTPUT)buildid-ex-md5
> +endif
> +
>  LIBJVMTI = libperf-jvmti.so
>  
>  ifndef NO_JVMTI
> @@ -756,6 +763,13 @@ $(OUTPUT)perf-read-vdsox32: perf-read-vdso.c 
> util/find-map.c
>   $(QUIET_CC)$(CC) -mx32 $(filter -static,$(LDFLAGS)) -Wall -Werror -o $@ 
> perf-read-vdso.c
>  endif
>  
> +ifndef NO_BUILDID_EX
> +$(OUTPUT)buildid-ex-sha1:
> + $(QUIET_LINK)echo 'int main(void) { return 0; }' | $(CC) 
> -Wl,--build-id=sha1 -o $@ -x c -
> +$(OUTPUT)buildid-ex-md5:
> + $(QUIET_LINK)echo 'int main(void) { return 0; }' | $(CC) 
> -Wl,--build-id=md5 -o $@ -x c -
> +endif
> +
>  ifndef NO_JVMTI
>  LIBJVMTI_IN := $(OUTPUT)jvmti/jvmti-in.o
>  
> diff --git a/tools/perf/tests/shell/buildid.sh 
> b/tools/perf/tests/shell/buildid.sh
> new file mode 100755
> index ..57fcd28bc4bd
> --- /dev/null
> +++ b/tools/perf/tests/shell/buildid.sh
> @@ -0,0 +1,90 @@
> +#!/bin/sh
> +# build id cache operations
> +# SPDX-License-Identifier: GPL-2.0
> +
> +# skip if there are no test binaries
> +if [ ! -x buildid-ex-sha1 -a ! -x buildid-ex-md5 ]; then
> + echo "failed: no test binaries"
> + exit 2
> +fi
> +
> +# skip if there's no readelf
> +if [ ! -x `which readelf` ]; then
> + echo "failed: no readelf, install binutils"
> + exit 2
> +fi
> +
> +check()
> +{
> + id=`readelf -n $1 2>/dev/null | grep 'Build ID' | awk '{print $3}'`
> +
> + echo "build id: ${id}"
> +
> + link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> + echo "link: ${link}"
> +
> + if [ ! -h $link ]; then
> + echo "failed: link ${link} does not exist"
> + exit 1
> + fi
> +
> + file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
> + echo "file: ${file}"
> +
> + if [ ! -x $file ]; then
> + echo "failed: file ${file} does not exist"
> + exit 1
> + fi
> +
> + diff ${file} ${1}
> + if [ $? -ne 0 ]; then
> + echo "failed: ${file} do not match"
> + exit 1
> + fi
> +
> + echo "OK for ${1}"
> +}
> +
> +test_add()
> +{
> + build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
> + perf="perf --buildid-dir ${build_id_dir}"
> +
> + ${perf} buildid-cache -v -a ${1}
> + if [ $? -ne 0 ]; then
> + echo "failed: add ${1} to build id cache"
> + exit 1
> + fi
> +
> + check ${1}
> +
> + rm -rf ${build_id_dir}
> +}
> +
> +test_record()
> +{
> + data=$(mktemp /tmp/perf.data.XXX)
> + build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
> + perf="perf --buildid-dir ${build_id_dir}"
> +
> + ${perf} record --buildid-all -o ${data} ${1}
> + if [ $? -ne 0 ]; then
> + echo "failed: record ${1}"
> + exit 1
> + fi
> +
> + check ${1}
> +
> + rm -rf ${build_id_dir}
> + rm -rf ${data}
> +}
> +
> +# add binaries manual via perf buildid-cache -a
> +test_add buildid-ex-sha1
> +test_add buildid-ex-md5
> +
> +# add binaries via perf record post processing
> +test_record buildid-ex-sha1
> +test_record buildid-ex-md5
> +
> +exit ${err}
> -- 
> 2.26.2
> 

-- 

- Arnaldo
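
The cache layout that check() in the script above relies on can be
sketched in C as well. This is only an illustration of the
${build_id_dir}/.build-id/${id:0:2}/${id:2} path derivation; the
build_id_link_path() helper is a made-up name and not a perf function.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build "<dir>/.build-id/<first two hex chars>/<remaining hex chars>",
 * the same layout the shell test derives with ${id:0:2} and ${id:2}.
 * Returns 0 on success, -1 if the id is too short or buf is too small.
 */
static int build_id_link_path(const char *dir, const char *hex_id,
			      char *buf, size_t len)
{
	int n;

	/* Need at least the two-char directory part plus one more char. */
	if (strlen(hex_id) < 3)
		return -1;

	n = snprintf(buf, len, "%s/.build-id/%.2s/%s", dir, hex_id, hex_id + 2);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For a hex id of "0123456789abcdef" and a directory of /tmp/perf.debug.abc
this yields /tmp/perf.debug.abc/.build-id/01/23456789abcdef, which is the
symlink the test expects to find.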


Re: [PATCH v2 00/14] perf arm-spe: Refactor decoding & dumping flow

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Tue, Sep 29, 2020 at 02:39:03PM +0100, Leo Yan escreveu:
> The prominent issue with SPE trace decoding and dumping is that the packet
> header and payload values are hard coded as numbers, which is not
> readable and is difficult to maintain. There are other minor issues, e.g. the
> packet length (header + payload) calculation is not correct for some
> packet types, and the dumping flow lacks support for specific sub
> classes of the operation packet, etc.
> 
> So this patch set refactors the Arm SPE decoding with:
> - Patches 01, 02 are minor cleanups;
> - Patches 03, 04 are used to fix and polish the packet and payload
>   length calculation;
> - Patch 05 adds a helper to wrap up printing strings, which
>   avoids a bunch of duplicate code lines;
> - Patches 06 ~ 12 refactor decoding for the different packet
>   types one by one (address packet, context packet, counter packet,
>   event packet, operation packet);
> - Patch 13 is coming from Andre to dump memory tagging;
> - Patch 14 is coming from Wei Li to add decoding for ARMv8.3
>   extension, in this version it has been improved to use defined
>   macros, also is improved for failure handling and commit log.
> 
> This patch set is cleanly applied on the top of perf/core branch
> with commit a55b7bb1c146 ("perf test: Fix msan uninitialized use."),
> and the patches have been verified on Hisilicon D06 platform and I
> manually inspected the dumping result.
> 
> Changes from v1:
> - Heavily rewrote the patch 05 for refactoring printing strings; this
>   is fundamental change, so adjusted the sequence for patches and moved
>   the printing string patch ahead from patch 10 (v1) to patch 05;
> - Changed to use GENMASK_ULL() for bits mask;
> - Added Andre's patch 13 for dumping memory tagging;
> - Refined patch 12 for adding sub classes for Operation packet, merged
>   some commit log from Andre's patch, which allows commit log and code
>   to be more clear; Added "Co-developed-by: Andre Przywara" tag to
>   reflect this.

Ok, so I'll wait for v3, as Leo indicated he'll respin.

Thanks,

- Arnaldo
 
> 
> Andre Przywara (1):
>   perf arm_spe: Decode memory tagging properties
> 
> Leo Yan (12):
>   perf arm-spe: Include bitops.h for BIT() macro
>   perf arm-spe: Fix a typo in comment
>   perf arm-spe: Refactor payload length calculation
>   perf arm-spe: Fix packet length handling
>   perf arm-spe: Refactor printing string to buffer
>   perf arm-spe: Refactor packet header parsing
>   perf arm-spe: Refactor address packet handling
>   perf arm-spe: Refactor context packet handling
>   perf arm-spe: Refactor counter packet handling
>   perf arm-spe: Refactor event type handling
>   perf arm-spe: Refactor operation packet handling
>   perf arm-spe: Add more sub classes for operation packet
> 
> Wei Li (1):
>   perf arm-spe: Add support for ARMv8.3-SPE
> 
>  .../util/arm-spe-decoder/arm-spe-decoder.c|  54 +-
>  .../util/arm-spe-decoder/arm-spe-decoder.h|  17 -
>  .../arm-spe-decoder/arm-spe-pkt-decoder.c | 567 +++---
>  .../arm-spe-decoder/arm-spe-pkt-decoder.h | 124 +++-
>  4 files changed, 478 insertions(+), 284 deletions(-)
> 
> -- 
> 2.20.1
> 

-- 

- Arnaldo


Re: [PATCH 4/5] perf: arm_spe: Decode memory tagging properties

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Tue, Oct 13, 2020 at 11:51:03AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Sun, Sep 27, 2020 at 11:19:18AM +0800, Leo Yan escreveu:
> > On Tue, Sep 22, 2020 at 11:12:24AM +0100, Andre Przywara wrote:
> > > When SPE records a physical address, it can additionally tag the event
> > > with information from the Memory Tagging architecture extension.
> > > 
> > > Decode the two additional fields in the SPE event payload.
> > > 
> > > Signed-off-by: Andre Przywara 
> > > ---
> > >  .../util/arm-spe-decoder/arm-spe-pkt-decoder.c  | 17 -
> > >  1 file changed, 12 insertions(+), 5 deletions(-)
> > > 
> > > diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c 
> > > b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > index 943e4155b246..a033f34846a6 100644
> > > --- a/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > +++ b/tools/perf/util/arm-spe-decoder/arm-spe-pkt-decoder.c
> > > @@ -8,13 +8,14 @@
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  
> > >  #include "arm-spe-pkt-decoder.h"
> > >  
> > > -#define BIT(n)   (1ULL << (n))
> > > -
> > >  #define NS_FLAG  BIT(63)
> > >  #define EL_FLAG  (BIT(62) | BIT(61))
> > > +#define CH_FLAG  BIT(62)
> > > +#define PAT_FLAG GENMASK_ULL(59, 56)
> > >  
> > >  #define SPE_HEADER0_PAD  0x0
> > >  #define SPE_HEADER0_END  0x1
> > > @@ -447,10 +448,16 @@ int arm_spe_pkt_desc(const struct arm_spe_pkt 
> > > *packet, char *buf,
> > >   return snprintf(buf, buf_len, "%s 0x%llx el%d ns=%d",
> > >   (idx == 1) ? "TGT" : "PC", payload, el, 
> > > ns);
> > >   case 2: return snprintf(buf, buf_len, "VA 0x%llx", payload);
> > > - case 3: ns = !!(packet->payload & NS_FLAG);
> > > + case 3: {
> > > + int ch = !!(packet->payload & CH_FLAG);
> > > + int pat = (packet->payload & PAT_FLAG) >> 56;
> > > +
> > > + ns = !!(packet->payload & NS_FLAG);
> > >   payload &= ~(0xffULL << 56);
> > > - return snprintf(buf, buf_len, "PA 0x%llx ns=%d",
> > > - payload, ns);
> > > + return snprintf(buf, buf_len,
> > > + "PA 0x%llx ns=%d ch=%d, pat=%x",
> > > + payload, ns, ch, pat);
> > > + }
> > 
> > Reviewed-by: Leo Yan 
> 
> Thanks, applied.

I take that back, I'm applying Leo's series that Andre reviewed instead.

- Arnaldo
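
For readers following along, the physical address payload decoding in the
hunk above can be sketched as a standalone function. The flag positions
are taken from the quoted diff; the SPE_BIT()/SPE_GENMASK_ULL() macros,
struct spe_pa and spe_decode_pa() are made-up names standing in for the
kernel's BIT()/GENMASK_ULL() and the code in arm_spe_pkt_desc().

```c
#include <stdint.h>

/* Local stand-ins for the kernel's BIT() and GENMASK_ULL() helpers. */
#define SPE_BIT(n)            (1ULL << (n))
#define SPE_GENMASK_ULL(h, l) (((~0ULL) >> (63 - (h))) & ((~0ULL) << (l)))

#define NS_FLAG  SPE_BIT(63)             /* non-secure */
#define CH_FLAG  SPE_BIT(62)             /* tag checked */
#define PAT_FLAG SPE_GENMASK_ULL(59, 56) /* physical address tag */

/* Hypothetical container for the decoded fields. */
struct spe_pa {
	uint64_t addr;
	int ns;
	int ch;
	int pat;
};

static struct spe_pa spe_decode_pa(uint64_t payload)
{
	struct spe_pa pa;

	pa.ns = !!(payload & NS_FLAG);
	pa.ch = !!(payload & CH_FLAG);
	pa.pat = (int)((payload & PAT_FLAG) >> 56);
	/* The top byte only carries flags, so strip it from the address. */
	pa.addr = payload & ~(0xffULL << 56);
	return pa;
}
```

A payload with bits 63 and 62 set, 0xa in bits 59:56 and 0x1234 in the low
bits decodes to ns=1, ch=1, pat=0xa and an address of 0x1234.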


Re: [PATCHv4] perf kvm: add kvm-stat for arm64

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Tue, Sep 29, 2020 at 12:34:50PM +0900, Sergey Senozhatsky escreveu:
> On (20/09/17 19:02), Sergey Senozhatsky wrote:
> > Add support for perf kvm stat on arm64 platform.

> > Example:
> >  # perf kvm stat report

> > Analyze events for all VMs, all VCPUs:

> > VM-EXIT    Samples  Samples%     Time%    Min Time    Max Time         Avg time
> > 
> > DABT_LOW    661867    98.91%    40.45%      2.19us   3364.65us      6.24us ( +-   0.34% )
> >      IRQ      4598     0.69%    57.44%      2.89us   3397.59us   1276.27us ( +-   1.61% )
> >      WFx      1475     0.22%     1.71%      2.22us   3388.63us    118.31us ( +-   8.69% )
> > IABT_LOW      1018     0.15%     0.38%      2.22us   2742.07us     38.29us ( +-  12.55% )
> >    SYS64       180     0.03%     0.01%      2.07us    112.91us      6.57us ( +-  14.95% )
> >    HVC64        17     0.00%     0.01%      2.19us    322.35us     42.95us ( +-  58.98% )

> > Total Samples:669155, Total events handled time:10216387.86us.

> > Signed-off-by: Sergey Senozhatsky 
> > Reviewed-by: Leo Yan 
> > Tested-by: Leo Yan 

> Arnaldo, any opinion on this?

I can't find the actual patch, just this reply from you, so let's try
b4 with this message's Message-Id... Magic! But it doesn't apply; can
you please refresh the patch against what is in my perf/core branch?

- Arnaldo

[acme@five perf]$ b4 am -csl 20200929033450.GB529@jagdpanzerIV.localdomain
Looking up https://lore.kernel.org/r/20200929033450.GB529%40jagdpanzerIV.localdomain
Grabbing thread from lore.kernel.org/linux-arm-kernel
Analyzing 2 messages in the thread
---
Writing ./20200917_sergey_senozhatsky__1_2_3.mbx
  [PATCHv4] perf kvm: add kvm-stat for arm64
+ Signed-off-by: Arnaldo Carvalho de Melo 
+ Link: https://lore.kernel.org/r/20200917100225.208794-1-sergey.senozhat...@gmail.com
---
Total patches: 1
---
 Link: https://lore.kernel.org/r/20200917100225.208794-1-sergey.senozhat...@gmail.com
 Base: not found
   git am ./20200917_sergey_senozhatsky__1_2_3.mbx
[acme@five perf]$ vim ./20200917_sergey_senozhatsky__1_2_3.mbx
[acme@five perf]$
[acme@five perf]$ git am ./20200917_sergey_senozhatsky__1_2_3.mbx
Applying: perf kvm: add kvm-stat for arm64
error: patch failed: tools/perf/arch/arm64/util/Build:1
error: tools/perf/arch/arm64/util/Build: patch does not apply
Patch failed at 0001 perf kvm: add kvm-stat for arm64
hint: Use 'git am --show-current-patch=diff' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".
[acme@five perf]$


Re: [PATCH] perf: sched: Show start of latency as well

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Sat, Sep 26, 2020 at 11:45:39AM -0400, Joel Fernandes escreveu:
> On Sat, Sep 26, 2020 at 10:10 AM Namhyung Kim  wrote:
> [...]
> > On Sat, Sep 26, 2020 at 8:56 AM Joel Fernandes (Google)
> >  wrote:
> > >
> > > perf sched latency is really useful at showing worst-case latencies that a task
> > > encountered since wakeup. However it shows only the end of the latency. Often
> > > the start of a latency is interesting as it can show what else was going
> > > on at the time to cause the latency. I certainly find myself spending a lot of time
> > > backtracking to the start of the latency in "perf sched script" which wastes a
> > > lot of time.
> > >
> > > This patch therefore adds a new column "Max delay start". Considering this,
> > > also rename "Maximum delay at" to "Max delay end" as it's easier to understand.
> >
> > Oh, I thought we print start time not the end time.  I think it's better
> > to print start time but others may think differently.
> 
> Right, glad you think so too.
> 
> > Actually we can calculate the start time from the end time and the
> > latency but it'd be convenient if the tool does that for us (as they are
> > printed in different units).
> 
> Correct, but as you mention it is more burdensome to calculate each time.
> 
> > Then the remaining concern is the screen
> > width (of 114 or 115?) but I think it should be fine for most of us.
> 
> It is 114 without the patch and 140 with it. I tried my best to trim
> it a little. It fits fine on my screen with the patch. So I think we
> should be good!

So, what do you think of removing all the redundant info so that we can
get it in a more compact way, i.e.:

                                                 |                       Delays
 -------------------------------------------------------------------------------------------------------
  Task                  |   Runtime   | Switches |    Avg    |    Max    |  Max start   |   Max end    |
 -------------------------------------------------------------------------------------------------------
  MediaScannerSer:11936 |  651.296 ms |    67978 |  0.113 ms | 77.250 ms | 477.691360 s | 477.768610 s
  audio@2.0-servi:(3)   |    0.000 ms |     3440 |  0.034 ms | 72.267 ms | 477.697051 s | 477.769318 s
  AudioOut_1D:8112      |    0.000 ms |     2588 |  0.083 ms | 64.020 ms | 477.710740 s | 477.774760 s
  Time-limited te:14973 | 7966.090 ms |    24807 |  0.073 ms | 15.563 ms | 477.162746 s | 477.178309 s
  surfaceflinger:8049   |    9.680 ms |      603 |  0.063 ms | 13.275 ms | 476.931791 s | 476.945067 s
  HeapTaskDaemon:(3)    | 1588.830 ms |     7040 |  0.065 ms |  6.880 ms | 473.666043 s | 473.672922 s
  mount-passthrou:(3)   | 1370.809 ms |    68904 |  0.011 ms |  6.524 ms | 478.090630 s | 478.097154 s
  ReferenceQueueD:(3)   |   11.794 ms |     1725 |  0.014 ms |  6.521 ms | 476.119782 s | 476.126303 s
  writer:14077          |   18.410 ms |     1427 |  0.036 ms |  6.131 ms | 474.169675 s | 474.175805 s
 
> > Acked-by: Namhyung Kim 
> 
> Thanks, Namhyung!
> 
>  - Joel
> 
> > > Signed-off-by: Joel Fernandes (Google) 
> > >
> > >
> > > ---
> > > A sample output can be seen after applying patch:
> > > https://hastebin.com/raw/ivinimaler
> > >
> > >  tools/perf/builtin-sched.c | 24 ++--
> > >  1 file changed, 14 insertions(+), 10 deletions(-)
> > >
> > > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > > index 459e4229945e..2791da1fe5f7 100644
> > > --- a/tools/perf/builtin-sched.c
> > > +++ b/tools/perf/builtin-sched.c
> > > @@ -130,7 +130,8 @@ struct work_atoms {
> > > struct thread   *thread;
> > > struct rb_node  node;
> > > u64 max_lat;
> > > -   u64 max_lat_at;
> > > +   u64 max_lat_start;
> > > +   u64 max_lat_end;
> > > u64 total_lat;
> > > u64 nb_atoms;
> > > u64 total_runtime;
> > > @@ -1096,7 +1097,8 @@ add_sched_in_event(struct work_atoms *atoms, u64 
> > > timestamp)
> > > atoms->total_lat += delta;
> > > if (delta > atoms->max_lat) {
> > > atoms->max_lat = delta;
> > > -   atoms->max_lat_at = timestamp;
> > > +   atoms->max_lat_start = atom->wake_up_time;
> > > +   atoms->max_lat_end = timestamp;
> > > }
> > > atoms->nb_atoms++;
> > >  }
> > > @@ -1322,7 +1324,7 @@ static void output_lat_thread(struct perf_sched 
> > > *sched, struct work_atoms *work_
> > > int i;
> > > int ret;
> > > u64 avg;
> > > -   char max_lat_at[32];
> > > +   char max_lat_start[32], max_lat_end[32];
> > >
> > > if (!work_list->nb_atoms)
> > > return;
> > > @@ -1344,13 +1346,14 @@ static void output_lat_thread(struct perf_sched 
> > > *sched, struct work_atoms *work_
> > >  

Re: [PATCH] perf: sched: Show start of latency as well

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Sat, Sep 26, 2020 at 11:10:46PM +0900, Namhyung Kim escreveu:
> Hi Joel,
> 
> On Sat, Sep 26, 2020 at 8:56 AM Joel Fernandes (Google)
>  wrote:
> >
> > perf sched latency is really useful at showing worst-case latencies that a task
> > encountered since wakeup. However it shows only the end of the latency. Often
> > the start of a latency is interesting as it can show what else was going
> > on at the time to cause the latency. I certainly find myself spending a lot of time
> > backtracking to the start of the latency in "perf sched script" which wastes a
> > lot of time.
> >
> > This patch therefore adds a new column "Max delay start". Considering this,
> > also rename "Maximum delay at" to "Max delay end" as it's easier to understand.
> 
> Oh, I thought we print start time not the end time.  I think it's better
> to print start time but others may think differently.
> 
> Actually we can calculate the start time from the end time and the
> latency but it'd be convenient if the tool does that for us (as they are
> printed in different units).  Then the remaining concern is the screen
> width (of 114 or 115?) but I think it should be fine for most of us.
> 
> Acked-by: Namhyung Kim 

Thanks, applied.

- Arnaldo
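
The bookkeeping the patch adds can be summed up in a small standalone
sketch. It mirrors the add_sched_in_event() hunk quoted below, but
struct lat_stats and lat_stats_update() are simplified, made-up names
and all timestamps are assumed to be in nanoseconds.

```c
#include <stdint.h>

/* Simplified stand-in for the max-latency fields of struct work_atoms. */
struct lat_stats {
	uint64_t max_lat;       /* worst wakeup-to-run delay */
	uint64_t max_lat_start; /* wakeup timestamp of that delay */
	uint64_t max_lat_end;   /* sched-in timestamp of that delay */
};

static void lat_stats_update(struct lat_stats *s, uint64_t wake_up_time,
			     uint64_t sched_in_time)
{
	uint64_t delta = sched_in_time - wake_up_time;

	/* Keep both ends of the worst delay so neither has to be derived. */
	if (delta > s->max_lat) {
		s->max_lat = delta;
		s->max_lat_start = wake_up_time;
		s->max_lat_end = sched_in_time;
	}
}
```

As noted in the review, max_lat_start always equals max_lat_end minus
max_lat; storing it just saves the reader from converting between the
milliseconds and seconds used in the output.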

> 
> Thanks
> Namhyung
> 
> >
> > Signed-off-by: Joel Fernandes (Google) 
> >
> >
> > ---
> > A sample output can be seen after applying patch:
> > https://hastebin.com/raw/ivinimaler
> >
> >  tools/perf/builtin-sched.c | 24 ++--
> >  1 file changed, 14 insertions(+), 10 deletions(-)
> >
> > diff --git a/tools/perf/builtin-sched.c b/tools/perf/builtin-sched.c
> > index 459e4229945e..2791da1fe5f7 100644
> > --- a/tools/perf/builtin-sched.c
> > +++ b/tools/perf/builtin-sched.c
> > @@ -130,7 +130,8 @@ struct work_atoms {
> > struct thread   *thread;
> > struct rb_node  node;
> > u64 max_lat;
> > -   u64 max_lat_at;
> > +   u64 max_lat_start;
> > +   u64 max_lat_end;
> > u64 total_lat;
> > u64 nb_atoms;
> > u64 total_runtime;
> > @@ -1096,7 +1097,8 @@ add_sched_in_event(struct work_atoms *atoms, u64 
> > timestamp)
> > atoms->total_lat += delta;
> > if (delta > atoms->max_lat) {
> > atoms->max_lat = delta;
> > -   atoms->max_lat_at = timestamp;
> > +   atoms->max_lat_start = atom->wake_up_time;
> > +   atoms->max_lat_end = timestamp;
> > }
> > atoms->nb_atoms++;
> >  }
> > @@ -1322,7 +1324,7 @@ static void output_lat_thread(struct perf_sched 
> > *sched, struct work_atoms *work_
> > int i;
> > int ret;
> > u64 avg;
> > -   char max_lat_at[32];
> > +   char max_lat_start[32], max_lat_end[32];
> >
> > if (!work_list->nb_atoms)
> > return;
> > @@ -1344,13 +1346,14 @@ static void output_lat_thread(struct perf_sched 
> > *sched, struct work_atoms *work_
> > printf(" ");
> >
> > avg = work_list->total_lat / work_list->nb_atoms;
> > -   timestamp__scnprintf_usec(work_list->max_lat_at, max_lat_at, 
> > sizeof(max_lat_at));
> > +   timestamp__scnprintf_usec(work_list->max_lat_start, max_lat_start, 
> > sizeof(max_lat_start));
> > +   timestamp__scnprintf_usec(work_list->max_lat_end, max_lat_end, 
> > sizeof(max_lat_end));
> >
> > -   printf("|%11.3f ms |%9" PRIu64 " | avg:%9.3f ms | max:%9.3f ms | 
> > max at: %13s s\n",
> > +   printf("|%11.3f ms |%9" PRIu64 " | avg:%8.3f ms | max:%8.3f ms | 
> > max start: %12s s | max end: %12s s\n",
> >   (double)work_list->total_runtime / NSEC_PER_MSEC,
> >  work_list->nb_atoms, (double)avg / NSEC_PER_MSEC,
> >  (double)work_list->max_lat / NSEC_PER_MSEC,
> > -max_lat_at);
> > +max_lat_start, max_lat_end);
> >  }
> >
> >  static int pid_cmp(struct work_atoms *l, struct work_atoms *r)
> > @@ -3118,7 +3121,8 @@ static void __merge_work_atoms(struct rb_root_cached 
> > *root, struct work_atoms *d
> > list_splice(&data->work_list, &this->work_list);
> > if (this->max_lat < data->max_lat) {
> > this->max_lat = data->max_lat;
> > -   this->max_lat_at = data->max_lat_at;
> > +   this->max_lat_start = data->max_lat_start;
> > +   this->max_lat_end = data->max_lat_end;
> > }
> > zfree(&data);
> > return;
> > @@ -3157,9 +3161,9 @@ static int perf_sched__lat(struct perf_sched *sched)
> > perf_sched__merge_lat(sched);
> > perf_sched__sort_lat(sched);
> >
> > -   printf("\n 
> > 
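
For readers skimming the quoted diff: the change records both ends of the worst wakeup-to-run window instead of a single "max at" timestamp. A minimal standalone sketch of that bookkeeping follows. The struct and function names are hypothetical stand-ins, not the actual builtin-sched.c definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for perf's struct work_atoms. */
struct lat_stats {
	uint64_t max_lat;       /* largest wakeup-to-run delay seen */
	uint64_t max_lat_start; /* wakeup time of that worst case */
	uint64_t max_lat_end;   /* sched-in time of that worst case */
	uint64_t total_lat;
	uint64_t nb_atoms;
};

/* Account one wakeup -> sched-in pair, remembering the worst window. */
void lat_stats_add(struct lat_stats *s, uint64_t wake_up_time,
		   uint64_t sched_in_time)
{
	uint64_t delta = sched_in_time - wake_up_time;

	s->total_lat += delta;
	if (delta > s->max_lat) {
		s->max_lat = delta;
		s->max_lat_start = wake_up_time;
		s->max_lat_end = sched_in_time;
	}
	s->nb_atoms++;
}

/* Usage example with made-up timestamps. */
void lat_stats_example(void)
{
	struct lat_stats s = { 0 };

	lat_stats_add(&s, 100, 150);
	lat_stats_add(&s, 200, 500);
	lat_stats_add(&s, 600, 610);

	assert(s.max_lat == 300);
	assert(s.max_lat_start == 200 && s.max_lat_end == 500);
}
```

Keeping both timestamps lets the report show where the worst delay began as well as where it ended, which is what the two new output columns print.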

Re: [PATCH] perf vendor events: Fix typos in power8 PMU events

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 01:21:26PM +0530, kajoljain escreveu:
> 
> 
> On 10/12/20 10:32 AM, Sandipan Das wrote:
> > This replaces the incorrectly spelled word "localtion"
> > with "location" in some power8 PMU event descriptions.
> 
> Patch looks good to me, Thanks for correcting it.
> 
> Reviewed-By: Kajol Jain
 

Thanks, applied.

- Arnaldo

> Thanks,
> Kajol Jain
> > 
> > Fixes: 2a81fa3bb5ed ("perf vendor events: Add power8 PMU events")
> > Signed-off-by: Sandipan Das 
> > ---
> >  .../pmu-events/arch/powerpc/power8/cache.json| 10 +-
> >  .../pmu-events/arch/powerpc/power8/frontend.json | 12 ++--
> >  .../pmu-events/arch/powerpc/power8/marked.json   | 10 +-
> >  .../pmu-events/arch/powerpc/power8/other.json| 16 
> >  .../arch/powerpc/power8/translation.json |  2 +-
> >  5 files changed, 25 insertions(+), 25 deletions(-)
> > 
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power8/cache.json 
> > b/tools/perf/pmu-events/arch/powerpc/power8/cache.json
> > index 6b792b2c87e2..05a17084d939 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/power8/cache.json
> > +++ b/tools/perf/pmu-events/arch/powerpc/power8/cache.json
> > @@ -32,8 +32,8 @@
> >{
> >  "EventCode": "0x1c04e",
> >  "EventName": "PM_DATA_FROM_L2MISS_MOD",
> > -"BriefDescription": "The processor's data cache was reloaded from a 
> > localtion other than the local core's L2 due to a demand load",
> > -"PublicDescription": "The processor's data cache was reloaded from a 
> > localtion other than the local core's L2 due to either only demand loads or 
> > demand loads plus prefetches if MMCR1[16] is 1"
> > +"BriefDescription": "The processor's data cache was reloaded from a 
> > location other than the local core's L2 due to a demand load",
> > +"PublicDescription": "The processor's data cache was reloaded from a 
> > location other than the local core's L2 due to either only demand loads or 
> > demand loads plus prefetches if MMCR1[16] is 1"
> >},
> >{
> >  "EventCode": "0x3c040",
> > @@ -74,8 +74,8 @@
> >{
> >  "EventCode": "0x4c04e",
> >  "EventName": "PM_DATA_FROM_L3MISS_MOD",
> > -"BriefDescription": "The processor's data cache was reloaded from a 
> > localtion other than the local core's L3 due to a demand load",
> > -"PublicDescription": "The processor's data cache was reloaded from a 
> > localtion other than the local core's L3 due to either only demand loads or 
> > demand loads plus prefetches if MMCR1[16] is 1"
> > +"BriefDescription": "The processor's data cache was reloaded from a 
> > location other than the local core's L3 due to a demand load",
> > +"PublicDescription": "The processor's data cache was reloaded from a 
> > location other than the local core's L3 due to either only demand loads or 
> > demand loads plus prefetches if MMCR1[16] is 1"
> >},
> >{
> >  "EventCode": "0x3c042",
> > @@ -134,7 +134,7 @@
> >{
> >  "EventCode": "0x4e04e",
> >  "EventName": "PM_DPTEG_FROM_L3MISS",
> > -"BriefDescription": "A Page Table Entry was loaded into the TLB from a 
> > localtion other than the local core's L3 due to a data side request",
> > +"BriefDescription": "A Page Table Entry was loaded into the TLB from a 
> > location other than the local core's L3 due to a data side request",
> >  "PublicDescription": ""
> >},
> >{
> > diff --git a/tools/perf/pmu-events/arch/powerpc/power8/frontend.json 
> > b/tools/perf/pmu-events/arch/powerpc/power8/frontend.json
> > index 1ddc30655d43..1c902a8263b6 100644
> > --- a/tools/perf/pmu-events/arch/powerpc/power8/frontend.json
> > +++ b/tools/perf/pmu-events/arch/powerpc/power8/frontend.json
> > @@ -116,8 +116,8 @@
> >{
> >  "EventCode": "0x1404e",
> >  "EventName": "PM_INST_FROM_L2MISS",
> > -"BriefDescription": "The processor's Instruction cache was reloaded 
> > from a localtion other than the local core's L2 due to an instruction fetch 
> > (not prefetch)",
> > -"PublicDescription": "The processor's Instruction cache was reloaded 
> > from a localtion other than the local core's L2 due to either an 
> > instruction fetch or instruction fetch plus prefetch if MMCR1[17] is 1"
> > +"BriefDescription": "The processor's Instruction cache was reloaded 
> > from a location other than the local core's L2 due to an instruction fetch 
> > (not prefetch)",
> > +"PublicDescription": "The processor's Instruction cache was reloaded 
> > from a location other than the local core's L2 due to either an instruction 
> > fetch or instruction fetch plus prefetch if MMCR1[17] is 1"
> >},
> >{
> >  "EventCode": "0x34040",
> > @@ -158,8 +158,8 @@
> >{
> >  "EventCode": "0x4404e",
> >  "EventName": "PM_INST_FROM_L3MISS_MOD",
> > -"BriefDescription": "The processor's Instruction cache was reloaded 
> > from a localtion other than the local core's L3 due to a instruction fetch",
> > -"Publi

Re: [PATCHSET v4 0/6] perf inject: Speed build-id injection

2020-10-13 Thread Arnaldo Carvalho de Melo
Em Mon, Oct 12, 2020 at 04:02:08PM +0900, Namhyung Kim escreveu:
> Hello,
> 
> This is the new version of the build-id injection speedup.  As this is
> to improve performance, I've added a benchmark for it.  Please look at
> the usage in the first commit.
> 
> By default, it measures average processing time of 100 MMAP2 events
> and 1 SAMPLE events.  Below is the current result on my laptop.
> 
>   $ perf bench internals inject-build-id
>   # Running 'internals/inject-build-id' benchmark:
> Average build-id injection took: 25.789 msec (+- 0.202 msec)
> Average time per event: 2.528 usec (+- 0.020 usec)
> Average memory usage: 8411 KB (+- 7 KB)
> 
> With this patchset applied, it got this:
> 
>   $ perf bench internals inject-build-id
>   # Running 'internals/inject-build-id' benchmark:
> Average build-id injection took: 20.838 msec (+- 0.093 msec)
> Average time per event: 2.043 usec (+- 0.009 usec)
> Average memory usage: 8261 KB (+- 0 KB)
> Average build-id-all injection took: 19.361 msec (+- 0.118 msec)
> Average time per event: 1.898 usec (+- 0.012 usec)
> Average memory usage: 7440 KB (+- 0 KB)
> 
> 
> Real usecases might be different as it depends on the number of
> mmap/sample events as well as how many DSOs are actually hit.
> 
> The benchmark result now includes memory footprint in terms of maximum
> RSS.  Also I've updated the benchmark code to use timestamps so that the
> events can be queued to the ordered_events (and flushed at the end).  It's
> also important how well it sorts the input events in the queue so I
> randomly chose a timestamp at the beginning of each MMAP event
> injection to resemble actual behavior.
> 
> As I said in the other thread, perf inject currently doesn't flush the
> input events and processes all at the end.  This gives a good speedup
> but spends more memory (in proportion to the input size).  The
> build-id-all injection, by contrast, bypasses the queue, so it uses
> less memory and processes events faster.  The downside is that it'll
> mark all DSOs as hit so later processing steps (like perf report) likely
> handle them unnecessarily.

Thanks, tested and applied, first patchkit I process using that b4 tool,
cool!

- Arnaldo
 
> 
> This code is available at 'perf/inject-speedup-v4' branch on
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/namhyung/linux-perf.git
> 
> 
> Changes from v3:
>  - add timestamp to the synthesized events in the benchmark
>  - add a separate thread to read pipe in the benchmark
> 
> Changes from v2:
>  - fix benchmark to read required data
>  - add Acked-by from Jiri and Ian
>  - pass map flag to check huge pages  (Jiri)
>  - add comments on some functions  (Ian)
>  - show memory (max-RSS) usage in the benchmark  (Ian)
>  - drop build-id marking patch at the last  (Adrian)
> 
> 
> Namhyung Kim (6):
>   perf bench: Add build-id injection benchmark
>   perf inject: Add missing callbacks in perf_tool
>   perf inject: Enter namespace when reading build-id
>   perf inject: Do not load map/dso when injecting build-id
>   perf inject: Add --buildid-all option
>   perf bench: Run inject-build-id with --buildid-all option too
> 
>  tools/perf/Documentation/perf-inject.txt |   6 +-
>  tools/perf/bench/Build   |   1 +
>  tools/perf/bench/bench.h |   1 +
>  tools/perf/bench/inject-buildid.c| 457 +++
>  tools/perf/builtin-bench.c   |   1 +
>  tools/perf/builtin-inject.c  | 199 --
>  tools/perf/util/build-id.h   |   4 +
>  tools/perf/util/map.c|  17 +-
>  tools/perf/util/map.h|  14 +
>  9 files changed, 645 insertions(+), 55 deletions(-)
>  create mode 100644 tools/perf/bench/inject-buildid.c
> 
> -- 
> 2.28.0.681.g6f77f65b4e-goog
> 
> 
> *** BLURB HERE ***
> 
> Namhyung Kim (6):
>   perf bench: Add build-id injection benchmark
>   perf inject: Add missing callbacks in perf_tool
>   perf inject: Enter namespace when reading build-id
>   perf inject: Do not load map/dso when injecting build-id
>   perf inject: Add --buildid-all option
>   perf bench: Run inject-build-id with --buildid-all option too
> 
>  tools/perf/Documentation/perf-inject.txt |   6 +-
>  tools/perf/bench/Build   |   1 +
>  tools/perf/bench/bench.h |   1 +
>  tools/perf/bench/inject-buildid.c| 476 +++
>  tools/perf/builtin-bench.c   |   1 +
>  tools/perf/builtin-inject.c  | 199 --
>  tools/perf/util/build-id.h   |   4 +
>  tools/perf/util/map.c|  17 +-
>  tools/perf/util/map.h|  14 +
>  9 files changed, 664 insertions(+), 55 deletions(-)
>  create mode 100644 tools/perf/bench/inject-buildid.c
> 
> -- 
> 2.28.0.1011.ga647a8990f-goog
> 

-- 

- Arnaldo
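
As a rough reading of the benchmark numbers quoted above, the relative improvement can be worked out with a small helper. This is only arithmetic on the figures in the cover letter; the function name is made up for this note.

```c
#include <assert.h>

/* Relative reduction from 'before' to 'after', as a percentage of 'before'. */
double pct_reduction(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Figures taken from the benchmark output quoted in the cover letter. */
void pct_reduction_example(void)
{
	double t     = pct_reduction(25.789, 20.838); /* build-id, msec */
	double t_all = pct_reduction(25.789, 19.361); /* build-id-all, msec */
	double m_all = pct_reduction(8411.0, 7440.0); /* build-id-all, KB */

	assert(t > 19.1 && t < 19.3);         /* roughly 19% less time */
	assert(t_all > 24.8 && t_all < 25.0); /* roughly 25% less time */
	assert(m_all > 11.4 && m_all < 11.7); /* roughly 11.5% less memory */
}
```

These are single-machine numbers from the author's laptop, so they indicate the direction of the change and not a general guarantee.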


Re: [PATCH] perf stat: Fix segfault on armv8_pmu events

2020-10-07 Thread Arnaldo Carvalho de Melo
Em Wed, Oct 07, 2020 at 01:42:19PM +0200, Jiri Olsa escreveu:
> On Wed, Oct 07, 2020 at 05:13:11PM +0900, Namhyung Kim wrote:
> > It was reported that perf stat crashed when used with armv8_pmu (cpu)
> > events in task mode.  As perf stat uses an empty cpu map for
> > task mode but armv8_pmu has its own cpu mask, it was unclear which map
> > should be used when accessing file descriptors, which caused segfaults:
> > 
> >   (gdb) bt
> >   #0  0x00603fc8 in perf_evsel__close_fd_cpu (evsel=,
> >   cpu=) at evsel.c:122
> >   #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at 
> > evsel.c:156
> >   #2  0x004d4718 in evlist__close (evlist=0x70a7cb0) at 
> > util/evlist.c:1242
> >   #3  0x00453404 in __run_perf_stat (argc=3, argc@entry=1, 
> > argv=0x30,
> >   argv@entry=0xfaea2f90, run_idx=119, run_idx@entry=1701998435)
> >   at builtin-stat.c:929
> >   #4  0x00455058 in run_perf_stat (run_idx=1701998435, 
> > argv=0xfaea2f90,
> >   argc=1) at builtin-stat.c:947
> >   #5  cmd_stat (argc=1, argv=0xfaea2f90) at builtin-stat.c:2357
> >   #6  0x004bb888 in run_builtin (p=p@entry=0x9764b8 ,
> >   argc=argc@entry=4, argv=argv@entry=0xfaea2f90) at perf.c:312
> >   #7  0x004bbb54 in handle_internal_command (argc=argc@entry=4,
> >   argv=argv@entry=0xfaea2f90) at perf.c:364
> >   #8  0x00435378 in run_argv (argcp=,
> >   argv=) at perf.c:408
> >   #9  main (argc=4, argv=0xfaea2f90) at perf.c:538
> > 
> > To fix this, I simply used the given cpu map unless the evsel actually
> > is a system-wide event (like uncore events).
> > 
> > Reported-by: Wei Li 
> > Tested-by: Barry Song 
> > Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> > Signed-off-by: Namhyung Kim 
> 
> Acked-by: Jiri Olsa 

Thanks, applied.

- Arnaldo

 
> thanks,
> jirka
> 
> > ---
> >  tools/lib/perf/evlist.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 2208444ecb44..cfcdbd7be066 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct 
> > perf_evlist *evlist,
> > if (!evsel->own_cpus || evlist->has_user_cpus) {
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > +   } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> > +   perf_cpu_map__put(evsel->cpus);
> > +   evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > } else if (evsel->cpus != evsel->own_cpus) {
> > perf_cpu_map__put(evsel->cpus);
> > evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> > -- 
> > 2.28.0.806.g8561365e88-goog
> > 
> 

-- 

- Arnaldo
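
The branch order in the quoted hunk can be summarised as a small decision function. This is a sketch only: the structs below are hypothetical reductions of perf_evsel and perf_evlist to the flags the hunk tests, and the refcounting done by perf_cpu_map__get/put is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Which cpu map an evsel ends up pointing at. */
enum cpus_source { USE_EVLIST_CPUS, USE_OWN_CPUS };

/* Hypothetical reductions of the real structures to the tested flags. */
struct evsel_flags  { bool has_own_cpus; bool system_wide; };
struct evlist_flags { bool has_user_cpus; bool cpus_empty; };

enum cpus_source pick_cpus(const struct evsel_flags *evsel,
			   const struct evlist_flags *evlist)
{
	if (!evsel->has_own_cpus || evlist->has_user_cpus)
		return USE_EVLIST_CPUS;
	/* Branch added by the patch: task mode with a PMU-provided mask. */
	if (!evsel->system_wide && evlist->cpus_empty)
		return USE_EVLIST_CPUS;
	return USE_OWN_CPUS;
}

void pick_cpus_example(void)
{
	struct evsel_flags  pmu_evsel = { .has_own_cpus = true, .system_wide = false };
	struct evsel_flags  uncore    = { .has_own_cpus = true, .system_wide = true };
	struct evlist_flags task_mode = { .has_user_cpus = false, .cpus_empty = true };

	/* armv8_pmu-style event in task mode now follows the empty evlist map. */
	assert(pick_cpus(&pmu_evsel, &task_mode) == USE_EVLIST_CPUS);
	/* A system-wide (uncore-like) event keeps its own mask. */
	assert(pick_cpus(&uncore, &task_mode) == USE_OWN_CPUS);
}
```

The point of the new middle branch is that the evsel and the evlist agree on one map in task mode, so the close path indexes file descriptors consistently.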


ANNOUNCE: pahole v1.18 (raw data pretty printer, BTF global vars)

2020-10-02 Thread Arnaldo Carvalho de Melo
 the
  later being just flatly refused, that got left for v1.19.

- Bail out on partial units for now, avoiding segfaults and providing warning
  to user, hopefully will be addressed in v1.19.

Signed-off-by: Arnaldo Carvalho de Melo 


Re: [PATCHv2 1/9] perf tools: Add build id shell test

2020-10-02 Thread Arnaldo Carvalho de Melo
Em Fri, Oct 02, 2020 at 10:34:51AM -0700, Ian Rogers escreveu:
> On Fri, Oct 2, 2020 at 6:07 AM Namhyung Kim  wrote:
> >
> > Hi Jiri,
> >
> > On Fri, Oct 2, 2020 at 4:05 AM Jiri Olsa  wrote:
> > >
> > > Adding test for build id cache that adds binary
> > > with sha1 and md5 build ids and verifies it's
> > > added properly.
> > >
> > > The test updates build id cache with perf record
> > > and perf buildid-cache -a.
> > >
> > > Signed-off-by: Jiri Olsa 
> > > ---
> > > v2 changes:
> > >   - detect perf build directory when checking for build-ex* binaries
> > >
> > >  tools/perf/Makefile.perf  |  14 +
> > >  tools/perf/tests/shell/buildid.sh | 101 ++
> > >  2 files changed, 115 insertions(+)
> > >  create mode 100755 tools/perf/tests/shell/buildid.sh
> > >
> > > diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> > > index 920d8afb9238..b2aeefa64e92 100644
> > > --- a/tools/perf/Makefile.perf
> > > +++ b/tools/perf/Makefile.perf
> > > @@ -126,6 +126,8 @@ include ../scripts/utilities.mak
> > >  #
> > >  # Define NO_LIBDEBUGINFOD if you do not want support debuginfod
> > >  #
> > > +# Define NO_BUILDID_EX if you do not want buildid-ex-* binaries
> > > +#
> > >
> > >  # As per kernel Makefile, avoid funny character set dependencies
> > >  unexport LC_ALL
> > > @@ -349,6 +351,11 @@ ifndef NO_PERF_READ_VDSOX32
> > >  PROGRAMS += $(OUTPUT)perf-read-vdsox32
> > >  endif
> > >
> > > +ifndef NO_BUILDID_EX
> > > +PROGRAMS += $(OUTPUT)buildid-ex-sha1
> > > +PROGRAMS += $(OUTPUT)buildid-ex-md5
> > > +endif
> > > +
> > >  LIBJVMTI = libperf-jvmti.so
> > >
> > >  ifndef NO_JVMTI
> > > @@ -756,6 +763,13 @@ $(OUTPUT)perf-read-vdsox32: perf-read-vdso.c 
> > > util/find-map.c
> > > $(QUIET_CC)$(CC) -mx32 $(filter -static,$(LDFLAGS)) -Wall -Werror 
> > > -o $@ perf-read-vdso.c
> > >  endif
> > >
> > > +ifndef NO_BUILDID_EX
> > > +$(OUTPUT)buildid-ex-sha1:
> > > +   $(QUIET_LINK)echo 'int main(void) { return 0; }' | $(CC) 
> > > -Wl,--build-id=sha1 -o $@ -x c -
> > > +$(OUTPUT)buildid-ex-md5:
> > > +   $(QUIET_LINK)echo 'int main(void) { return 0; }' | $(CC) 
> > > -Wl,--build-id=md5 -o $@ -x c -
> > > +endif
> >
> > Can we just build them in the test shell script instead?
> >
> > Thanks
> > Namhyung
> 
> That'd mean perf test having a dependency on a compiler :-/ That said
> there are some existing dependencies for BPF compilers.

If doing it in the test shell script ends up being advantageous, we
could skip the test if a suitable compiler isn't available.

- Arnaldo
 
> Thanks,
> Ian
> 
> >
> > > +
> > >  ifndef NO_JVMTI
> > >  LIBJVMTI_IN := $(OUTPUT)jvmti/jvmti-in.o
> > >
> > > diff --git a/tools/perf/tests/shell/buildid.sh 
> > > b/tools/perf/tests/shell/buildid.sh
> > > new file mode 100755
> > > index ..dd9f9c306c34
> > > --- /dev/null
> > > +++ b/tools/perf/tests/shell/buildid.sh
> > > @@ -0,0 +1,101 @@
> > > +#!/bin/sh
> > > +# build id cache operations
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +
> > > +ex_md5=buildid-ex-md5
> > > +ex_sha1=buildid-ex-sha1
> > > +
> > > +# skip if there are no test binaries
> > > +if [ ! -x buildid-ex-sha1 -a ! -x buildid-ex-md5 ]; then
> > > +   ex_dir=$(dirname `which perf`)
> > > +   ex_md5=${ex_dir}/buildid-ex-md5
> > > +   ex_sha1=${ex_dir}/buildid-ex-sha1
> > > +
> > > +   if [ ! -x ${ex_sha1} -a ! -x ${ex_md5} ]; then
> > > +   echo "failed: no test binaries"
> > > +   exit 2
> > > +   fi
> > > +fi
> > > +
> > > +echo "test binaries: ${ex_sha1} ${ex_md5}"
> > > +
> > > +# skip if there's no readelf
> > > +if [ ! -x `which readelf` ]; then
> > > +   echo "failed: no readelf, install binutils"
> > > +   exit 2
> > > +fi
> > > +
> > > +check()
> > > +{
> > > +   id=`readelf -n $1 2>/dev/null | grep 'Build ID' | awk '{print 
> > > $3}'`
> > > +
> > > +   echo "build id: ${id}"
> > > +
> > > +   link=${build_id_dir}/.build-id/${id:0:2}/${id:2}
> > > +   echo "link: ${link}"
> > > +
> > > +   if [ ! -h $link ]; then
> > > +   echo "failed: link ${link} does not exist"
> > > +   exit 1
> > > +   fi
> > > +
> > > +   file=${build_id_dir}/.build-id/${id:0:2}/`readlink ${link}`/elf
> > > +   echo "file: ${file}"
> > > +
> > > +   if [ ! -x $file ]; then
> > > +   echo "failed: file ${file} does not exist"
> > > +   exit 1
> > > +   fi
> > > +
> > > +   diff ${file} ${1}
> > > +   if [ $? -ne 0 ]; then
> > > +   echo "failed: ${file} do not match"
> > > +   exit 1
> > > +   fi
> > > +
> > > +   echo "OK for ${1}"
> > > +}
> > > +
> > > +test_add()
> > > +{
> > > +   build_id_dir=$(mktemp -d /tmp/perf.debug.XXX)
> > > +   perf="perf --buildid-dir ${build_id_dir}"
> > > +
> > > +   ${perf} buildid-cache -v -a ${1}
> > > +   if [ $? -ne 0 ]; then
> > > +   echo "failed: add ${1} to build id cache"
> > 

Re: [PATCH] perf tools: Fix printable strings in python3 scripts

2020-10-01 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 28, 2020 at 10:11:35PM +0200, Jiri Olsa escreveu:
> Hagen reported broken strings in python3 tracepoint scripts:
> 
>   make PYTHON=python3
>   ./perf record -e sched:sched_switch -a -- sleep 5
>   ./perf script --gen-script py
>   ./perf script -s ./perf-script.py
> 
>   [..]
>   sched__sched_switch  7 563231.7595257920 swapper   \
>   prev_comm=bytearray(b'swapper/7\x00\x00\x00\x00\x00\x00\x00'), \
>   prev_pid=0, prev_prio=120, prev_state=, 
> next_comm=bytearray(b'mutex-thread-co\x00'),
> 
> The problem is in the is_printable_array function, which does not take
> the zero byte into account and treats such a string as not printable,
> so the code creates a byte array instead of a string.

Thanks, tested and applied.

- Arnaldo
 
> Cc: sta...@vger.kernel.org
> Fixes: 249de6e07458 ("perf script python: Fix string vs byte array resolving")
> Tested-by: Hagen Paul Pfeifer 
> Signed-off-by: Jiri Olsa 
> ---
>  tools/perf/util/print_binary.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/print_binary.c b/tools/perf/util/print_binary.c
> index 599a1543871d..13fdc51c61d9 100644
> --- a/tools/perf/util/print_binary.c
> +++ b/tools/perf/util/print_binary.c
> @@ -50,7 +50,7 @@ int is_printable_array(char *p, unsigned int len)
>  
>   len--;
>  
> - for (i = 0; i < len; i++) {
> + for (i = 0; i < len && p[i]; i++) {
>   if (!isprint(p[i]) && !isspace(p[i]))
>   return 0;
>   }
> -- 
> 2.26.2
> 

-- 

- Arnaldo
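
To see why the one-line change matters, here is a standalone sketch of the check with the fixed loop. The loop body matches the quoted hunk; the argument checks before it are not shown in the diff and are an assumption, as are the function name and the unsigned char casts.

```c
#include <assert.h>
#include <ctype.h>

/* Returns 1 if p[0..len) looks like a NUL-terminated printable string. */
int is_printable_array_sketch(const char *p, unsigned int len)
{
	unsigned int i;

	/* Assumed precondition check: buffer must end in a zero byte. */
	if (!p || !len || p[len - 1] != 0)
		return 0;

	len--;

	/* Stop at the first zero byte, so trailing padding is ignored. */
	for (i = 0; i < len && p[i]; i++) {
		if (!isprint((unsigned char)p[i]) &&
		    !isspace((unsigned char)p[i]))
			return 0;
	}
	return 1;
}

void is_printable_array_example(void)
{
	/* Fixed-size comm field: the name followed by zero padding. */
	char comm[16] = "swapper/7";
	char binary[4] = { 1, 2, 3, 0 };

	assert(is_printable_array_sketch(comm, sizeof(comm)) == 1);
	assert(is_printable_array_sketch(binary, sizeof(binary)) == 0);
}
```

Without the `&& p[i]` condition the loop would reach the padding bytes of `comm`, find them neither printable nor whitespace, and report the field as binary, which is how prev_comm ended up as a bytearray in the report above.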


Re: [PATCH v4] tools lib traceevent: Hide non API functions

2020-09-30 Thread Arnaldo Carvalho de Melo
Em Wed, Sep 30, 2020 at 12:50:27PM -0400, Steven Rostedt escreveu:
> On Wed, 30 Sep 2020 14:07:33 +0300
> "Tzvetomir Stoyanov (VMware)"  wrote:
> 
> > There are internal library functions that are not declared static.
> > They are used inside the library from different files. Hide them from
> > the library users, as they are not part of the API.
> > These functions are made hidden and are renamed without the prefix "tep_":
> >  tep_free_plugin_paths
> >  tep_peek_char
> >  tep_buffer_init
> >  tep_get_input_buf_ptr
> >  tep_get_input_buf
> >  tep_read_token
> >  tep_free_token
> >  tep_free_event
> >  tep_free_format_field
> >  __tep_parse_format
> > 
> > Link: 
> > https://lore.kernel.org/linux-trace-devel/e4afdd82deb5e023d53231bb13e08dca78085fb0.ca...@decadent.org.uk/
> > Reported-by: Ben Hutchings 
> > Signed-off-by: Tzvetomir Stoyanov (VMware) 
> > ---
> 
> Reviewed-by: Steven Rostedt (VMware) 
> 
> Arnaldo,
> 
> Can you pull this in?

Sure, I was just waiting for this to get to some conclusion.

- Arnaldo
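
For context, hiding a non-static function from library users is usually done with a symbol visibility attribute. The sketch below assumes GCC/Clang attribute syntax; the macro name and the public function are invented for this note and are not taken from the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumption: GCC/Clang attribute syntax; the macro name is made up. */
#define sketch_hidden __attribute__((visibility("hidden")))

/*
 * Internal helper: callable from any file inside the library, but not
 * exported from the shared object, so it needs no "tep_" style prefix.
 */
sketch_hidden void free_token(char *tok)
{
	free(tok);
}

/* Public entry point (hypothetical): keeps default visibility. */
int sketch_token_length(const char *text)
{
	char *tok = malloc(strlen(text) + 1);
	int len;

	if (!tok)
		return -1;
	strcpy(tok, text);
	len = (int)strlen(tok);
	free_token(tok);
	return len;
}

void sketch_hidden_example(void)
{
	assert(sketch_token_length("tep") == 3);
}
```

In a shared library built this way, free_token would not appear in the dynamic symbol table, so applications cannot link against it by accident.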


Re: perf script, libperf: python binding bug (bytearrays vs. strings)

2020-09-28 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 28, 2020 at 03:39:42PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 28, 2020 at 12:43:11PM +0200, Hagen Paul Pfeifer wrote:
> > * Jiri Olsa | 2020-09-28 12:08:08 [+0200]:
> > 
> > >patch below fixes it for me, but seems strange this was
> > >working till now.. maybe you're the only one using this
> > >with python3 ;-)
> > 
> > and I thought python2 is obsolete and not maintained anymore ... ;-)
> > Anyway, the patch fixed everything: no more garbage with Python2 or Python3,
> > and no more bytearray type with Python3!
> > 
> > Tested-by: Hagen Paul Pfeifer 
> > 
> > Thank you Jiri!
> > 
> > Probably this patch should be applied on stable too!? Not sure when the 
> > problem was introduced.
> 
> great, I'll check on that and send full patch later, thanks 

Thanks, I'll do one more pull req for v5.9, will have that in.

Hagen, please consider sending a patch making python3 the default,
with python2 left just for whoever still needs it.

Thanks!

- Arnaldo
 
> jirka
> 
> > 
> > Hagen
> > 
> > >jirka
> > >
> > >
> > >---
> > >diff --git a/tools/perf/util/print_binary.c 
> > >b/tools/perf/util/print_binary.c
> > >index 599a1543871d..13fdc51c61d9 100644
> > >--- a/tools/perf/util/print_binary.c
> > >+++ b/tools/perf/util/print_binary.c
> > >@@ -50,7 +50,7 @@ int is_printable_array(char *p, unsigned int len)
> > > 
> > >   len--;
> > > 
> > >-  for (i = 0; i < len; i++) {
> > >+  for (i = 0; i < len && p[i]; i++) {
> > >   if (!isprint(p[i]) && !isspace(p[i]))
> > >   return 0;
> > >   }
> > >
> > 
> 

-- 

- Arnaldo

