Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Adrian Hunter
On 14/09/20 11:07 pm, Jiri Olsa wrote:
> On Mon, Sep 14, 2020 at 10:08:01AM -0700, Ian Rogers wrote:
> 
> SNIP
> 
>>>
>>> Using one of the MISC bits to resolve the union. Might actually bring
>>> benefit to everyone. Us normal people get to have a smaller MMAP record,
>>> while the buildid folks can have it too.
>>>
>>> Even more extreme would be using 2 MISC bits and allowing the union to
>>> be 0 sized for anon.
>>>
>>> That said; I have the nagging feeling there were unresolved issues with
>>> mmap2, but I can't seem to find any relevant emails on it :/ My
>>> google-fu is weak today.
>>
>> Firstly, thanks Jiri for this really useful patch set for our
>> use-cases! Some thoughts:
>>
>> One issue with mmap2 events at the moment is that they happen "after"
>> the mmap is performed. This allows the mapped address to be known but
>> has the unfortunate problem of causing mmap events to get "extended"
>> due to adjacent vmas being merged. There are some workarounds like
>> commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
>> Perhaps these events can switch to reporting the length the mmap
>> requested rather than the length of the vma?
> 
> seems like separate feature, perhaps we could use another MISC bit for that?
> 
>>
>> I could imagine that changes here could be interesting to folks doing
>> CHERI or hypervisors, for example, they may want more address bits.
>>
>> BPF has stack traces with elements of buildid + offset. Using these in
>> perf samples would avoid the need for mmap events, synthesis, etc. but
>> at the cost of larger and possibly slower stack traces. Perhaps we
>> should just remove the idea of mmap events altogether?
> 
> hm, we also need mmap events to resolve PERF_SAMPLE_IP, right?
> also not sure how would we do that for dwarf unwind, and perhaps
> there are some other places.. c2c data address resolving?

Not to mention Intel PT and any other hw trace that puts ip's into the AUX area.

And branch stacks, call chains.


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Adrian Hunter
On 15/09/20 1:00 am, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 14, 2020 at 09:39:07PM +0200, Jiri Olsa escreveu:
>> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
>>> Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
 On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
> Add new version of mmap event. The MMAP3 record is an
> augmented version of MMAP2, it adds build id value to
> identify the exact binary object behind memory map:
>  
>   struct {
> struct perf_event_header header;
>  
> u32  pid, tid;
> u64  addr;
> u64  len;
> u64  pgoff;
> u32  maj;
> u32  min;
> u64  ino;
> u64  ino_generation;
> u32  prot, flags;
> u32  reserved;
>  
>>> What for this reserved? its all nicely aligned already, u64 followed by
>>> two u32 (prot, flags).
>  
> u8   buildid[20];
>   
 Do we need maj, min, ino, ino_generation for mmap3 event?
 I think they are to compare binaries, then we can do it with
 build-id (and I think it'd be better)..
>  
>>> Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
>>> superset of MMAP2.
>  
>>> If we want to ditch useless stuff, then trow away pid, tid too, as we
>>> can select those via sample_type.
>  
>>> Having said that, at this point I don't even know if adding new
>>> PERF_RECORD_ that are an update for a preexisting one is the right way
>>> to proceed.
> 
>>> Perhaps we should attach a BPF program to point where a mmap/munmap is
>>> being done (perf_event_mmap()) and allow userspace to ask for whatever
>>> it wants?  With a kprobes there right now we can implement this MMAP3
>>> easily, no?
>  
>> hmm, I'm always woried about solutions based on kprobes,
>> because once the function is moved/removed you're screwed
>> and need to keep up with function name changes and be
>> backward compatible..
> 
> Well, I'm not advocating to have it as kprobes permanently, but we can
> implement it now using a kprobes, i.e. systems wouldn't have to have its
> kernel updated to have this feature, but once then need, for some other
> reason, to have their kernel upgraded, then perf would notice that there
> is a tracepoint for that and would happily use it.
> 
> So we would be able to use that tracepoint with things like ftrace,
> bpftrace, everything that knows about tracepoints, and perf would get
> build-ids and whatever else is needed to have a mmap record, in the
> future we could even ask for some more (or less) according to the what
> is needed for some new feature.
> 
> I.e. the point wasn't about kprobes was about using BPF to state what
> we want to collect when a mmap is being put in place.

Isn't the problem with krpobes / tracepoints etc that non-privileged users
can't use them.


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Namhyung Kim
Hi Jiri,

On Tue, Sep 15, 2020 at 4:38 AM Jiri Olsa  wrote:
>
> On Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim wrote:
>
> SNIP
>
> > > diff --git a/include/uapi/linux/perf_event.h 
> > > b/include/uapi/linux/perf_event.h
> > > index 077e7ee69e3d..facfc3c673ed 100644
> > > --- a/include/uapi/linux/perf_event.h
> > > +++ b/include/uapi/linux/perf_event.h
> > > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > > aux_output :  1, /* generate AUX 
> > > records instead of events */
> > > cgroup :  1, /* include cgroup 
> > > events */
> > > text_poke  :  1, /* include text poke 
> > > events */
> > > -   __reserved_1   : 30;
> > > +   mmap3  :  1, /* include bpf 
> > > events */
> >
> > ???
> >
> > > +   __reserved_1   : 29;
> > >
> > > union {
> > > __u32   wakeup_events;/* wakeup every n 
> > > events */
> > > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> > >  */
> > > PERF_RECORD_TEXT_POKE   = 20,
> > >
> > > +   /*
> > > +* The MMAP3 records are an augmented version of MMAP2, they add
> > > +* build id value to identify the exact binary behind map
> > > +*
> > > +* struct {
> > > +*  struct perf_event_headerheader;
> > > +*
> > > +*  u32 pid, tid;
> > > +*  u64 addr;
> > > +*  u64 len;
> > > +*  u64 pgoff;
> > > +*  u32 maj;
> > > +*  u32 min;
> > > +*  u64 ino;
> > > +*  u64 ino_generation;
> > > +*  u32 prot, flags;
> > > +*  u32 reserved;
> > > +*  u8  buildid[20];
> > > +*  charfilename[];
> > > +*  struct sample_idsample_id;
> > > +* };
> > > +*/
> > > +   PERF_RECORD_MMAP3   = 21,
> > > +
> > > PERF_RECORD_MAX,/* non-ABI */
> > >  };
> > >
> > [SNIP]
> > > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct 
> > > perf_mmap_event *mmap_event)
> > > mmap_event->prot = prot;
> > > mmap_event->flags = flags;
> > >
> > > +   if (atomic_read(&nr_mmap3_events))
> > > +   build_id_parse(vma, mmap_event->buildid);
> >
> > What about if it failed?  We should zero out the build-id..
>
> it is initialized to zero in perf_event_mmap
>
> mmap_event = (struct perf_mmap_event){
> .vma= vma,
> ...
>
> I'll double check build_id_parse won't leave anything half
> baked there, but I dont think so

Oh, you're right.  I missed that..

Thanks
Namhyung


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Ian Rogers
On Mon, Sep 14, 2020 at 10:26 AM Stephane Eranian  wrote:
>
> On Mon, Sep 14, 2020 at 2:08 AM  wrote:
> >
> > On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> > > what happens if I set mmap3 and mmap2?
> > >
> > > I think using mmap3 for every mmap may be overkill as you add useless
> > > 20 bytes to an mmap record.
> > > I am not sure if your code handles the case where mmap3 is not needed
> > > because there is no buildid, e.g, anonymous memory.
> > > It seems to me you've written the patch in such a way that if the user
> > > tool supports mmap3, then it supersedes mmap2, and thus
> > > you need all the fields of mmap2. But if could be more interesting to
> > > return either MMAP2 or MMAP3 depending on tool support
> > > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > > But maybe that logic is already in your patch and I missed it.
> >
> > That, and what if you don't want any of that buildid nonsense at all? I
> > always kill that because it makes perf pointlessly slow and has
> > absolutely no upsides for me.
> >
> I have seen situations where the perf tool takes a visibly significant
> amount of time (many seconds) to inject the buildids at the end of the
> collection
> of perf record (same if using perf inject -b). That is because it
> needs to go through all the records in the perf.data to find MMAP
> records and then read
> the buildids from the filesystem. This has caused some problems in our
> environment. Having the kernel add the buildid to *relevant* mmaps
> would avoid
> a lot of that penalty, by avoiding having to parse the perf.data file
> and leveraging the fact that the buildid may be in memory already.
> Although my concern on
> this has to do with large pages and the impact they have on alignment
> of sections in memory.  I think Ian can comment better on this.

I believe this is a problem we have and that is going away. For
context, we map huge pages and move executable code to them, not from
a file, but using anonymous memory or other sources of huge pages. By
definition we will fail to find build ids for such anonymous memory,
but we may also break the non file backed hugepage case if the
alignment is such that the ELF header is on the hugepage and for some
reason not in the page cache. File backed huge pages solve this
problem.

Thanks,
Ian

> I think this patch series is useful if it can demonstrate a speedup
> during recording (perf record or perf record | perf inject -b). But it
> needs to be
> optimized to minimize the volume of useless info returned. And Jiri
> needs to decide if MMAP3 is a replacement of MMAP2, or a different
> kind of record
> targeted at ELF images only in which case some of the fields may be
> removed. My tendency would be to go for the latter.


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 14, 2020 at 09:50:45PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 14, 2020 at 12:31:34PM -0300, Arnaldo Carvalho de Melo wrote:
> 
> SNIP
> 
> > > > ---
> > > >  include/uapi/linux/perf_event.h | 27 ++-
> > > >  kernel/events/core.c| 38 +++--
> > > >  2 files changed, 57 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/uapi/linux/perf_event.h 
> > > > b/include/uapi/linux/perf_event.h
> > > > index 077e7ee69e3d..facfc3c673ed 100644
> > > > --- a/include/uapi/linux/perf_event.h
> > > > +++ b/include/uapi/linux/perf_event.h
> > > > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > > > aux_output :  1, /* generate AUX 
> > > > records instead of events */
> > > > cgroup :  1, /* include cgroup 
> > > > events */
> > > > text_poke  :  1, /* include text 
> > > > poke events */
> > > > -   __reserved_1   : 30;
> > > > +   mmap3  :  1, /* include bpf 
> > > > events */
> > > > +   __reserved_1   : 29;
> > > >
> > > what happens if I set mmap3 and mmap2?
> > > 
> > > I think using mmap3 for every mmap may be overkill as you add useless
> > > 20 bytes to an mmap record.
> > 
> > So use just PERF_RECORD_MMAP2.
> > 
> > I think if the user says: I need buildids, then, in kernels with support
> > for getting the buildid in MMAP records, use it as its more accurate,
> > otherwise fall back to traversing all records at the end to go over lots
> > of files haversting those build-ids.
> 
> ok, so special record option to enable this
> 
> > 
> > If the user says I don't want build-ids, nothing changes, no collection
> > at the end, perf continues using PERF_RECORD_MMAP2.
> 
> and that's -B option in record

Yeah, so if -B is used, MMAP2, otherwise, the best available option,
which is MMAP3, which by now means more how you  tweak the misc bits and
what you collect, buildids or just the maj/min/ino :)

- Arnaldo
 
> > 
> > > I am not sure if your code handles the case where mmap3 is not needed
> > > because there is no buildid, e.g, anonymous memory.
> > > It seems to me you've written the patch in such a way that if the user
> > > tool supports mmap3, then it supersedes mmap2, and thus
> > > you need all the fields of mmap2. But if could be more interesting to
> > > return either MMAP2 or MMAP3 depending on tool support
> > > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > > But maybe that logic is already in your patch and I missed it.
> > 
> > Right, it should take into account if the user asked for build-ids or
> > not in addition to checking if the kernel supports MMAP3.
> 
> right, thanks,
> 
> jirka
> 

-- 

- Arnaldo


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 14, 2020 at 09:39:07PM +0200, Jiri Olsa escreveu:
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> > > On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
> > > > Add new version of mmap event. The MMAP3 record is an
> > > > augmented version of MMAP2, it adds build id value to
> > > > identify the exact binary object behind memory map:
 
> > > >   struct {
> > > > struct perf_event_header header;
 
> > > > u32  pid, tid;
> > > > u64  addr;
> > > > u64  len;
> > > > u64  pgoff;
> > > > u32  maj;
> > > > u32  min;
> > > > u64  ino;
> > > > u64  ino_generation;
> > > > u32  prot, flags;
> > > > u32  reserved;
 
> > What for this reserved? its all nicely aligned already, u64 followed by
> > two u32 (prot, flags).
 
> > > > u8   buildid[20];
  
> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..
 
> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.
 
> > If we want to ditch useless stuff, then trow away pid, tid too, as we
> > can select those via sample_type.
 
> > Having said that, at this point I don't even know if adding new
> > PERF_RECORD_ that are an update for a preexisting one is the right way
> > to proceed.

> > Perhaps we should attach a BPF program to point where a mmap/munmap is
> > being done (perf_event_mmap()) and allow userspace to ask for whatever
> > it wants?  With a kprobes there right now we can implement this MMAP3
> > easily, no?
 
> hmm, I'm always woried about solutions based on kprobes,
> because once the function is moved/removed you're screwed
> and need to keep up with function name changes and be
> backward compatible..

Well, I'm not advocating to have it as kprobes permanently, but we can
implement it now using a kprobes, i.e. systems wouldn't have to have its
kernel updated to have this feature, but once then need, for some other
reason, to have their kernel upgraded, then perf would notice that there
is a tracepoint for that and would happily use it.

So we would be able to use that tracepoint with things like ftrace,
bpftrace, everything that knows about tracepoints, and perf would get
build-ids and whatever else is needed to have a mmap record, in the
future we could even ask for some more (or less) according to the what
is needed for some new feature.

I.e. the point wasn't about kprobes was about using BPF to state what
we want to collect when a mmap is being put in place.

- Arnaldo


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Mon, Sep 14, 2020 at 10:08:01AM -0700, Ian Rogers wrote:

SNIP

> >
> > Using one of the MISC bits to resolve the union. Might actually bring
> > benefit to everyone. Us normal people get to have a smaller MMAP record,
> > while the buildid folks can have it too.
> >
> > Even more extreme would be using 2 MISC bits and allowing the union to
> > be 0 sized for anon.
> >
> > That said; I have the nagging feeling there were unresolved issues with
> > mmap2, but I can't seem to find any relevant emails on it :/ My
> > google-fu is weak today.
> 
> Firstly, thanks Jiri for this really useful patch set for our
> use-cases! Some thoughts:
> 
> One issue with mmap2 events at the moment is that they happen "after"
> the mmap is performed. This allows the mapped address to be known but
> has the unfortunate problem of causing mmap events to get "extended"
> due to adjacent vmas being merged. There are some workarounds like
> commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
> Perhaps these events can switch to reporting the length the mmap
> requested rather than the length of the vma?

seems like separate feature, perhaps we could use another MISC bit for that?

> 
> I could imagine that changes here could be interesting to folks doing
> CHERI or hypervisors, for example, they may want more address bits.
> 
> BPF has stack traces with elements of buildid + offset. Using these in
> perf samples would avoid the need for mmap events, synthesis, etc. but
> at the cost of larger and possibly slower stack traces. Perhaps we
> should just remove the idea of mmap events altogether?

hm, we also need mmap events to resolve PERF_SAMPLE_IP, right?
also not sure how would we do that for dwarf unwind, and perhaps
there are some other places.. c2c data address resolving?

thanks,
jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Mon, Sep 14, 2020 at 10:26:05AM -0700, Stephane Eranian wrote:
> On Mon, Sep 14, 2020 at 2:08 AM  wrote:
> >
> > On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> > > what happens if I set mmap3 and mmap2?
> > >
> > > I think using mmap3 for every mmap may be overkill as you add useless
> > > 20 bytes to an mmap record.
> > > I am not sure if your code handles the case where mmap3 is not needed
> > > because there is no buildid, e.g, anonymous memory.
> > > It seems to me you've written the patch in such a way that if the user
> > > tool supports mmap3, then it supersedes mmap2, and thus
> > > you need all the fields of mmap2. But if could be more interesting to
> > > return either MMAP2 or MMAP3 depending on tool support
> > > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > > But maybe that logic is already in your patch and I missed it.
> >
> > That, and what if you don't want any of that buildid nonsense at all? I
> > always kill that because it makes perf pointlessly slow and has
> > absolutely no upsides for me.
> >
> I have seen situations where the perf tool takes a visibly significant
> amount of time (many seconds) to inject the buildids at the end of the
> collection
> of perf record (same if using perf inject -b). That is because it
> needs to go through all the records in the perf.data to find MMAP
> records and then read
> the buildids from the filesystem. This has caused some problems in our
> environment. Having the kernel add the buildid to *relevant* mmaps
> would avoid
> a lot of that penalty, by avoiding having to parse the perf.data file
> and leveraging the fact that the buildid may be in memory already.
> Although my concern on
> this has to do with large pages and the impact they have on alignment
> of sections in memory.  I think Ian can comment better on this.
> 
> I think this patch series is useful if it can demonstrate a speedup
> during recording (perf record or perf record | perf inject -b). But it

I haven't meassured, but I assume skipping of perf.data search
at the end of the record will make up for reading buildid for
each mmap event.. migt be tricky in mmap events heavy workloads

> needs to be
> optimized to minimize the volume of useless info returned. And Jiri
> needs to decide if MMAP3 is a replacement of MMAP2, or a different
> kind of record
> targeted at ELF images only in which case some of the fields may be
> removed. My tendency would be to go for the latter.

yes, I like the latter as well, let's see

thanks,
jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Mon, Sep 14, 2020 at 12:31:34PM -0300, Arnaldo Carvalho de Melo wrote:

SNIP

> > > ---
> > >  include/uapi/linux/perf_event.h | 27 ++-
> > >  kernel/events/core.c| 38 +++--
> > >  2 files changed, 57 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/include/uapi/linux/perf_event.h 
> > > b/include/uapi/linux/perf_event.h
> > > index 077e7ee69e3d..facfc3c673ed 100644
> > > --- a/include/uapi/linux/perf_event.h
> > > +++ b/include/uapi/linux/perf_event.h
> > > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > > aux_output :  1, /* generate AUX 
> > > records instead of events */
> > > cgroup :  1, /* include cgroup 
> > > events */
> > > text_poke  :  1, /* include text poke 
> > > events */
> > > -   __reserved_1   : 30;
> > > +   mmap3  :  1, /* include bpf 
> > > events */
> > > +   __reserved_1   : 29;
> > >
> > what happens if I set mmap3 and mmap2?
> > 
> > I think using mmap3 for every mmap may be overkill as you add useless
> > 20 bytes to an mmap record.
> 
> So use just PERF_RECORD_MMAP2.
> 
> I think if the user says: I need buildids, then, in kernels with support
> for getting the buildid in MMAP records, use it as its more accurate,
> otherwise fall back to traversing all records at the end to go over lots
> of files haversting those build-ids.

ok, so special record option to enable this

> 
> If the user says I don't want build-ids, nothing changes, no collection
> at the end, perf continues using PERF_RECORD_MMAP2.

and that's -B option in record

> 
> > I am not sure if your code handles the case where mmap3 is not needed
> > because there is no buildid, e.g, anonymous memory.
> > It seems to me you've written the patch in such a way that if the user
> > tool supports mmap3, then it supersedes mmap2, and thus
> > you need all the fields of mmap2. But if could be more interesting to
> > return either MMAP2 or MMAP3 depending on tool support
> > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > But maybe that logic is already in your patch and I missed it.
> 
> Right, it should take into account if the user asked for build-ids or
> not in addition to checking if the kernel supports MMAP3.

right, thanks,

jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> >
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> >
> >   struct {
> > struct perf_event_header header;
> >
> > u32  pid, tid;
> > u64  addr;
> > u64  len;
> > u64  pgoff;
> > u32  maj;
> > u32  min;
> > u64  ino;
> > u64  ino_generation;
> > u32  prot, flags;
> > u32  reserved;
> > u8   buildid[20];
> > char filename[];
> > struct sample_id sample_id;
> >   };
> >
> > Adding 4 bytes reserved field to align buildid data to 8 bytes,
> > so sample_id data is properly aligned.
> >
> > The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> > struct.  When set for an event, it enables the build id retrieval
> > and will use mmap3 format for the event.
> >
> > Keeping track of mmap3 events and calling build_id_parse
> > in perf_event_mmap_event only if we have any defined.
> >
> > Having build id attached directly to the mmap event will help
> > tool like perf to skip final search through perf data for
> > binaries that are needed in the report time. Also it prevents
> > possible race when the binary could be removed or replaced
> > during profiling.
> >
> > Signed-off-by: Jiri Olsa 
> > ---
> >  include/uapi/linux/perf_event.h | 27 ++-
> >  kernel/events/core.c| 38 +++--
> >  2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h 
> > b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output :  1, /* generate AUX 
> > records instead of events */
> > cgroup :  1, /* include cgroup 
> > events */
> > text_poke  :  1, /* include text poke 
> > events */
> > -   __reserved_1   : 30;
> > +   mmap3  :  1, /* include bpf events 
> > */
> > +   __reserved_1   : 29;
> >
> what happens if I set mmap3 and mmap2?

hum bad things probably ;-) I think mmap3 would overload mmap2

> 
> I think using mmap3 for every mmap may be overkill as you add useless
> 20 bytes to an mmap record.
> I am not sure if your code handles the case where mmap3 is not needed
> because there is no buildid, e.g, anonymous memory.
> It seems to me you've written the patch in such a way that if the user
> tool supports mmap3, then it supersedes mmap2, and thus
> you need all the fields of mmap2. But if could be more interesting to
> return either MMAP2 or MMAP3 depending on tool support
> and type of mmap, that would certainly save 20 bytes on any anon mmap.
> But maybe that logic is already in your patch and I missed it.

I like peter's idea of ditching mmap3 and use that maj/min..
area in mmap2 for buildid based on misc bits

jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Mon, Sep 14, 2020 at 06:35:34PM +0200, pet...@infradead.org wrote:
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
> 
> > > >   struct {
> > > > struct perf_event_header header;
> > 
> > > > u32  pid, tid;
> > > > u64  addr;
> > > > u64  len;
> > > > u64  pgoff;
> > > > u32  maj;
> > > > u32  min;
> > > > u64  ino;
> > > > u64  ino_generation;
> > > > u32  prot, flags;
> > > > u32  reserved;
> > 
> > What for this reserved? its all nicely aligned already, u64 followed by
> > two u32 (prot, flags).
> 
> I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
> yes, that's a wee bit daft, since the next field is a variable length
> character array.
> 
> > > > u8   buildid[20];
> >  
> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..
> > 
> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.
> 
> Well, the 'funny' thing is that if you do use buildid, then
> {maj,min,ino,ino_generation} are indeed superfluous, but are combined
> also large enough to contain buildid.

yay! nice

> 
> > If we want to ditch useless stuff, then trow away pid, tid too, as we
> > can select those via sample_type.
> 
> Correct.

can we? I think you could disable sample_id then
you won't have pid/tid in mmap event

> 
> So something like:
> 
> struct {
>   struct perf_event_header header;
> 
>   u64  addr;
>   u64  len;
>   u64  pgoff;
>   union {
> struct {
>   u32  maj;
>   u32  min;
>   u64  ino;
>   u64  ino_generation;
> };
> u8 buildid[20];
>   };
>   u32  prot, flags;
>   char   filename[];
>   struct sample_id sample_id;
> };
> 
> Using one of the MISC bits to resolve the union. Might actually bring
> benefit to everyone. Us normal people get to have a smaller MMAP record,
> while the buildid folks can have it too.
> 
> Even more extreme would be using 2 MISC bits and allowing the union to
> be 0 sized for anon.

I like that idea, I'll check on it

thanks,
jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Sun, Sep 13, 2020 at 11:20:31PM -0700, Song Liu wrote:
> On Sun, Sep 13, 2020 at 10:40 PM Namhyung Kim  wrote:
> >
> > On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
> > >
> > > Add new version of mmap event. The MMAP3 record is an
> > > augmented version of MMAP2, it adds build id value to
> > > identify the exact binary object behind memory map:
> > >
> > >   struct {
> > > struct perf_event_header header;
> > >
> > > u32  pid, tid;
> > > u64  addr;
> > > u64  len;
> > > u64  pgoff;
> > > u32  maj;
> > > u32  min;
> > > u64  ino;
> > > u64  ino_generation;
> > > u32  prot, flags;
> > > u32  reserved;
> 
> I guess we need reserved _after_ buildid, no?

it's there to align the size to 8 bytes,
so the sample_id is in proper place

but yes, perhaps after buildid would make more sense

> 
> > > u8   buildid[20];
> >
> > Do we need maj, min, ino, ino_generation for mmap3 event?
> > I think they are to compare binaries, then we can do it with
> > build-id (and I think it'd be better)..
> 
> +1 we shouldn't need maj, min, etc.

right, and as peter already wrote buildid could fit
in that space.. yay :)

thanks,
jirka

> 
> Thanks,
> Song
> 
> [...]
> 



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim wrote:

SNIP

> > diff --git a/include/uapi/linux/perf_event.h 
> > b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output :  1, /* generate AUX 
> > records instead of events */
> > cgroup :  1, /* include cgroup 
> > events */
> > text_poke  :  1, /* include text poke 
> > events */
> > -   __reserved_1   : 30;
> > +   mmap3  :  1, /* include bpf events 
> > */
> 
> ???
> 
> > +   __reserved_1   : 29;
> >
> > union {
> > __u32   wakeup_events;/* wakeup every n events 
> > */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> >  */
> > PERF_RECORD_TEXT_POKE   = 20,
> >
> > +   /*
> > +* The MMAP3 records are an augmented version of MMAP2, they add
> > +* build id value to identify the exact binary behind map
> > +*
> > +* struct {
> > +*  struct perf_event_headerheader;
> > +*
> > +*  u32 pid, tid;
> > +*  u64 addr;
> > +*  u64 len;
> > +*  u64 pgoff;
> > +*  u32 maj;
> > +*  u32 min;
> > +*  u64 ino;
> > +*  u64 ino_generation;
> > +*  u32 prot, flags;
> > +*  u32 reserved;
> > +*  u8  buildid[20];
> > +*  charfilename[];
> > +*  struct sample_idsample_id;
> > +* };
> > +*/
> > +   PERF_RECORD_MMAP3   = 21,
> > +
> > PERF_RECORD_MAX,/* non-ABI */
> >  };
> >
> [SNIP]
> > @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct 
> > perf_mmap_event *mmap_event)
> > mmap_event->prot = prot;
> > mmap_event->flags = flags;
> >
> > +   if (atomic_read(&nr_mmap3_events))
> > +   build_id_parse(vma, mmap_event->buildid);
> 
> What about if it failed?  We should zero out the build-id..

it is initialized to zero in perf_event_mmap

mmap_event = (struct perf_mmap_event){
.vma= vma,
...

I'll double check build_id_parse won't leave anything half
baked there, but I dont think so

thanks,
jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Jiri Olsa
On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> > On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
> > > Add new version of mmap event. The MMAP3 record is an
> > > augmented version of MMAP2, it adds build id value to
> > > identify the exact binary object behind memory map:
> 
> > >   struct {
> > > struct perf_event_header header;
> 
> > > u32  pid, tid;
> > > u64  addr;
> > > u64  len;
> > > u64  pgoff;
> > > u32  maj;
> > > u32  min;
> > > u64  ino;
> > > u64  ino_generation;
> > > u32  prot, flags;
> > > u32  reserved;
> 
> What for this reserved? its all nicely aligned already, u64 followed by
> two u32 (prot, flags).
> 
> > > u8   buildid[20];
>  
> > Do we need maj, min, ino, ino_generation for mmap3 event?
> > I think they are to compare binaries, then we can do it with
> > build-id (and I think it'd be better)..
> 
> Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> superset of MMAP2.
> 
> If we want to ditch useless stuff, then trow away pid, tid too, as we
> can select those via sample_type.
> 
> Having said that, at this point I don't even know if adding new
> PERF_RECORD_ that are an update for a preexisting one is the right way
> to proceed.
> 
> Perhaps we should attach a BPF program to point where a mmap/munmap is
> being done (perf_event_mmap()) and allow userspace to ask for whatever
> it wants?  With a kprobes there right now we can implement this MMAP3
> easily, no?

hmm, I'm always woried about solutions based on kprobes,
because once the function is moved/removed you're screwed
and need to keep up with function name changes and be
backward compatible..

jirka



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Stephane Eranian
On Mon, Sep 14, 2020 at 2:08 AM  wrote:
>
> On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> > what happens if I set mmap3 and mmap2?
> >
> > I think using mmap3 for every mmap may be overkill as you add useless
> > 20 bytes to an mmap record.
> > I am not sure if your code handles the case where mmap3 is not needed
> > because there is no buildid, e.g, anonymous memory.
> > It seems to me you've written the patch in such a way that if the user
> > tool supports mmap3, then it supersedes mmap2, and thus
> > you need all the fields of mmap2. But if could be more interesting to
> > return either MMAP2 or MMAP3 depending on tool support
> > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > But maybe that logic is already in your patch and I missed it.
>
> That, and what if you don't want any of that buildid nonsense at all? I
> always kill that because it makes perf pointlessly slow and has
> absolutely no upsides for me.
>
I have seen situations where the perf tool takes a visibly significant
amount of time (many seconds) to inject the buildids at the end of the
collection
of perf record (same if using perf inject -b). That is because it
needs to go through all the records in the perf.data to find MMAP
records and then read
the buildids from the filesystem. This has caused some problems in our
environment. Having the kernel add the buildid to *relevant* mmaps
would avoid
a lot of that penalty, by avoiding having to parse the perf.data file
and leveraging the fact that the buildid may be in memory already.
Although my concern on
this has to do with large pages and the impact they have on alignment
of sections in memory.  I think Ian can comment better on this.

I think this patch series is useful if it can demonstrate a speedup
during recording (perf record or perf record | perf inject -b). But it
needs to be
optimized to minimize the volume of useless info returned. And Jiri
needs to decide if MMAP3 is a replacement of MMAP2, or a different
kind of record
targeted at ELF images only in which case some of the fields may be
removed. My tendency would be to go for the latter.


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Ian Rogers
On Mon, Sep 14, 2020 at 9:35 AM  wrote:
>
> On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:
>
> > > >   struct {
> > > > struct perf_event_header header;
> >
> > > > u32  pid, tid;
> > > > u64  addr;
> > > > u64  len;
> > > > u64  pgoff;
> > > > u32  maj;
> > > > u32  min;
> > > > u64  ino;
> > > > u64  ino_generation;
> > > > u32  prot, flags;
> > > > u32  reserved;
> >
> > What for this reserved? its all nicely aligned already, u64 followed by
> > two u32 (prot, flags).
>
> I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
> yes, that's a wee bit daft, since the next field is a variable length
> character array.
>
> > > > u8   buildid[20];
> >
> > > Do we need maj, min, ino, ino_generation for mmap3 event?
> > > I think they are to compare binaries, then we can do it with
> > > build-id (and I think it'd be better)..
> >
> > Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> > superset of MMAP2.
>
> Well, the 'funny' thing is that if you do use buildid, then
> {maj,min,ino,ino_generation} are indeed superfluous, but are combined
> also large enough to contain buildid.
>
> > If we want to ditch useless stuff, then trow away pid, tid too, as we
> > can select those via sample_type.
>
> Correct.
>
> So something like:
>
> struct {
>   struct perf_event_header header;
>
>   u64  addr;
>   u64  len;
>   u64  pgoff;
>   union {
> struct {
>   u32  maj;
>   u32  min;
>   u64  ino;
>   u64  ino_generation;
> };
> u8 buildid[20];
>   };
>   u32  prot, flags;
>   char filename[];
>   struct sample_id sample_id;
> };
>
> Using one of the MISC bits to resolve the union. Might actually bring
> benefit to everyone. Us normal people get to have a smaller MMAP record,
> while the buildid folks can have it too.
>
> Even more extreme would be using 2 MISC bits and allowing the union to
> be 0 sized for anon.
>
> That said; I have the nagging feeling there were unresolved issues with
> mmap2, but I can't seem to find any relevant emails on it :/ My
> google-fu is weak today.

Firstly, thanks Jiri for this really useful patch set for our
use-cases! Some thoughts:

One issue with mmap2 events at the moment is that they happen "after"
the mmap is performed. This allows the mapped address to be known but
has the unfortunate problem of causing mmap events to get "extended"
due to adjacent vmas being merged. There are some workarounds like
commit c8f6ae1fb28d (perf inject jit: Remove //anon mmap events).
Perhaps these events can switch to reporting the length the mmap
requested rather than the length of the vma?

I could imagine that changes here could be interesting to folks doing
CHERI or hypervisors, for example, they may want more address bits.

BPF has stack traces with elements of buildid + offset. Using these in
perf samples would avoid the need for mmap events, synthesis, etc. but
at the cost of larger and possibly slower stack traces. Perhaps we
should just remove the idea of mmap events altogether?

Thanks,
Ian


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread peterz
On Mon, Sep 14, 2020 at 12:28:41PM -0300, Arnaldo Carvalho de Melo wrote:

> > >   struct {
> > > struct perf_event_header header;
> 
> > > u32  pid, tid;
> > > u64  addr;
> > > u64  len;
> > > u64  pgoff;
> > > u32  maj;
> > > u32  min;
> > > u64  ino;
> > > u64  ino_generation;
> > > u32  prot, flags;
> > > u32  reserved;
> 
> What for this reserved? its all nicely aligned already, u64 followed by
> two u32 (prot, flags).

I suspect it is so that sizeof(reserve+buildid) is a multiple of 8. But
yes, that's a wee bit daft, since the next field is a variable length
character array.

> > > u8   buildid[20];
>  
> > Do we need maj, min, ino, ino_generation for mmap3 event?
> > I think they are to compare binaries, then we can do it with
> > build-id (and I think it'd be better)..
> 
> Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
> superset of MMAP2.

Well, the 'funny' thing is that if you do use buildid, then
{maj,min,ino,ino_generation} are indeed superfluous, but are combined
also large enough to contain buildid.

> If we want to ditch useless stuff, then trow away pid, tid too, as we
> can select those via sample_type.

Correct.

So something like:

struct {
  struct perf_event_header header;

  u64  addr;
  u64  len;
  u64  pgoff;
  union {
struct {
  u32  maj;
  u32  min;
  u64  ino;
  u64  ino_generation;
};
u8 buildid[20];
  };
  u32  prot, flags;
  char filename[];
  struct sample_id sample_id;
};

Using one of the MISC bits to resolve the union. Might actually bring
benefit to everyone. Us normal people get to have a smaller MMAP record,
while the buildid folks can have it too.

Even more extreme would be using 2 MISC bits and allowing the union to
be 0 sized for anon.

That said; I have the nagging feeling there were unresolved issues with
mmap2, but I can't seem to find any relevant emails on it :/ My
google-fu is weak today.


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 14, 2020 at 11:08:11AM +0200, pet...@infradead.org escreveu:
> On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> > On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> > what happens if I set mmap3 and mmap2?
> > 
> > I think using mmap3 for every mmap may be overkill as you add useless
> > 20 bytes to an mmap record.
> > I am not sure if your code handles the case where mmap3 is not needed
> > because there is no buildid, e.g, anonymous memory.
> > It seems to me you've written the patch in such a way that if the user
> > tool supports mmap3, then it supersedes mmap2, and thus
> > you need all the fields of mmap2. But if could be more interesting to
> > return either MMAP2 or MMAP3 depending on tool support
> > and type of mmap, that would certainly save 20 bytes on any anon mmap.
> > But maybe that logic is already in your patch and I missed it.
> 
> That, and what if you don't want any of that buildid nonsense at all? I
> always kill that because it makes perf pointlessly slow and has
> absolutely no upsides for me.

So, for you nothing should change, no MMAP3 used, no collection at the
end (which is your pet peeve).

I'm not saying this is what is in his patches right now, but what I
think his patches should be doing.

- Arnaldo


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Arnaldo Carvalho de Melo
Em Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian escreveu:
> On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> >
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> >
> >   struct {
> > struct perf_event_header header;
> >
> > u32  pid, tid;
> > u64  addr;
> > u64  len;
> > u64  pgoff;
> > u32  maj;
> > u32  min;
> > u64  ino;
> > u64  ino_generation;
> > u32  prot, flags;
> > u32  reserved;
> > u8   buildid[20];
> > char filename[];
> > struct sample_id sample_id;
> >   };
> >
> > Adding 4 bytes reserved field to align buildid data to 8 bytes,
> > so sample_id data is properly aligned.
> >
> > The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> > struct.  When set for an event, it enables the build id retrieval
> > and will use mmap3 format for the event.
> >
> > Keeping track of mmap3 events and calling build_id_parse
> > in perf_event_mmap_event only if we have any defined.
> >
> > Having build id attached directly to the mmap event will help
> > tool like perf to skip final search through perf data for
> > binaries that are needed in the report time. Also it prevents
> > possible race when the binary could be removed or replaced
> > during profiling.
> >
> > Signed-off-by: Jiri Olsa 
> > ---
> >  include/uapi/linux/perf_event.h | 27 ++-
> >  kernel/events/core.c| 38 +++--
> >  2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h 
> > b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output :  1, /* generate AUX 
> > records instead of events */
> > cgroup :  1, /* include cgroup 
> > events */
> > text_poke  :  1, /* include text poke 
> > events */
> > -   __reserved_1   : 30;
> > +   mmap3  :  1, /* include bpf events 
> > */
> > +   __reserved_1   : 29;
> >
> what happens if I set mmap3 and mmap2?
> 
> I think using mmap3 for every mmap may be overkill as you add useless
> 20 bytes to an mmap record.

So use just PERF_RECORD_MMAP2.

I think if the user says: I need buildids, then, in kernels with support
for getting the buildid in MMAP records, use it as its more accurate,
otherwise fall back to traversing all records at the end to go over lots
of files haversting those build-ids.

If the user says I don't want build-ids, nothing changes, no collection
at the end, perf continues using PERF_RECORD_MMAP2.

> I am not sure if your code handles the case where mmap3 is not needed
> because there is no buildid, e.g, anonymous memory.
> It seems to me you've written the patch in such a way that if the user
> tool supports mmap3, then it supersedes mmap2, and thus
> you need all the fields of mmap2. But if could be more interesting to
> return either MMAP2 or MMAP3 depending on tool support
> and type of mmap, that would certainly save 20 bytes on any anon mmap.
> But maybe that logic is already in your patch and I missed it.

Right, it should take into account if the user asked for build-ids or
not in addition to checking if the kernel supports MMAP3.

- Arnaldo

> 
> > union {
> > __u32   wakeup_events;/* wakeup every n events 
> > */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> >  */
> > PERF_RECORD_TEXT_POKE   = 20,
> >
> > +   /*
> > +* The MMAP3 records are an augmented version of MMAP2, they add
> > +* build id value to identify the exact binary behind map
> > +*
> > +* struct {
> > +*  struct perf_event_headerheader;
> > +*
> > +*  u32 pid, tid;
> > +*  u64 addr;
> > +*  u64 len;
> > +*  u64 pgoff;
> > +*  u32 maj;
> > +*  u32 min;
> > +*  u64 ino;
> > +*  u64 ino_generation;
> > +*  u32 prot, flags;
> > +*  u32 reserved;
> > +*  u8   

Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread Arnaldo Carvalho de Melo
Em Mon, Sep 14, 2020 at 02:38:27PM +0900, Namhyung Kim escreveu:
> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:

> >   struct {
> > struct perf_event_header header;

> > u32  pid, tid;
> > u64  addr;
> > u64  len;
> > u64  pgoff;
> > u32  maj;
> > u32  min;
> > u64  ino;
> > u64  ino_generation;
> > u32  prot, flags;
> > u32  reserved;

What for this reserved? its all nicely aligned already, u64 followed by
two u32 (prot, flags).

> > u8   buildid[20];
 
> Do we need maj, min, ino, ino_generation for mmap3 event?
> I think they are to compare binaries, then we can do it with
> build-id (and I think it'd be better)..

Humm, I thought MMAP2 would be a superset of MMAP and MMAP3 would be a
superset of MMAP2.

If we want to ditch useless stuff, then trow away pid, tid too, as we
can select those via sample_type.

Having said that, at this point I don't even know if adding new
PERF_RECORD_ that are an update for a preexisting one is the right way
to proceed.

Perhaps we should attach a BPF program to point where a mmap/munmap is
being done (perf_event_mmap()) and allow userspace to ask for whatever
it wants?  With a kprobes there right now we can implement this MMAP3
easily, no?

Start with a kprobes and all this would be already available in kernels
with BPF, no need to reboot with a PERF_RECORD_MMAP3 enabled kernel,
when we get a tracepoint there, then use it, as its more efficient.

sample_id stuff would be done as with other records, etc, just the
things that are MMAP3 specific would be in the payload, perf.data has
the struct layout description, etc.

Then use a BPF_TRACE_ITER to generate preexisting MMAP records instead
of going thru /proc/ doing tons of syscalls, instead injecting directly
into the perf ring buffer the MMAP3 (or MMAP2 or MMAP or something else
according to the tools needs).

- Arnaldo
 
> 
> > char filename[];
> > struct sample_id sample_id;
> >   };
> >
> > Adding 4 bytes reserved field to align buildid data to 8 bytes,
> > so sample_id data is properly aligned.
> >
> > The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> > struct.  When set for an event, it enables the build id retrieval
> > and will use mmap3 format for the event.
> >
> > Keeping track of mmap3 events and calling build_id_parse
> > in perf_event_mmap_event only if we have any defined.
> >
> > Having build id attached directly to the mmap event will help
> > tool like perf to skip final search through perf data for
> > binaries that are needed in the report time. Also it prevents
> > possible race when the binary could be removed or replaced
> > during profiling.
> >
> > Signed-off-by: Jiri Olsa 
> > ---
> >  include/uapi/linux/perf_event.h | 27 ++-
> >  kernel/events/core.c| 38 +++--
> >  2 files changed, 57 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/uapi/linux/perf_event.h 
> > b/include/uapi/linux/perf_event.h
> > index 077e7ee69e3d..facfc3c673ed 100644
> > --- a/include/uapi/linux/perf_event.h
> > +++ b/include/uapi/linux/perf_event.h
> > @@ -384,7 +384,8 @@ struct perf_event_attr {
> > aux_output :  1, /* generate AUX 
> > records instead of events */
> > cgroup :  1, /* include cgroup 
> > events */
> > text_poke  :  1, /* include text poke 
> > events */
> > -   __reserved_1   : 30;
> > +   mmap3  :  1, /* include bpf events 
> > */
> 
> ???
> 
> > +   __reserved_1   : 29;
> >
> > union {
> > __u32   wakeup_events;/* wakeup every n events 
> > */
> > @@ -1060,6 +1061,30 @@ enum perf_event_type {
> >  */
> > PERF_RECORD_TEXT_POKE   = 20,
> >
> > +   /*
> > +* The MMAP3 records are an augmented version of MMAP2, they add
> > +* build id value to identify the exact binary behind map
> > +*
> > +* struct {
> > +*  struct perf_event_headerheader;
> > +*
> > +*  u32 pid, tid;
> > +*  u64 addr;
> > +*  u64 len;
> > +*  u64 pgoff;
> > +*  u32 maj;
> > +*  u32 min;
> > +*  u64  

Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread peterz
On Sun, Sep 13, 2020 at 11:02:49PM +0200, Jiri Olsa wrote:
> Add new version of mmap event. The MMAP3 record is an
> augmented version of MMAP2, it adds build id value to
> identify the exact binary object behind memory map:
> 
>   struct {
> struct perf_event_header header;
> 
> u32  pid, tid;
> u64  addr;
> u64  len;
> u64  pgoff;
> u32  maj;
> u32  min;
> u64  ino;
> u64  ino_generation;
> u32  prot, flags;
> u32  reserved;
> u8   buildid[20];
> char filename[];
> struct sample_id sample_id;
>   };
> 

So weren't there still open problems with mmap2 that also needed
addressing? I seem to have forgotten :/


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-14 Thread peterz
On Sun, Sep 13, 2020 at 11:41:00PM -0700, Stephane Eranian wrote:
> On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
> what happens if I set mmap3 and mmap2?
> 
> I think using mmap3 for every mmap may be overkill as you add useless
> 20 bytes to an mmap record.
> I am not sure if your code handles the case where mmap3 is not needed
> because there is no buildid, e.g, anonymous memory.
> It seems to me you've written the patch in such a way that if the user
> tool supports mmap3, then it supersedes mmap2, and thus
> you need all the fields of mmap2. But if could be more interesting to
> return either MMAP2 or MMAP3 depending on tool support
> and type of mmap, that would certainly save 20 bytes on any anon mmap.
> But maybe that logic is already in your patch and I missed it.

That, and what if you don't want any of that buildid nonsense at all? I
always kill that because it makes perf pointlessly slow and has
absolutely no upsides for me.



Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-13 Thread Stephane Eranian
On Sun, Sep 13, 2020 at 2:03 PM Jiri Olsa  wrote:
>
> Add new version of mmap event. The MMAP3 record is an
> augmented version of MMAP2, it adds build id value to
> identify the exact binary object behind memory map:
>
>   struct {
> struct perf_event_header header;
>
> u32  pid, tid;
> u64  addr;
> u64  len;
> u64  pgoff;
> u32  maj;
> u32  min;
> u64  ino;
> u64  ino_generation;
> u32  prot, flags;
> u32  reserved;
> u8   buildid[20];
> char filename[];
> struct sample_id sample_id;
>   };
>
> Adding 4 bytes reserved field to align buildid data to 8 bytes,
> so sample_id data is properly aligned.
>
> The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> struct.  When set for an event, it enables the build id retrieval
> and will use mmap3 format for the event.
>
> Keeping track of mmap3 events and calling build_id_parse
> in perf_event_mmap_event only if we have any defined.
>
> Having build id attached directly to the mmap event will help
> tool like perf to skip final search through perf data for
> binaries that are needed in the report time. Also it prevents
> possible race when the binary could be removed or replaced
> during profiling.
>
> Signed-off-by: Jiri Olsa 
> ---
>  include/uapi/linux/perf_event.h | 27 ++-
>  kernel/events/core.c| 38 +++--
>  2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 077e7ee69e3d..facfc3c673ed 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -384,7 +384,8 @@ struct perf_event_attr {
> aux_output :  1, /* generate AUX records 
> instead of events */
> cgroup :  1, /* include cgroup events 
> */
> text_poke  :  1, /* include text poke 
> events */
> -   __reserved_1   : 30;
> +   mmap3  :  1, /* include bpf events */
> +   __reserved_1   : 29;
>
what happens if I set mmap3 and mmap2?

I think using mmap3 for every mmap may be overkill as you add useless
20 bytes to an mmap record.
I am not sure if your code handles the case where mmap3 is not needed
because there is no buildid, e.g, anonymous memory.
It seems to me you've written the patch in such a way that if the user
tool supports mmap3, then it supersedes mmap2, and thus
you need all the fields of mmap2. But if could be more interesting to
return either MMAP2 or MMAP3 depending on tool support
and type of mmap, that would certainly save 20 bytes on any anon mmap.
But maybe that logic is already in your patch and I missed it.


> union {
> __u32   wakeup_events;/* wakeup every n events */
> @@ -1060,6 +1061,30 @@ enum perf_event_type {
>  */
> PERF_RECORD_TEXT_POKE   = 20,
>
> +   /*
> +* The MMAP3 records are an augmented version of MMAP2, they add
> +* build id value to identify the exact binary behind map
> +*
> +* struct {
> +*  struct perf_event_headerheader;
> +*
> +*  u32 pid, tid;
> +*  u64 addr;
> +*  u64 len;
> +*  u64 pgoff;
> +*  u32 maj;
> +*  u32 min;
> +*  u64 ino;
> +*  u64 ino_generation;
> +*  u32 prot, flags;
> +*  u32 reserved;
> +*  u8  buildid[20];
> +*  charfilename[];
> +*  struct sample_idsample_id;
> +* };
> +*/
> +   PERF_RECORD_MMAP3   = 21,
> +
> PERF_RECORD_MAX,/* non-ABI */
>  };
>
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 7ed5248f0445..719894492dac 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -51,6 +51,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>  #include "internal.h"
>
> @@ -386,6 +387,7 @@ static DEFINE_PER_CPU(int, perf_sched_cb_usages);
>  static DEFINE_PER_CPU(struct pmu_event_list, pmu_sb_events);
>
>  static atomic_t nr_mmap_events __read_mostly;
> +static atomic_t nr_mmap3_events __read_mostly;
>  static atomic_t

Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-13 Thread Song Liu
On Sun, Sep 13, 2020 at 10:40 PM Namhyung Kim  wrote:
>
> On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
> >
> > Add new version of mmap event. The MMAP3 record is an
> > augmented version of MMAP2, it adds build id value to
> > identify the exact binary object behind memory map:
> >
> >   struct {
> > struct perf_event_header header;
> >
> > u32  pid, tid;
> > u64  addr;
> > u64  len;
> > u64  pgoff;
> > u32  maj;
> > u32  min;
> > u64  ino;
> > u64  ino_generation;
> > u32  prot, flags;
> > u32  reserved;

I guess we need reserved _after_ buildid, no?

> > u8   buildid[20];
>
> Do we need maj, min, ino, ino_generation for mmap3 event?
> I think they are to compare binaries, then we can do it with
> build-id (and I think it'd be better)..

+1 we shouldn't need maj, min, etc.

Thanks,
Song

[...]


Re: [PATCH 02/26] perf: Introduce mmap3 version of mmap event

2020-09-13 Thread Namhyung Kim
On Mon, Sep 14, 2020 at 6:03 AM Jiri Olsa  wrote:
>
> Add new version of mmap event. The MMAP3 record is an
> augmented version of MMAP2, it adds build id value to
> identify the exact binary object behind memory map:
>
>   struct {
> struct perf_event_header header;
>
> u32  pid, tid;
> u64  addr;
> u64  len;
> u64  pgoff;
> u32  maj;
> u32  min;
> u64  ino;
> u64  ino_generation;
> u32  prot, flags;
> u32  reserved;
> u8   buildid[20];

Do we need maj, min, ino, ino_generation for mmap3 event?
I think they are to compare binaries, then we can do it with
build-id (and I think it'd be better)..


> char filename[];
> struct sample_id sample_id;
>   };
>
> Adding 4 bytes reserved field to align buildid data to 8 bytes,
> so sample_id data is properly aligned.
>
> The mmap3 event is enabled by new mmap3 bit in perf_event_attr
> struct.  When set for an event, it enables the build id retrieval
> and will use mmap3 format for the event.
>
> Keeping track of mmap3 events and calling build_id_parse
> in perf_event_mmap_event only if we have any defined.
>
> Having build id attached directly to the mmap event will help
> tool like perf to skip final search through perf data for
> binaries that are needed in the report time. Also it prevents
> possible race when the binary could be removed or replaced
> during profiling.
>
> Signed-off-by: Jiri Olsa 
> ---
>  include/uapi/linux/perf_event.h | 27 ++-
>  kernel/events/core.c| 38 +++--
>  2 files changed, 57 insertions(+), 8 deletions(-)
>
> diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
> index 077e7ee69e3d..facfc3c673ed 100644
> --- a/include/uapi/linux/perf_event.h
> +++ b/include/uapi/linux/perf_event.h
> @@ -384,7 +384,8 @@ struct perf_event_attr {
> aux_output :  1, /* generate AUX records 
> instead of events */
> cgroup :  1, /* include cgroup events 
> */
> text_poke  :  1, /* include text poke 
> events */
> -   __reserved_1   : 30;
> +   mmap3  :  1, /* include bpf events */

???

> +   __reserved_1   : 29;
>
> union {
> __u32   wakeup_events;/* wakeup every n events */
> @@ -1060,6 +1061,30 @@ enum perf_event_type {
>  */
> PERF_RECORD_TEXT_POKE   = 20,
>
> +   /*
> +* The MMAP3 records are an augmented version of MMAP2, they add
> +* build id value to identify the exact binary behind map
> +*
> +* struct {
> +*  struct perf_event_headerheader;
> +*
> +*  u32 pid, tid;
> +*  u64 addr;
> +*  u64 len;
> +*  u64 pgoff;
> +*  u32 maj;
> +*  u32 min;
> +*  u64 ino;
> +*  u64 ino_generation;
> +*  u32 prot, flags;
> +*  u32 reserved;
> +*  u8  buildid[20];
> +*  charfilename[];
> +*  struct sample_idsample_id;
> +* };
> +*/
> +   PERF_RECORD_MMAP3   = 21,
> +
> PERF_RECORD_MAX,/* non-ABI */
>  };
>
[SNIP]
> @@ -8098,6 +8116,9 @@ static void perf_event_mmap_event(struct 
> perf_mmap_event *mmap_event)
> mmap_event->prot = prot;
> mmap_event->flags = flags;
>
> +   if (atomic_read(&nr_mmap3_events))
> +   build_id_parse(vma, mmap_event->buildid);

What about if it failed?  We should zero out the build-id..

Thanks
Namhyung

> +
> if (!(vma->vm_flags & VM_EXEC))
> mmap_event->event_id.header.misc |= 
> PERF_RECORD_MISC_MMAP_DATA;
>
> @@ -8241,6 +8262,7 @@ void perf_event_mmap(struct vm_area_struct *vma)
> /* .ino_generation (attr_mmap2 only) */
> /* .prot (attr_mmap2 only) */
> /* .flags (attr_mmap2 only) */
> +   /* .buildid (attr_mmap3 only) */
> };
>
> perf_addr_filters_adjust(vma);
> @@ -11040,6 +11062,8 @@ static void account_event(struct perf_event *event)
> inc = true;
> if (event->attr.mmap || event->attr.mmap_data)