Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-06-12 Thread Ian Rogers
On Fri, Jun 12, 2020 at 12:00 PM Steve MacLean
 wrote:
>
> >>> Hi Ian,
> >>>
>  I tried this as well with latest perf/core. The difference is that
> >>> unresolved addresses currently look like:
> >>>
> >>>  0.00%  java [JIT] tid 221782   [.] 0x451499a4
> >>>  0.00%  java [JIT] tid 221782   [.] 0x4514f3e8
> >>>  0.00%  java [JIT] tid 221782   [.] 0x45149394
> >>>
> >>> But after Steve's patch this becomes:
> >>>
> >>>  0.00%  java [unknown]  [.] 0x58557d14
> >>>  0.00%  java [unknown]  [.] 0x785c03b4
> >>>  0.00%  java [unknown]  [.] 0x58386520
> >>>
> >>> I couldn't see any events that were symbolised before but are no
> >>> longer symbolised after this patch.
> >>
> >> I see this, thanks for digging into the explanation! Were you able to
> >> get a test case where the unknowns went down? For example, by forcing
> >> the code cache size to be small? This is the result I'd expect to see.
> >
> >I tried the same Dacapo benchmark as you with different values of 
> >InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
> >
> >   Base   Patched
> > 100M  338373
> > 50M   333315
> > 25M   323368
> > 15M   1238   309
> > 10M   2600   333
> > 1M6035   337
> >
> >This looks fairly convincing to me: the cliff at 15M is where the code cache 
> >starts needing to be enlarged.
> >
>
> Removing the anonymous mappings causes a small regression.  Specifically,
> the reporting of the module name goes from "[JIT] tid " to "[unknown]".
> This occurs when the JIT fails to report memory used in jitdump before it
> is used.
>
> However there is also confirmation that JAVA does see the reported issue
> when using a small code cache.  The current patch resolves the issue in
> this case.
>
> I see two options:
>
> + Accept the regression. Since this is a regression for a jit dump
> reporting synchronization error, this may be a reasonable option.
>
> + Design a more complicated patch. Either
> + Only strip parts of // anon mmap events overlapping existing
>   jitted--.so mmap events.
> + Only strip parts of // anon mmap events overlapping prior
>   // anon mmap events
>
> Any opinions?

Hi Steve,

I think we should merge this change. I wanted to get a test case
together showing the benefit. Based on Nick Gasson's feedback I was
trying with command lines like:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data
/usr/lib/jvm/java-14-openjdk-amd64/bin/java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=10M -jar dacapo-9.12-bach.jar jython

I wanted to be able to demonstrate a non-zero effect in getting
samples. I spent limited time on the test and didn't get a result that
demonstrated a benefit, things weren't worse. I think the lack of
benefit is another bug where HotSpot's JVMTI code runs in parallel to
the new code being available in the JIT cache. We get samples ahead of
when the jitdump thinks the code is there and so it symbolizes them as
unknown. We should get HotSpot to provide the earliest timestamp to
avoid this problem.

I think a good way forward would be to merge this code as other issues
can be fixed as a follow up. Arnaldo, does that work for you? If so,
Acked-by: Ian Rogers .

Thanks,
Ian


RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-06-12 Thread Steve MacLean
>>> Hi Ian,
>>>
 I tried this as well with latest perf/core. The difference is that 
>>> unresolved addresses currently look like:
>>>
>>>  0.00%  java [JIT] tid 221782   [.] 0x451499a4
>>>  0.00%  java [JIT] tid 221782   [.] 0x4514f3e8
>>>  0.00%  java [JIT] tid 221782   [.] 0x45149394
>>>
>>> But after Steve's patch this becomes:
>>>
>>>  0.00%  java [unknown]  [.] 0x58557d14
>>>  0.00%  java [unknown]  [.] 0x785c03b4
>>>  0.00%  java [unknown]  [.] 0x58386520
>>>
>>> I couldn't see any events that were symbolised before but are no 
>>> longer symbolised after this patch.
>>
>> I see this, thanks for digging into the explanation! Were you able to 
>> get a test case where the unknowns went down? For example, by forcing 
>> the code cache size to be small? This is the result I'd expect to see.
>
>I tried the same Dacapo benchmark as you with different values of 
>InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.
>
>   Base   Patched
> 100M  338373
> 50M   333315
> 25M   323368
> 15M   1238   309
> 10M   2600   333
> 1M6035   337
>
>This looks fairly convincing to me: the cliff at 15M is where the code cache 
>starts needing to be enlarged.
>

Removing the anonymous mappings causes a small regression.  Specifically,
the reporting of the module name goes from "[JIT] tid " to "[unknown]".
This occurs when the JIT fails to report memory used in jitdump before it 
is used.

However there is also confirmation that JAVA does see the reported issue 
when using a small code cache.  The current patch resolves the issue in
this case.

I see two options:

+ Accept the regression. Since this is a regression for a jit dump 
reporting synchronization error, this may be a reasonable option.

+ Design a more complicated patch. Either
+ Only strip parts of // anon mmap events overlapping existing 
  jitted--.so mmap events.
+ Only strip parts of // anon mmap events overlapping prior
  // anon mmap events

Any opinions?


Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-06-01 Thread Nick Gasson
On 06/01/20 16:53 PM, Ian Rogers wrote:
> On Sun, May 31, 2020 at 11:17 PM Nick Gasson  wrote:
>>
>> On 05/28/20 17:32 PM, Ian Rogers wrote:
>> >
>> > So on tip/perf/core with:
>> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
>> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>> >
>> > I've been trying variants of:
>> >
>> > Before:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 578
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 6
>> >
>> > After:
>> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
>> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
>> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
>> > dacapo-9.12-bach.jar jython
>> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
>> > 589
>> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
>> > 60
>> >
>> > So maybe the jit cache isn't behaving the way that the patch cures,
>> > the uptick in unknowns appears consistent in my testing though. I
>> > expect user error, is there an obvious explanation I'm missing?
>> >
>>
>> Hi Ian,
>>
>> I tried this as well with latest perf/core. The difference is that
>> unresolved addresses currently look like:
>>
>>  0.00%  java [JIT] tid 221782   [.] 0x451499a4
>>  0.00%  java [JIT] tid 221782   [.] 0x4514f3e8
>>  0.00%  java [JIT] tid 221782   [.] 0x45149394
>>
>> But after Steve's patch this becomes:
>>
>>  0.00%  java [unknown]  [.] 0x58557d14
>>  0.00%  java [unknown]  [.] 0x785c03b4
>>  0.00%  java [unknown]  [.] 0x58386520
>>
>> I couldn't see any events that were symbolised before but are no longer
>> symbolised after this patch.
>
> I see this, thanks for digging into the explanation! Were you able to
> get a test case where the unknowns went down? For example, by forcing
> the code cache size to be small? This is the result I'd expect to see.

I tried the same Dacapo benchmark as you with different values of
InitialCodeCacheSize and grepped for -e '\[unknown\]' -e '\[JIT\]'.

   Base   Patched
 100M  338373
 50M   333315
 25M   323368
 15M   1238   309
 10M   2600   333
 1M6035   337

This looks fairly convincing to me: the cliff at 15M is where the code
cache starts needing to be enlarged.

>
>> I think most of these unknown events are caused by the asynchronous
>> nature of the JVMTI event handling. After an nmethod is compiled the
>> JVMTI event is posted to the Service Thread (*). So there can be a delay
>> between the time the compiled code starts executing and the time the
>> plugin receives the compiled code load event.
>>
>> Here's an edited down example:
>>
>> java 215881 750014.947873:1289634 cycles:u:  
>> 7856ad10 [unknown] ([unknown])
>>   Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: 
>> [0x785694c0(0x640) @ 0x40 fd:01 121010091 1]:
>> java 215881 750014.948665:1295994 cycles:u:  
>> 7856ad10 org.python.core.__builtin__.range(org.python
>>
>> The plugin receives the event ~100us after the first sample inside that
>> method. Ideally we would use the timestamp when the method was actually
>> compiled, but I can't see any way to extract this information.
>
> Hmm.. this is a bit weird as the compile_info at one point was a
> resource allocation and so would be cleared out when the resource mark
> was passed on the compiler's return. Not something you'd want to do
> asynchronously. Presumably this has changed to improve performance,
> but doing the jvmti on a separate thread defeats jitdump - if we see
> samples in code ahead of the code being loaded. Perhaps a profiler
> like async-profiler
> (https://github.com/jvm-profiling-tools/async-profiler) has a
> workaround for the lag.
>

I had a brief look at the async-profiler JVMTI code but I can't see
anything it's doing differently. Their profiler.sh attaches to a running
JVM - I suspect this problem will be less obvious once the JIT has
warmed up.

The compile_info structure is still resource-allocated in
JvmtiExport::post_compiled_method_load() but that function is called
from the service thread not the compiler thread. The compiler thread
just pushes a nmethod* on a queue and the inlining data is recovered
from the nmethod object.

It seems we could extend JVMTI here to pass the code installation
timestamp as 

Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-06-01 Thread Ian Rogers
On Sun, May 31, 2020 at 11:17 PM Nick Gasson  wrote:
>
> On 05/28/20 17:32 PM, Ian Rogers wrote:
> >
> > So on tip/perf/core with:
> > 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
> > 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
> >
> > I've been trying variants of:
> >
> > Before:
> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> > dacapo-9.12-bach.jar jython
> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> > 578
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> > 6
> >
> > After:
> > /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> > -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> > -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> > dacapo-9.12-bach.jar jython
> > /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> > 589
> > /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> > 60
> >
> > So maybe the jit cache isn't behaving the way that the patch cures,
> > the uptick in unknowns appears consistent in my testing though. I
> > expect user error, is there an obvious explanation I'm missing?
> >
>
> Hi Ian,
>
> I tried this as well with latest perf/core. The difference is that
> unresolved addresses currently look like:
>
>  0.00%  java [JIT] tid 221782   [.] 0x451499a4
>  0.00%  java [JIT] tid 221782   [.] 0x4514f3e8
>  0.00%  java [JIT] tid 221782   [.] 0x45149394
>
> But after Steve's patch this becomes:
>
>  0.00%  java [unknown]  [.] 0x58557d14
>  0.00%  java [unknown]  [.] 0x785c03b4
>  0.00%  java [unknown]  [.] 0x58386520
>
> I couldn't see any events that were symbolised before but are no longer
> symbolised after this patch.

I see this, thanks for digging into the explanation! Were you able to
get a test case where the unknowns went down? For example, by forcing
the code cache size to be small? This is the result I'd expect to see.

> I think most of these unknown events are caused by the asynchronous
> nature of the JVMTI event handling. After an nmethod is compiled the
> JVMTI event is posted to the Service Thread (*). So there can be a delay
> between the time the compiled code starts executing and the time the
> plugin receives the compiled code load event.
>
> Here's an edited down example:
>
> java 215881 750014.947873:1289634 cycles:u:  7856ad10 
> [unknown] ([unknown])
>   Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: 
> [0x785694c0(0x640) @ 0x40 fd:01 121010091 1]:
> java 215881 750014.948665:1295994 cycles:u:  7856ad10 
> org.python.core.__builtin__.range(org.python
>
> The plugin receives the event ~100us after the first sample inside that
> method. Ideally we would use the timestamp when the method was actually
> compiled, but I can't see any way to extract this information.

Hmm.. this is a bit weird as the compile_info at one point was a
resource allocation and so would be cleared out when the resource mark
was passed on the compiler's return. Not something you'd want to do
asynchronously. Presumably this has changed to improve performance,
but doing the jvmti on a separate thread defeats jitdump - if we see
samples in code ahead of the code being loaded. Perhaps a profiler
like async-profiler
(https://github.com/jvm-profiling-tools/async-profiler) has a
workaround for the lag.

> However I also saw a few recurring [unknown] addresses that never have a
> corresponding code load event. I'm not sure where these come from.

C2 is pretty bad for having assembly stub routine helpers (aka stub
routines) that aren't "documented". C1 is less like this, but it is
increasingly hard to force C1 as the compiler. If you are seeing these
things as leaf routines then its likely some stub routine clean up
could fix the issue.

Thanks,
Ian

> (*) 
> http://hg.openjdk.java.net/jdk/jdk/file/50fe8727ed79/src/hotspot/share/code/nmethod.cpp#l1591
>
> --
> Nick
>


Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-06-01 Thread Nick Gasson
On 05/28/20 17:32 PM, Ian Rogers wrote:
>
> So on tip/perf/core with:
> 1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
> 3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries
>
> I've been trying variants of:
>
> Before:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> dacapo-9.12-bach.jar jython
> /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 578
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 6
>
> After:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
> dacapo-9.12-bach.jar jython
> /tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
> 589
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 60
>
> So maybe the jit cache isn't behaving the way that the patch cures,
> the uptick in unknowns appears consistent in my testing though. I
> expect user error, is there an obvious explanation I'm missing?
>

Hi Ian,

I tried this as well with latest perf/core. The difference is that
unresolved addresses currently look like:

 0.00%  java [JIT] tid 221782   [.] 0x451499a4
 0.00%  java [JIT] tid 221782   [.] 0x4514f3e8
 0.00%  java [JIT] tid 221782   [.] 0x45149394

But after Steve's patch this becomes:

 0.00%  java [unknown]  [.] 0x58557d14
 0.00%  java [unknown]  [.] 0x785c03b4
 0.00%  java [unknown]  [.] 0x58386520

I couldn't see any events that were symbolised before but are no longer
symbolised after this patch.

I think most of these unknown events are caused by the asynchronous
nature of the JVMTI event handling. After an nmethod is compiled the
JVMTI event is posted to the Service Thread (*). So there can be a delay
between the time the compiled code starts executing and the time the
plugin receives the compiled code load event.

Here's an edited down example:

java 215881 750014.947873:1289634 cycles:u:  7856ad10 
[unknown] ([unknown])
  Service Thread 215895 750014.947971: PERF_RECORD_MMAP2 215879/215895: 
[0x785694c0(0x640) @ 0x40 fd:01 121010091 1]:
java 215881 750014.948665:1295994 cycles:u:  7856ad10 
org.python.core.__builtin__.range(org.python

The plugin receives the event ~100us after the first sample inside that
method. Ideally we would use the timestamp when the method was actually
compiled, but I can't see any way to extract this information.

However I also saw a few recurring [unknown] addresses that never have a
corresponding code load event. I'm not sure where these come from.

(*) 
http://hg.openjdk.java.net/jdk/jdk/file/50fe8727ed79/src/hotspot/share/code/nmethod.cpp#l1591

--
Nick



RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-05-29 Thread Steve MacLean
>/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java 
>-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer 
>-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar 
>dacapo-9.12-bach.jar jython 

Its also possible the `InitialCodeCacheSize=20M` argument is masking the 
problem.  Something like 4K would make the problem much easier to see.

I see the problem every time .NET grows the cache across a 64K page boundary.  
20M may mean you are allocating huge pages or something which might mask the 
problem. We may be allocating code pages 64K at a time.



RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-05-29 Thread Steve MacLean
> I've been trying variants of:
>
> Before:
>/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java 
>-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer 
>-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar 
>dacapo-9.12-bach.jar jython /tmp/perf/perf inject -i /tmp/perf.data -o 
>/tmp/perf-jit.data -j /tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ 
>|wc -l
> 578
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 6
> 
> After:
> /tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java 
> -agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer 
> -XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar 
> dacapo-9.12-bach.jar jython /tmp/perf/perf inject -i /tmp/perf.data -o 
> /tmp/perf-jit.data -j /tmp/perf/perf report -i /tmp/perf-jit.data |grep 
> class\ |wc -l
> 589
> /tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
> 60
>
> So maybe the jit cache isn't behaving the way that the patch cures, the 
> uptick in unknowns appears consistent in my testing though. I expect user 
> error, is there an obvious explanation I'm missing?

That is certainly disturbing. I don't see an obvious user error, but I haven't 
played with java jitdump

Couple of ideas come to mind.

+ When I was drafting .NET jitdump support, I looked briefly at the java agent. 
 My impression was the java agent support wasn't thread safe.  The jitdump file 
format requires that certain records are collocated to simplify the reader.  I 
wasn't sure how that was being accomplished by the writer if JAVA's JIT was 
multithreaded (like CoreCLR JIT is).

+ The way I implemented jitdump on CoreCLR was to hook into the existing system 
to write perf-map files.  So I am pretty confident there is a 1:1 
correspondence of perf-map records with jitdump records.  Is it possible that 
Java choose to only emit interesting jitdump records?  Perhaps skipping 
thunks/stubs/trampolines?  

+ There are still some `unknown` records in CoreCLR, but I don't believe there 
is an increase.  I am pretty sure some of our stubs are not being generated.  
They were present in our before case too.

+ Your methodology would be more repeatable if you record once, and analyze 
multiple times.  record, report, inject / report, inject / report

+ I was focusing on eliminating duplicate symbols for the same address.  So 
having the module name in the report allowed me to see if the symbol was from 
jitdump or from perf-map.  In the before case I could see duplicate symbols for 
the same address.  This was how the problem was first observed.

+ I would be more interested in looking at the diff of the reports.  Possibly 
sorted by address.  Probably with zero context.

If I were to guess, I would think Java choose to optimize jitdump and only 
record interesting code.

So if we need to support that scenario the approach of this patch wouldn't work.

We would need to use a different approach.  Ideas...
+ Keep anon mappings from overwriting jitdump mappings.  Introduce a weak add 
map method, that would only fill empty/anon.
+ Move the anon mapping timestamp to the beginning of time, so they are 
processed first
+ Fix the kernel perf map events
+ Keep the anon mappings in a separate fall back map

I kind of like the add weak map approach.


Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-05-28 Thread Ian Rogers
On Wed, May 27, 2020 at 1:59 PM Ian Rogers  wrote:
>
> On Wed, May 27, 2020 at 12:27 PM Steve MacLean
>  wrote:
> >
> > >> ** Implemented solution
> > >>
> > >> This patch solves the issue by removing // anon mmap events for any
> > >> process which has a valid jit-.dump file.
> > >>
> > >> It tracks on a per process basis to handle the case where some running
> > >> apps support jit-.dump, but some only support perf-.map.
> > >>
> > >> It adds new assumptions:
> > >> * // anon mmap events are only required for perf-.map support.
> > >> * An app that uses jit-.dump, no longer needs perf-.map
> > >> support. It assumes that any perf-.map info is inferior.
> > >>
> > >> *** Details
> > >>
> > >> Use thread->priv to store whether a jitdump file has been processed
> > >>
> > >> During "perf inject --jit", discard "//anon*" mmap events for any pid
> > >> which has sucessfully processed a jitdump file.
> > >
> > >
> > > Thanks Steve this is an important fix! As //anon could be for malloc or 
> > > other uses, should the stripping behavior be behind a flag?
> > >
> > > Ian
> >
> > I hadn't anticipated a need to preserve the //anon mmap events when 
> > profiling JIT generated code.
> >
> > As far as I know mmap events are captured by perf only for mapping code to 
> > symbols.  File mappings are kept
> > by the change.  Only // anon mappings are stripped.  (Only for processes 
> > which emitted jitdump files.)
> > And these are stripped only during the `perf inject --jit` step. I believe 
> > the // Anon mapping are only
> > generally useful for mapping JIT code.
> >
> > I suppose if someone was trying to count mmap events it might be confusing, 
> > but `perf inject --jit` creates
> > synthetic mmap file events which would also make this scenario confusing.
> >
> > I personally don't see a good reason to add a flag.  I also don't see a 
> > simple way either.  Not running `perf inject --jit`
> > would preserve existing behavior w/o jitdump support.  Without stripping 
> > the anon events jitdump support is painfully
> > broken
>
> Agreed that things are broken. In general only executable mappings are
> held onto by perf, so it could be I'm over worrying about //anon
> stripping breaking around memory allocations. We have some other use
> cases for //anon at Google but they aren't impacted by jitdump. We
> have also been trying to migrate jit caches to using memfd_create,
> which has the same problem that this patch fixes for //anon. Fixing
> memfd_create is a separate issue to //anon. I'll try to get a repro
> for Java that demonstrates the problem and then add a Tested-by.
>
> Thanks,
> Ian

So on tip/perf/core with:
1c0cd2dbb993 perf jvmti: Fix jitdump for methods without debug info
3ce17c1e52f4 perf jvmti: remove redundant jitdump line table entries

I've been trying variants of:

Before:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
dacapo-9.12-bach.jar jython
/tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
/tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
578
/tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
6

After:
/tmp/perf/perf record -k 1 -e cycles:u -o /tmp/perf.data java
-agentpath:/tmp/perf/libperf-jvmti.so -XX:+PreserveFramePointer
-XX:InitialCodeCacheSize=20M -XX:ReservedCodeCacheSize=1G -jar
dacapo-9.12-bach.jar jython
/tmp/perf/perf inject -i /tmp/perf.data -o /tmp/perf-jit.data -j
/tmp/perf/perf report -i /tmp/perf-jit.data |grep class\ |wc -l
589
/tmp/perf/perf report -i /tmp/perf-jit.data |grep unknown |wc -l
60

So maybe the jit cache isn't behaving the way that the patch cures,
the uptick in unknowns appears consistent in my testing though. I
expect user error, is there an obvious explanation I'm missing?

Thanks,
Ian


Re: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-05-27 Thread Ian Rogers
On Wed, May 27, 2020 at 12:27 PM Steve MacLean
 wrote:
>
> >> ** Implemented solution
> >>
> >> This patch solves the issue by removing // anon mmap events for any
> >> process which has a valid jit-.dump file.
> >>
> >> It tracks on a per process basis to handle the case where some running
> >> apps support jit-.dump, but some only support perf-.map.
> >>
> >> It adds new assumptions:
> >> * // anon mmap events are only required for perf-.map support.
> >> * An app that uses jit-.dump, no longer needs perf-.map
> >> support. It assumes that any perf-.map info is inferior.
> >>
> >> *** Details
> >>
> >> Use thread->priv to store whether a jitdump file has been processed
> >>
> >> During "perf inject --jit", discard "//anon*" mmap events for any pid
> >> which has sucessfully processed a jitdump file.
> >
> >
> > Thanks Steve this is an important fix! As //anon could be for malloc or 
> > other uses, should the stripping behavior be behind a flag?
> >
> > Ian
>
> I hadn't anticipated a need to preserve the //anon mmap events when profiling 
> JIT generated code.
>
> As far as I know mmap events are captured by perf only for mapping code to 
> symbols.  File mappings are kept
> by the change.  Only // anon mappings are stripped.  (Only for processes 
> which emitted jitdump files.)
> And these are stripped only during the `perf inject --jit` step. I believe 
> the // Anon mapping are only
> generally useful for mapping JIT code.
>
> I suppose if someone was trying to count mmap events it might be confusing, 
> but `perf inject --jit` creates
> synthetic mmap file events which would also make this scenario confusing.
>
> I personally don't see a good reason to add a flag.  I also don't see a 
> simple way either.  Not running `perf inject --jit`
> would preserve existing behavior w/o jitdump support.  Without stripping the 
> anon events jitdump support is painfully
> broken

Agreed that things are broken. In general only executable mappings are
held onto by perf, so it could be I'm over worrying about //anon
stripping breaking around memory allocations. We have some other use
cases for //anon at Google but they aren't impacted by jitdump. We
have also been trying to migrate jit caches to using memfd_create,
which has the same problem that this patch fixes for //anon. Fixing
memfd_create is a separate issue to //anon. I'll try to get a repro
for Java that demonstrates the problem and then add a Tested-by.

Thanks,
Ian


RE: [EXTERNAL] Re: [PATCH v4] perf inject --jit: Remove //anon mmap events

2020-05-27 Thread Steve MacLean
>> ** Implemented solution
>>
>> This patch solves the issue by removing // anon mmap events for any 
>> process which has a valid jit-.dump file.
>>
>> It tracks on a per process basis to handle the case where some running 
>> apps support jit-.dump, but some only support perf-.map.
>>
>> It adds new assumptions:
>> * // anon mmap events are only required for perf-.map support.
>> * An app that uses jit-.dump, no longer needs perf-.map 
>> support. It assumes that any perf-.map info is inferior.
>>
>> *** Details
>>
>> Use thread->priv to store whether a jitdump file has been processed
>>
>> During "perf inject --jit", discard "//anon*" mmap events for any pid 
>> which has sucessfully processed a jitdump file.
>
>
> Thanks Steve this is an important fix! As //anon could be for malloc or other 
> uses, should the stripping behavior be behind a flag? 
>
> Ian

I hadn't anticipated a need to preserve the //anon mmap events when profiling 
JIT generated code.

As far as I know mmap events are captured by perf only for mapping code to 
symbols.  File mappings are kept
by the change.  Only // anon mappings are stripped.  (Only for processes which 
emitted jitdump files.)
And these are stripped only during the `perf inject --jit` step. I believe the 
// Anon mapping are only 
generally useful for mapping JIT code.

I suppose if someone was trying to count mmap events it might be confusing, but 
`perf inject --jit` creates 
synthetic mmap file events which would also make this scenario confusing.

I personally don't see a good reason to add a flag.  I also don't see a simple 
way either.  Not running `perf inject --jit`
would preserve existing behavior w/o jitdump support.  Without stripping the 
anon events jitdump support is painfully
broken