A bit more on performance numbers for this application.

With no indy and only monomorphic caches, the full application (a data load)
runs in about a minute. I fully recognize that this is a short run,
but JMC seems to indicate the bulk of code has compiled well before
the halfway point.

With indy on 7u40 or 8 and no tiered compilation, it takes about two minutes.

Tiered compilation reduces non-indy time to 51s and indy time to 1m29s.

Tiered + indy + only using monomorphic cache (no direct binding) runs
in 1m, still 9s slower than non-indy.

With normal settings, indy call sites do settle down and are mostly
monomorphic. For the two phases of the data load, I stop seeing JRuby
bind indy call sites a couple of seconds in.
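
For anyone following along who hasn't looked at this kind of call site
before, here is a minimal sketch of the chaining being discussed: a
MutableCallSite that prepends a guardWithTest for each new receiver class
until it reaches a depth limit, then stops binding. This is not JRuby's
actual code. The class name ChainedCallSite, the maxChain parameter, the
counters, and the stand-in "method lookup" are all hypothetical, and the
uncached slow path here only approximates the monomorphic inline cache
that JRuby fails over to.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;
import java.lang.invoke.MutableCallSite;

// Hypothetical sketch of a guarded call-site chain; not JRuby's implementation.
// Not thread-safe: a real runtime would synchronize rebinding.
final class ChainedCallSite extends MutableCallSite {
    private static final MethodType SITE_TYPE =
        MethodType.methodType(String.class, Object.class);
    private static final MethodHandle FALLBACK;
    private static final MethodHandle CHECK_CLASS;
    private static final MethodHandle DESCRIBE;

    static {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            FALLBACK = lookup.findVirtual(ChainedCallSite.class, "fallback", SITE_TYPE);
            CHECK_CLASS = lookup.findStatic(ChainedCallSite.class, "checkClass",
                MethodType.methodType(boolean.class, Class.class, Object.class));
            DESCRIBE = lookup.findStatic(ChainedCallSite.class, "describe",
                MethodType.methodType(String.class, String.class, Object.class));
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    private final int maxChain;   // assumed knob: maximum number of guarded targets
    private int chainDepth;       // guards currently bound at this site
    private int slowPathHits;     // calls that missed every guard

    ChainedCallSite(int maxChain) {
        super(SITE_TYPE);
        this.maxChain = maxChain;
        setTarget(FALLBACK.bindTo(this)); // start unbound: every call is a miss
    }

    // Guard test: does the receiver have exactly the class we bound for?
    static boolean checkClass(Class<?> expected, Object receiver) {
        return receiver.getClass() == expected;
    }

    // Stand-in for the method a dynamic language would resolve for a class.
    static String describe(String className, Object receiver) {
        return className;
    }

    // Slow path: resolve the target and, while under the limit, prepend a guard.
    String fallback(Object receiver) throws Throwable {
        slowPathHits++;
        Class<?> cls = receiver.getClass();
        MethodHandle target = DESCRIBE.bindTo(cls.getSimpleName());
        if (chainDepth < maxChain) {
            MethodHandle test = CHECK_CLASS.bindTo(cls);
            // New guard goes in front; the previous chain becomes its fallback.
            setTarget(MethodHandles.guardWithTest(test, target, getTarget()));
            chainDepth++;
        }
        return (String) target.invoke(receiver);
    }

    // Convenience wrapper so callers need not handle Throwable.
    String call(Object receiver) {
        try {
            return (String) getTarget().invoke(receiver);
        } catch (Throwable t) {
            throw new RuntimeException(t);
        }
    }

    int chainDepth() { return chainDepth; }
    int slowPathHits() { return slowPathHits; }
}
```

With maxChain set to 0 the site never binds a guard, so every call takes
the slow path. That loosely mirrors dropping the chain limit to zero as
described in the quoted message below.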

There does not appear to be any difference in performance on this app
between 7u40 and 8b103.

Like I said, I think the user would be willing to share the
application, and I feel like the numbers warrant investigation.
Standing by! :-)

- Charlie

On Wed, Sep 18, 2013 at 10:39 AM, Charles Oliver Nutter
<head...@headius.com> wrote:
> I've been playing with JMC a bit tonight, running a user's application
> that's about 2x slower using indy than using trivial monomorphic
> caches (and no indy call sites). I'm trying to understand how to
> interpret what I see.
>
> In the Code/Overview results, where it lists "hot packages", the #1
> and #2 packages are java.lang.invoke.LambdaForm$MH and DMH, accounting
> for over 37% of samples. That sounds high, but I'm willing to grant
> they're hit pretty hard for a fully dynamic application.
>
> Results in the "Hot Methods" tab show similar things, like
> LambdaForm...invokeStatic_LL_L as the number one result and LambdaForm
> entries dominating the top 50 entries in the profile. Again, I know
> I'm hitting dynamic call sites hard and sampling is not always
> accurate.
>
> If I look at compilation events, I only see a handful of
> LambdaForm...convert being compiled. I'm not sure if that's good or
> bad. My assumption is that LFs don't show up here because they're
> always being inlined into a caller.
>
> The performance numbers for the app have me worried too. If I run
> JRuby with stock settings, we will chain up to 6 call targets at a
> call site. The lower I drop this number, the better performance gets;
> when I drop all the way to zero, forcing all invokedynamic call sites
> to fail over immediately to a monomorphic inline cache, performance
> *almost* gets back to the non-indy implementation. This leads me to
> believe that the less I use invokedynamic (or the fewer LFs involved),
> the better. That doesn't bode well.
>
> I believe the user would be happy to allow me to make these JMC
> recordings available, and I'm happy to re-run with additional events
> or gather other information. The JRuby community has a number of very
> large applications that push the limits of indy. We should work
> together to improve it.
>
> - Charlie
_______________________________________________
mlvm-dev mailing list
mlvm-dev@openjdk.java.net
http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev
