I will indeed! Just preparing ahead of time for the hype machine to go
into overdrive. Regardless of initial speed, there's an incredibly
long tail to any Ruby implementation, and new ones won't be useful
until months or years after they're first released.

- Charlie

On Wed, Oct 17, 2012 at 3:03 AM, Ben Evans
<benjamin.john.ev...@gmail.com> wrote:
> Hi Charlie,
>
> Can you send us a decent link or two once it actually does drop? I'm
> not much of a Ruby head generally, but would like to see the numbers
> (and, of course, take a quick look at their testing / benching
> methodology).
>
> Thanks,
>
> Ben
>
> On Wed, Oct 17, 2012 at 1:53 AM, Charles Oliver Nutter
> <head...@headius.com> wrote:
>> Hello all!
>>
>> I've recently been informed that a new Ruby implementation is about to
>> be announced that puts JRuby's numeric perf to shame. Boo hoo.
>>
>> It's not like I expected us to retain the numeric crown since we're
>> still allocating objects for every number in the system, but hopefully
>> we can get that crown back at some point.
>>
>> In an effort to start getting back to indy + perf work (with JRuby 1.7
>> almost released, finally), I bring you today's benchmark:
>>
>> require 'benchmark'
>> 50.times { puts Benchmark.measure {
>>   f = 20.5; i = 0
>>   while i < 2000000
>>     f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
>>     f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; f += 0.1; f -= 0.1
>>     f += 0.1; f -= 0.1; f += 0.1; f -= 0.1; i += 1
>>   end
>> } }
>>
>> So we have a 2M fixnum loop with ten float adds and ten float
>> subtracts. Other variations of this have more iterations and fewer
>> float operations or put the whole loop inside a times{} block. This
>> version runs in about 0.34s on hotspot-comp + Christian's patches,
>> which beats Java 7 at 0.39s. If I remove some rarely-followed boolean
>> logic in the creation of all Ruby objects (including floats) I can get
>> this down to 0.29s. This is many times faster than almost all the
>> current Ruby implementations.
>>
>> However, this new Ruby impl runs the same code in around 0.1s, so even
>> with everything inlining, JRuby + indy + hotspot-comp + patches is
>> still roughly 3x slower. I suspect Float allocation is the main
>> bottleneck here.
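To make the "an object for every number" cost concrete, here is a minimal Java sketch of the same loop shape, cut down to one add/subtract pair per iteration. The `Box` class and the method names are hypothetical stand-ins for illustration, not JRuby's RubyFloat. It makes no timing claim; it only shows that the boxed variant allocates a new object per operation while the primitive variant allocates nothing.

```java
final class BoxedLoopSketch {
    // Hypothetical immutable box standing in for a heap-allocated Ruby Float.
    static final class Box {
        final double value;
        Box(double value) { this.value = value; }
        Box plus(double d)  { return new Box(value + d); }  // one allocation per add
        Box minus(double d) { return new Box(value - d); }  // one allocation per subtract
    }

    // Every arithmetic step produces a fresh Box, as in the boxed-number model.
    static double boxed(int iterations) {
        Box f = new Box(20.5);
        for (int i = 0; i < iterations; i++) {
            f = f.plus(0.1);
            f = f.minus(0.1);
        }
        return f.value;
    }

    // Same arithmetic on a primitive double: no allocation at all.
    static double primitive(int iterations) {
        double f = 20.5;
        for (int i = 0; i < iterations; i++) {
            f += 0.1;
            f -= 0.1;
        }
        return f;
    }

    public static void main(String[] args) {
        // Both variants perform the same floating-point operations in the same order.
        System.out.println(boxed(2_000_000) == primitive(2_000_000));
    }
}
```

Whether the JIT can remove the `Box` allocations in a loop like this depends on escape analysis succeeding.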
>>
>> Here's logc output for one of the adds:
>>
>>     @ 251 java.lang.invoke.LambdaForm$MH::linkToCallSite (18 bytes)
>>       @ 1 java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)
>>         @ 4 java.lang.invoke.MutableCallSite::getTarget (5 bytes)
>>       @ 14 java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
>>       @ 14 java.lang.invoke.LambdaForm$BMH::reinvoke (32 bytes)
>>         @ 13 java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)
>>         @ 28 java.lang.invoke.MethodHandle::invokeBasic (0 bytes)
>>         @ 28 java.lang.invoke.LambdaForm$DMH::invokeStatic_LLLD_L (20 bytes)
>>           @ 1 java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)
>>           @ 16 java.lang.invoke.MethodHandle::linkToStatic (0 bytes)
>>           @ 16 org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)
>>             @ 6 org.jruby.RubyFloat::op_plus (14 bytes)
>>               @ 1 org.jruby.RubyBasicObject::getRuntime (8 bytes)
>>                 @ 1 org.jruby.RubyBasicObject::getMetaClass (5 bytes)
>>                 @ 4 org.jruby.RubyClass::getClassRuntime (5 bytes)
>>               @ 10 org.jruby.RubyFloat::newFloat (10 bytes)
>>                 @ 6 org.jruby.RubyFloat::<init> (15 bytes)
>>                   @ 3 org.jruby.Ruby::getFloat (5 bytes)
>>                   @ 6 org.jruby.RubyNumeric::<init> (7 bytes)
>>                     @ 3 org.jruby.RubyObject::<init> (7 bytes)
>>                       @ 3 org.jruby.RubyBasicObject::<init> (30 bytes)
>>                         @ 1 java.lang.Object::<init> (1 bytes)
>>
>> This is *great*. We're getting all paths inlined, and allocation
>> inlines all the way up to Object::<init>, so in theory escape analysis
>> could get rid of this...RIGHT? WRONG!!!
>>
>> logc appears to be missing some output (either the tool or the
>> LogCompilation flag is dropping information). The same block of code
>> from PrintInlining:
>>
>>     @ 207   java.lang.invoke.LambdaForm$MH/1942422426::linkToCallSite (18 bytes)   inline (hot)
>>       @ 1   java.lang.invoke.Invokers::getCallSiteTarget (8 bytes)   inline (hot)
>>         @ 4   java.lang.invoke.MutableCallSite::getTarget (5 bytes)   inline (hot)
>>       @ 14   java.lang.invoke.LambdaForm$MH/1896635336::guard (80 bytes)   inline (hot)
>>         @ 12   java.lang.Class::cast (27 bytes)   inline (hot)
>>           @ 6   java.lang.Class::isInstance (0 bytes)   (intrinsic)
>>         @ 17   java.lang.invoke.LambdaForm$BMH/1650319731::reinvoke (30 bytes)   inline (hot)
>>           @ 13   java.lang.invoke.BoundMethodHandle$Species_LL::reinvokerTarget (8 bytes)   inline (hot)
>>           @ 26   java.lang.invoke.LambdaForm$DMH/842171382::invokeStatic_LL_I (15 bytes)   inline (hot)
>>             @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
>>             @ 11   org.jruby.runtime.invokedynamic.MathLinker::floatTest (20 bytes)   inline (hot)
>>               @ 8   org.jruby.Ruby::isFloatReopened (5 bytes)   inline (hot)
>>         @ 50   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial_LLLL_L (20 bytes)   inline (hot)
>>           @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
>>           @ 16   java.lang.invoke.LambdaForm$BMH/1698703785::reinvoke (32 bytes)   inline (hot)
>>             @ 13   java.lang.invoke.BoundMethodHandle$Species_LD::reinvokerTarget (8 bytes)   inline (hot)
>>             @ 28   java.lang.invoke.LambdaForm$DMH/590335041::invokeStatic_LLLD_L (20 bytes)   inline (hot)
>>               @ 1   java.lang.invoke.DirectMethodHandle::internalMemberName (8 bytes)   inline (hot)
>>               @ 16   org.jruby.runtime.invokedynamic.MathLinker::float_op_plus (10 bytes)   inline (hot)
>>                 @ 6   org.jruby.RubyFloat::op_plus (14 bytes)   inline (hot)
>>                   @ 1   org.jruby.RubyBasicObject::getRuntime (8 bytes)   inline (hot)
>>                     @ 1   org.jruby.RubyBasicObject::getMetaClass (5 bytes)   inline (hot)
>>                     @ 4   org.jruby.RubyClass::getClassRuntime (5 bytes)   inline (hot)
>>                   @ 10   org.jruby.RubyFloat::newFloat (10 bytes)   inline (hot)
>>                     @ 6   org.jruby.RubyFloat::<init> (15 bytes)   inline (hot)
>>                       @ 3   org.jruby.Ruby::getFloat (5 bytes)   inline (hot)
>>                       @ 6   org.jruby.RubyNumeric::<init> (7 bytes)   inline (hot)
>>                         @ 3   org.jruby.RubyObject::<init> (7 bytes)   inline (hot)
>>                           @ 3   org.jruby.RubyBasicObject::<init> (30 bytes)   inline (hot)
>>                             @ 1   java.lang.Object::<init> (1 bytes)   inline (hot)
>>         @ 76   java.lang.invoke.LambdaForm$DMH/952682386::invokeSpecial_LLLL_L (20 bytes)   call site not reached
>>
>> So *almost* everything is inlining, but one path (I believe it's the
>> failure path from GWT after talking with Christian) is not reached.
>> Because Hotspot's EA can't do partial EA, any unfollowed paths that
>> would receive the allocated object have to be considered escapes, and
>> so anywhere we're doing guarded logic (either in indy or in Java code,
>> like Fixnum overflow checks) the unfollowed paths prevent EA from
>> happening. Boo-hoo.
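For readers who have not built one of these guarded sites by hand, here is a rough Java sketch of the shape being described: a test, a fast path that allocates a boxed result, and a fallback that is never taken in the hot loop but still accepts the receiver object. The class `GuardedFloatAdd`, the `BoxedFloat` type, and all method names are hypothetical; only `MethodHandles.guardWithTest` and the lookup calls are real JDK API. It is not JRuby's MathLinker, and it makes no claim about what HotSpot does with this exact code.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

final class GuardedFloatAdd {
    // Hypothetical stand-in for a boxed Ruby float; not JRuby's RubyFloat.
    static final class BoxedFloat {
        final double value;
        BoxedFloat(double value) { this.value = value; }
    }

    // Guard test: does the receiver have the type the fast path expects?
    static boolean isBoxedFloat(Object receiver, double operand) {
        return receiver instanceof BoxedFloat;
    }

    // Fast path: every result is a freshly allocated box.
    static Object fastAdd(Object receiver, double operand) {
        return new BoxedFloat(((BoxedFloat) receiver).value + operand);
    }

    // Fallback: never reached in the loop below, but it still takes the receiver object.
    static Object slowAdd(Object receiver, double operand) {
        throw new UnsupportedOperationException("generic dispatch not sketched");
    }

    // Combines the three handles into: test(args) ? fastAdd(args) : slowAdd(args).
    static MethodHandle buildSite() {
        try {
            MethodHandles.Lookup lookup = MethodHandles.lookup();
            MethodType opType = MethodType.methodType(Object.class, Object.class, double.class);
            MethodHandle test = lookup.findStatic(GuardedFloatAdd.class, "isBoxedFloat",
                    opType.changeReturnType(boolean.class));
            MethodHandle fast = lookup.findStatic(GuardedFloatAdd.class, "fastAdd", opType);
            MethodHandle slow = lookup.findStatic(GuardedFloatAdd.class, "slowAdd", opType);
            return MethodHandles.guardWithTest(test, fast, slow);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs the add/subtract pair through the guarded handle; each result feeds the next call.
    static double run(int iterations) {
        MethodHandle add = buildSite();
        Object f = new BoxedFloat(20.5);
        try {
            for (int i = 0; i < iterations; i++) {
                f = add.invoke(f, 0.1);
                f = add.invoke(f, -0.1);
            }
        } catch (Throwable t) {
            throw new IllegalStateException(t);
        }
        return ((BoxedFloat) f).value;
    }

    public static void main(String[] args) {
        System.out.println(run(2_000_000));
    }
}
```

Each boxed result is passed straight into the next guarded call, so the `slowAdd` branch is a path that could receive the object even though it never runs here. That is the kind of dangling edge the quoted paragraph is talking about.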
>>
>> At this point there's nothing I can really do. I have to guard the
>> call sites in case we don't see a Float at some point, and for Fixnum
>> overflow I have to do that boolean check in most cases. There are always
>> going to be unfollowed paths dangling off the edges of even our
>> simplest logic.
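The Fixnum case has the same shape. A minimal sketch, assuming a 64-bit fixnum that promotes to a big integer on overflow: the names `FixnumAddSketch` and `BoxedFixnum` are hypothetical, and JRuby's real overflow handling may look quite different. The point is only that the common path is one add plus a boolean check, with a rarely taken branch hanging off it.

```java
import java.math.BigInteger;

final class FixnumAddSketch {
    // Hypothetical boxed 64-bit integer standing in for a Ruby Fixnum.
    static final class BoxedFixnum {
        final long value;
        BoxedFixnum(long value) { this.value = value; }
    }

    // Adds two fixnum values, promoting to BigInteger if the long addition overflows.
    static Object add(long a, long b) {
        long sum = a + b;
        // Overflow happened iff a and b share a sign and the sum's sign differs from it.
        if (((a ^ sum) & (b ^ sum)) < 0) {
            // Rare path: almost never followed in a typical counting loop.
            return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
        }
        // Common path: allocate the boxed result.
        return new BoxedFixnum(sum);
    }

    public static void main(String[] args) {
        System.out.println(((BoxedFixnum) add(1, 2)).value);
        System.out.println(add(Long.MAX_VALUE, 1));
    }
}
```

In a loop like the benchmark's `i += 1`, the overflow branch never runs, which makes it another unfollowed path of the sort described above.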
>>
>> Bottom line is that the new indy stuff is starting to really look good
>> wrt inlining, but EA is still not up to the task of eliding
>> allocations in the places we need it to.
>>
>> Thoughts?
>>
>> - Charlie
>> _______________________________________________
>> mlvm-dev mailing list
>> mlvm-dev@openjdk.java.net
>> http://mail.openjdk.java.net/mailman/listinfo/mlvm-dev