On Dec 5, 10:31 am, Rémi Forax <[email protected]> wrote:
> On 12/05/2010 04:58 PM, Subbu Sastry wrote:
>
> > On Dec 4, 9:08 am, Rémi Forax <[email protected]> wrote:
> >> On 12/04/2010 03:08 PM, Guillaume Laforge wrote:
>
> >>> On Sat, Dec 4, 2010 at 13:51, Charles Oliver Nutter
> >>> <[email protected]<mailto:[email protected]>>  wrote:
> >>>      [...]
> >>>      >  This is equivalent to implementing your own EA + inlining
> >>>      >  in your runtime instead of relying on the EA done by HotSpot.
> >>>      Perhaps, except that implementing my own EA + inlining is a lot 
> >>> harder
> >>>      :) My primary goal with JRuby optimization is to just get all the
> >>>      pieces in the right place so that the JVM can do the rest. I don't
> >>>      want to write my own optimizers if I can help it.
> >>> What about writing your own VM? ;-)
> >> Because
> >>     I don't want to write a generational copy-collector GC
> >>       in fact, I don't want to debug it :)
> >>     I don't want to write a register allocator
> >>       + all the optimizations done on SSA nodes
> >>     I don't want to know how to implement memory
> >>       fences on multiple hardware architectures.
> >>     I don't want to do all that stuff alone with no
> >>       community support.
>
> >> That's why I use the JVM.
> >> But this approach has one main drawback.
> >> The input of the VM is the bytecode, which is typed [1],
> >> like any assembler.
>
> >> So any runtime of a dynamic language has only one
> >> thing to do, transform the dynamic language to
> >> a typed bytecode.
> > I am not convinced that this is fully true.   I think for all
> > practical purposes, as far as implementing languages on top of the VM,
> > we can forget the fact that the JVM is actually software and treat it
> > as an implementation target.  Much in the same way that CPU micro-
> > architectures implement a whole host of sophisticated optimizations to
> > speed up the execution of machine code, and that doesn't eliminate the
> > need to implement a whole host of optimizations in the JVM, I don't
> > think you get performance for free simply because you target the JVM.
> > Just as a CPU frees the JVM implementer from having to implement
> > certain kind of optimizations (because of branch prediction, caches,
> > instruction scheduling, etc.), the JVM frees the dynamic language
> > implementer from having to worry about certain kind of opts (GC,
> > allocation, targeting a large variety of CPUs, etc.).  The JVM cannot
> > optimize the semantics of the dynamic language.  That is the job of
> > the VM that is implemented on the JVM.  I agree that the VM has to be
> > judicious about what it does, but that is no different from the JVM
> > having to be judicious about what it does based on the characteristics
> > of the CPU it is targeting.  So, while the JVM is extremely smart, it
> > can optimize only so much (till the JVM gets so smart as to realize the
> > promise of the Futamura projections :) at which time implementers of
> > dynamic languages can satisfy themselves with writing an interpreter).
>
> > So, while it is true that the runtime of a dynamic language has to
> > transform the dynamic language to a typed bytecode, I think it also
> > means that the runtime has to implement any optimizations (inlining,
> > type guards, unboxing, EA, etc.) in the dynamic language VM which the
> > JVM cannot see through.
>
> My first point is that the ability to statically type a dynamic
> language is a silver-bullet optimization. So it's a must-have.

Agreed.  But it may not be possible to do this except via boxed types,
at least in Ruby:

Fixnum -> RubyFixnum
String -> RubyString
Hash -> RubyHash
etc.

So, when statically typed to the JVM, everything maps to the
non-primitive "a*" (reference) bytecodes.  An across-the-board set of
unboxing optimizations would then have to be implemented in the VM to
map directly to Java classes where possible, or to explode the boxed
objects where possible.
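To make the unboxing problem concrete, here is a small Ruby sketch (purely illustrative) of why a naive mapping of Fixnum arithmetic (Integer in current Ruby) to primitive JVM ints would be unsound without guards: arithmetic is an ordinary method call that open classes can redefine.

```ruby
# Illustrative only: why unboxing integers to primitive ints needs guards.
# Arithmetic on Integer is an ordinary method that open classes can
# redefine, so an unboxed "iadd" fast path compiled for a call site can
# become stale at any moment.
class Integer
  alias_method :original_plus, :+
  def +(other)
    100  # deliberately silly redefinition
  end
end

two_plus_three = 2 + 3   # now 100, not 5

class Integer
  alias_method :+, :original_plus  # restore the original behavior
end
```

Any compiled code that had unboxed `2 + 3` into a primitive add would now be wrong, which is exactly why unboxing ends up tied to the invalidation machinery discussed further down.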

> Otherwise, the runtime should not try to implement any optimizations
> but a selective list known to improve the performance.
> For example, implementing loop-unrolling is stupid, the JVM already
> does a great job here.

I agree with you broadly.  But I wouldn't rule out loop unrolling on
principle.  It may be a useful tool in the arsenal at a more advanced
level.  To give you a Ruby example, consider this:

(10..20).inject(0) { |s,i| s+i }

which computes the sum of the numbers 10 through 20.  Let us assume that
the Ruby VM is smart enough (which it will be one of these days) to
inline the inject code and the block code.  After this inlining is
done, what you are left with is a while loop:

i = 0
sum = 0
a = (10..20).to_a
while i < a.length do
  sum += a[i]
  i += 1
end

I have yet to catch up with the JVM and what it can do, but presumably
loop unrolling applies only to simple counted loops, not arbitrary
while loops.  So, assuming the JVM doesn't unroll this loop, the Ruby
VM would have to map it to a counted loop, or unroll the loop itself.
In this Ruby example (and presumably more broadly -- for arrays,
ranges, etc.), you might instead implement a Ruby-specific opt that
unrolls the loop and computes the sum by exploiting Ruby range
semantics, without actually doing a dataflow / CFG-based set of
optimizations (inline method, inline block, unroll loop, propagate
constants), and this might be faster and simpler to do in the Ruby
dynopt VM than in the JVM.
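As a sketch of such a Ruby-level rewrite (illustrative, not any actual JRuby optimization, and `fast_range_sum` is a hypothetical name): once the runtime recognizes that the receiver is an inclusive Integer Range and the block is a plain sum, it can replace the whole loop with the closed-form arithmetic-series formula.

```ruby
# Hypothetical Ruby-level rewrite: replace the inject loop over an
# inclusive Integer Range with the closed-form arithmetic-series sum.
def fast_range_sum(range)
  n = range.last - range.first + 1   # element count of an inclusive range
  (range.first + range.last) * n / 2
end

fast_range_sum(10..20)   # => 165, same as (10..20).inject(0) { |s,i| s+i }
```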

Or if you implement code specialization inside Ruby (Ruby's open
classes make it easier to do in Ruby than in Java, I think, though I
haven't thought this through yet), you might decide to unroll loops to
get the full benefits of specialization.
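A sketch of what such specialization could look like (class and method names here are hypothetical): the runtime uses open classes to swap in a specialized, partially unrolled body for a hot method.

```ruby
# Illustrative sketch: installing a specialized, unrolled method body at
# runtime via open classes (names are hypothetical, not a real API).
class Accumulator
  def sum(ary)
    ary.inject(0) { |s, x| s + x }   # generic version
  end
end

# After profiling shows ary is always an Array of Integers, the runtime
# could reopen the class and replace the body with an unrolled variant:
Accumulator.class_eval do
  def sum(ary)
    s = 0
    i = 0
    while i + 4 <= ary.length        # main loop, unrolled by 4
      s += ary[i] + ary[i + 1] + ary[i + 2] + ary[i + 3]
      i += 4
    end
    while i < ary.length             # leftover elements
      s += ary[i]
      i += 1
    end
    s
  end
end

Accumulator.new.sum((1..10).to_a)    # => 55 with either version
```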

But you are right that loop unrolling wouldn't be the first
optimization you would target, for sure -- it is one of the later
ones.  But once you decide to implement a full dynamic language VM,
you might discover that many of these opts that seem daunting
initially are not so daunting once the optimizing VM infrastructure
is already in place.

> So the $1,000,000 question is: which optimizations should be
> implemented?

I would say inline caches (which most dynamic language VMs presumably
implement), method and block inlining, and unboxing are probably
necessary, at least for Ruby.  Beyond that, I would say it is
language-specific.
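For readers unfamiliar with the idea, a monomorphic inline cache can be sketched in a few lines of Ruby (a toy model, not how a real VM lays it out): each call site remembers the last receiver class and the method it resolved to, and only re-dispatches when the class changes.

```ruby
# Toy model of a monomorphic inline cache: one cached (class, method)
# pair per call site; a full method lookup happens only on a cache miss.
class CallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    if receiver.class != @cached_class           # miss: do a real lookup
      @cached_class  = receiver.class
      @cached_method = receiver.class.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)    # hit: no lookup
  end
end

site = CallSite.new(:length)
site.call("hello")   # => 5 (miss; caches String#length)
site.call("world")   # => 5 (hit; same receiver class)
```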

But I think the most crucial requirement in all this, which you also
identified in your previous email, is a setup that enables code
replacement and invalidation.  Without relatively cheap invalidation
(either in the dynlang VM or via hooks into the JVM), most of the
dynlang VM opts become expensive.

Off the top of my head, without having spent a lot of time thinking
about this, Ruby VM opts could be considered a code-versioning
technique on methods.  Any condition that invalidates an existing opt
effectively bumps up a version number and invalidates all previous
code versions of that method.  This mechanism is needed anyway for
implementing Ruby open classes.  So, all kinds of opt guards and type
guards effectively become a single code-version guard at optimization
sites -- and you can then start thinking about optimizing the
placement of code-version guards (e.g. moving method version guards
outside loops, having coarse class guards on method entry, etc.).  So,
if code versioning can somehow be supported efficiently (or support
for it in the JVM is available -- GCing of code versions, code caches,
bytecode limits, etc.), it enables a bunch of opts in the dynlang
VM.  You can generate new versions of code and throw away old
versions.
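A minimal sketch of that mechanism (all names hypothetical): a per-method version counter that open-class redefinitions bump; optimized code embeds the version it was compiled against and checks that single number on entry instead of many individual type guards.

```ruby
# Hypothetical version-guard bookkeeping: optimized code records the
# version it was compiled against and is discarded once the counter moves.
class MethodVersions
  def initialize
    @versions = Hash.new(0)   # (class, method name) -> version counter
  end

  def current(klass, name)
    @versions[[klass, name]]
  end

  # Called on every open-class redefinition of klass#name.
  def invalidate(klass, name)
    @versions[[klass, name]] += 1
  end

  # The single guard an optimized method body would execute on entry.
  def valid?(klass, name, compiled_version)
    current(klass, name) == compiled_version
  end
end

mv = MethodVersions.new
v = mv.current(String, :length)   # version the optimized code compiles against
mv.valid?(String, :length, v)     # => true: fast path may run
mv.invalidate(String, :length)    # someone reopened String#length
mv.valid?(String, :length, v)     # => false: discard that code version
```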

Subbu.


> > Subbu.
>
> Rémi

-- 
You received this message because you are subscribed to the Google Groups "JVM 
Languages" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/jvm-languages?hl=en.
