On 11/09/2010 17:55, Jim Baker wrote:
On Sat, Sep 11, 2010 at 6:52 AM, Rémi Forax <[email protected]> wrote:

    On 11/09/2010 04:11, Charles Oliver Nutter wrote:

        On Fri, Sep 10, 2010 at 4:53 PM, Rémi Forax <[email protected]> wrote:

            Like Matt says, I do variable-uses analysis.

        Our new compiler will be able to do that. It's too hard to do
        against
        the AST (or hard enough that I don't want to try).


    It's not hard!
    Create a fresh scope; when you find a variable which is not in the
    new scope, look it up in the old scope and register that this part
    of the script needs that variable.
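
    Roughly, in Java (just a sketch, not Jython's actual compiler; the
    class and method names are made up):

        import java.util.HashSet;
        import java.util.Set;

        // One Scope per block of the script; an AST walker calls define()
        // for assignments and use() for reads.
        class Scope {
            final Scope parent;
            final Set<String> defined = new HashSet<String>();
            final Set<String> neededFromOuter = new HashSet<String>();

            Scope(Scope parent) { this.parent = parent; }

            void define(String name) { defined.add(name); }

            void use(String name) {
                if (defined.contains(name)) return;        // local to this block
                if (parent != null && parent.isVisible(name)) {
                    neededFromOuter.add(name);             // block needs this outer variable
                }
                // otherwise: unbound here, treat as global or report an error
            }

            boolean isVisible(String name) {
                return defined.contains(name)
                    || (parent != null && parent.isVisible(name));
            }
        }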


        Of course banking on "the new compiler" is always a gamble :)


If we use a new compiler in Jython, it will only be for hot code objects, or those for which we have gradual typing information. I'm pretty certain the current code layout is just fine for everything else. So this strategy mitigates any gamble. But see below...


            You can push state to the heap only between two blocks of N lines.
            It's a kind of register spilling.

        I don't understand. Can you elaborate?


    Because you know which variables you need, each block of code uses
    only local variables. Between two blocks of code, you take the values
    of the local variables you will need later and store them in
    heap-allocated objects; then you read the values back out of the
    heap-allocated objects at the start of the next block of code.
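
    Very roughly (hand-written Java standing in for the generated
    bytecode; the HeapFrame class just illustrates something like PyFrame):

        // Inside a block, only JVM locals are used; values are spilled to
        // the heap frame at the block boundary and reloaded in the next block.
        class HeapFrame {
            Object i;
            Object a;
        }

        class Blocks {
            static void block1(HeapFrame frame) {
                int i = 0;                    // JVM locals only within the block
                int a = 0;
                while (i < 1000) {
                    a = a + i;
                    i = i + 1;
                }
                frame.i = i;                  // spill only what later blocks need
                frame.a = a;
            }

            static void block2(HeapFrame frame) {
                int a = (Integer) frame.a;    // reload from the heap frame
                System.out.println(a);
            }
        }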


Now that's an interesting thought. Ever since the JVM language summit and Cliff Click's analysis of some Jython code, it's been quite obvious that storing every Python local name into the heap-allocated PyFrame associated with that function call is just a performance drain. Among other things, it defeats escape analysis. So it's good to think about this again.

Maybe if we used some sort of barrier, where we only do the stores into the PyFrame either before a downcall or before coming out of an exception, we could preserve Python frame semantics without always paying their cost. This would be especially true if we guarded downcalls so that those not calling back into Python don't actually invoke the barrier. So in particular, code with tight loops, similar to this contrived example,

for i in xrange(1000):
    for j in xrange(1000):
        a = i * j
        # ...

could get a lot cheaper, and ideally the JVM would elide away the allocations of PyInteger (wrapping int) - xrange certainly has no need to access the frame.

Here, you should do a typechecking analysis to prove that i and j can
be encoded as native Java ints and generate two nested for loops.

See page 8 of my presentation at the last JVM Summit
(http://wiki.jvmlangsummit.com/images/2/2a/Jvmsummit-phpreboot.pdf).
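
Roughly the specialized code the compiler could emit, written as plain
Java for illustration (assuming the typechecking analysis proves i and j
are always small ints):

    // No PyInteger boxing and no PyFrame access inside the loops.
    for (int i = 0; i < 1000; i++) {
        for (int j = 0; j < 1000; j++) {
            int a = i * j;
            // ...
        }
    }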




We do something conceptually similar in our support of ContextGuard to minimize callpath costs in the with-statement. In particular, this allowed us to significantly reduce locking overhead; see Tobias Ivarsson's blog post on this http://journal.thobe.org/2009/07/improving-performance-in-jython.html

Also, we actually do barriers in our support for the Python yield statement in generators/coroutines, where we store the Java local variables (not visible to Python itself, these are the unnamed variables such as the iterator returned from the above xrange function) in the PyFrame so they can be reconstituted upon the next iteration of the generator.
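
Conceptually it is something like this hand-written sketch (not the
actual generated code; the class and field names are invented):

    class GenState {
        int resumePoint;
        Object savedIterator;      // e.g. the iterator returned by xrange()
    }

    class Gen {
        // Before "yielding", stash the JVM-level temporaries in the state
        // object; on the next call, pull them back out and resume.
        static Object step(GenState state, java.util.List<Object> source) {
            java.util.Iterator<Object> it;
            if (state.resumePoint == 0) {
                it = source.iterator();                                 // first entry
            } else if (state.resumePoint == 1) {
                it = (java.util.Iterator<Object>) state.savedIterator;  // reconstitute the temp
            } else {
                return null;                                            // exhausted
            }
            if (it.hasNext()) {
                state.savedIterator = it;   // barrier: store the Java local
                state.resumePoint = 1;
                return it.next();           // "yield" this value
            }
            state.resumePoint = 2;
            return null;
        }
    }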

So this approach is something we should definitely look at - it would require minimal hacking on the current compiler, plus the guard support.

I'm not a big fan of guards without invokedynamic.
You usually end up generating the fast path *and* the slow path
and hitting the maximum method size.

Of course, it's OK if you are able to bail out into an interpreter for
slow/uncommon cases.
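
For example, a guarded fast path that bails out to the interpreter
instead of also compiling the slow path might look like this (just a
sketch; the Interpreter.run helper is hypothetical):

    // Only the fast path lives in the compiled method; the uncommon case
    // bails out to the interpreter (Interpreter.run is a hypothetical
    // helper) instead of an inlined slow path.
    static Object add(Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {   // guard
            return ((Integer) left).intValue() + ((Integer) right).intValue();
        }
        return Interpreter.run("add", left, right);   // bailout (hypothetical)
    }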



    As you already know, I don't think that an IR-based compiler
    is a good idea. javac is not an IR-based compiler and
    Java is fast.


I think we have been convincing ourselves that we need an IR-based compiler. Tobias worked on one using SSA, but it proved to be a fair amount of work, and it has not been completed. Jython's current compiler is very simple/naive - AST directed, with the modest amount of scope analysis required for Python's lexical scoping (approx 360 lines).

And what about adding a typechecking analysis?
After all, the bytecode is statically typed.


It has been pretty easy to hack on this compiler, whether a new language feature or the occasional bug fix. This is because of the strong degree of locality and easy correspondence from broken unit test to the resulting code path.

Remi, you may be right - maybe we just need to continue to generate naive bytecode and let the JVM sort it out. Just somewhat better naive bytecode, that's all.

It depends on what 'naive' means. Java is fast because the VM optimizes
common bytecode patterns.
So if you generate the same code as javac, I mean literally the same,
then you will have performance :)

That's why I think that creating a compiler that uses an SSA form
is a bad idea. SSA form is used in the backend of traditional compilers
(I mean since 2000) to generate efficient assembly.
If you are on top of a JVM, the JIT already does SSA-based optimizations.

Your runtime environment should have:
 - a type profiler that records the types of variables
   (just separating objects from primitives can be sufficient;
   see the sketch after this list);
 - a taken/untaken branch recorder,
   to avoid type pollution from untaken branches;
 - a coarse-grained live variable analysis,
   to get use-define info for every variable;
 - a data-flow analyzer that propagates exceptional events
   (a function does reflection on stack frames, etc.);
 - a fast-forward typechecking analysis,
   so you can prove the type assumptions derived from the type profiler
   (this avoids generating guards where they are not necessary).
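
Here is a minimal sketch of what such a type profiler could look like
(an assumed design, not an existing Jython component):

    import java.util.HashMap;
    import java.util.Map;

    // Per-variable type profile: distinguishing "always a primitive int"
    // from "object" is often enough to pick a specialization.
    class TypeProfiler {
        enum Observed { INT, OBJECT, MIXED }

        private final Map<String, Observed> profile = new HashMap<String, Observed>();

        void recordInt(String var)    { merge(var, Observed.INT); }
        void recordObject(String var) { merge(var, Observed.OBJECT); }

        private void merge(String var, Observed seen) {
            Observed prev = profile.get(var);
            if (prev == null) {
                profile.put(var, seen);
            } else if (prev != seen) {
                profile.put(var, Observed.MIXED);    // type pollution
            }
        }

        // The compiler asks: can this variable be specialized to a JVM int?
        boolean alwaysInt(String var) {
            return profile.get(var) == Observed.INT;
        }
    }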

Your code generator should:
 - use the previously calculated types to generate specialized code for a function;
 - generate code preamble and postamble instructions
   to go back and forth between the interpreter and the generated code;
 - generate inline caches, or better, share the inline caches with the
   interpreter :) (a rough sketch follows after this list);
 - generate code to go back into the interpreter if an unusual thing happens
   (an untaken branch is taken, etc.);
 - intrinsify some common constructs
   (a call to math.sin() should be a direct call, etc.);
 - inline blocks of code like closures, remove yield, etc.
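
A monomorphic inline cache without invokedynamic could be sketched like
this (illustrative only; the "__add__" method name is made up, and a
real cache would be generated per call site):

    // Monomorphic inline cache for one dynamic call site, no invokedynamic.
    class CallSiteCache {
        private Class<?> cachedReceiverClass;
        private java.lang.reflect.Method cachedTarget;

        Object call(Object receiver, Object arg) throws Exception {
            Class<?> receiverClass = receiver.getClass();
            if (receiverClass != cachedReceiverClass) {          // cache miss: relink
                cachedTarget = receiverClass.getMethod("__add__", Object.class);
                cachedReceiverClass = receiverClass;
            }
            return cachedTarget.invoke(receiver, arg);           // cached fast path
        }
    }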

I'm surely forgetting something.
Also, some items can be hard without invokedynamic/method handles
or the corresponding abstractions
(mutable polymorphic call sites and function pointers).


- Jim

Rémi
