Okay last mail for the day :-)

However--as I have pointed out to Charlie a number of times--in practice,
> classes are basically frozen after *some* time. In Rails, pretty much all
> classes reach their final stage at the end of the bootup phase. However,
> since JRuby only sees a parse phase and then a generic "runtime" it's not
> possible for it to determine when that has happened.
>


This may not be so hard.  If the assumption is that the compiler should
optimize for the case where all class mods. happen in some startup phase,
then there are two approaches:

1. During the initial interpretation phase, track the number of class mods,
which implicitly plots a curve of # mods across time.  At the point when the
slope of this curve starts flattening out, you assume that you are getting
out of the boot phase.

2. Use a variant of the exponential/random backoff technique: i.e. each time
a class mod. is encountered, you back off compilation for another X amount
of time, where X is varied after each class mod.  You could also require
hitting at least N clean backoff phases (those that don't see any class
mods) before starting compilation.

Note that these techniques will not work as well for code whose modification
profile doesn't fit this assumption.  Alternatively, you could develop
different compilation strategies for different code-modification profiles
... one for Rails, one for something else, etc. ... and use a command-line
option to select the appropriate compilation strategy.
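
A minimal sketch of heuristic (2) might look like the following.  All the
names here (CompilationGate, on_class_modification, etc.) are made up for
illustration, and the clock is injected only so the logic is easy to test:

```ruby
# Sketch of an exponential-backoff gate for deciding when the "boot
# phase" is over and compilation can start.  Purely illustrative.
class CompilationGate
  def initialize(clock: -> { Time.now.to_f },
                 initial_backoff: 1.0, required_clean_phases: 3)
    @clock = clock
    @backoff = initial_backoff
    @clean_phases = 0
    @required = required_clean_phases
    @deadline = @clock.call + @backoff
  end

  # The runtime calls this whenever a class is modified.
  def on_class_modification
    @clean_phases = 0                 # a mod spoils the current streak
    @backoff *= 2                     # back off for longer each time
    @deadline = @clock.call + @backoff
  end

  # Polled periodically; true once N backoff windows pass cleanly.
  def compile?
    if @clock.call >= @deadline
      @clean_phases += 1              # one clean window elapsed
      @deadline = @clock.call + @backoff
    end
    @clean_phases >= @required
  end
end
```

Each class mod both resets the clean-phase count and doubles the window, so
a bursty boot phase keeps compilation off, while a quiet steady state turns
it on after N clean windows.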


>
>>
>> 5. Dynamic code gen: This is the various forms of eval.  This means that
>> eval calls are hard boundaries for optimization since they can modify the
>> execution context of the currently executing code.  There is no clear way I
>> can think of at this time of getting around the performance penalties
>> associated with it.  But, I can imagine special case optimizations including
>> analyzing the target string, where it is known, and where the binding
>> context is local.
>
>
> This is extremely common, but mainly using the class_eval and instance_eval
> forms. These forms are EXACTLY equivalent to simply parsing and executing
> the code in the class or instance context. For instance:
>
> class Yehuda
> end
>
> Yehuda.class_eval <<-RUBY
>   def omg
>     "OMG"
>   end
> RUBY
>
> is exactly equivalent to:
>
> class Yehuda
>   def omg
>     "OMG"
>   end
> end
>
> As a result, I don't see why there are any special performance implications
> associated. There is the one-time cost of calculating the String, but then
> it should be identical to evaluating the code when requiring a file.
>


There is also regular eval, as in eval("a=x+y").  The reason this introduces
a performance penalty is that before the eval you have to dump all live
variables to memory, and after the eval you have to restore all live
variables from memory.  So, consider this:

a = 5
b = 3
c = 10
eval("x = a+b+c")
y = a + b
z = x + c

Normally, if you had x = a+b+c instead of the eval form, you would have
constant-propagated and eliminated most of the instructions.  But now, not
only can you not do that, you actually have to store a, b, and c to memory
before the eval, and then load them back, along with x, from memory after
the eval.
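
To make the hazard concrete: eval runs against the caller's binding, so it
can both read and write any live local.  A small sketch:

```ruby
a = 5
b = 3
# eval executes against the current binding: it can read *and write*
# any live local variable.  So the compiler must spill `a` and `b`
# to memory before this call and reload `a` after it -- it cannot
# keep them in registers across the eval.
eval("a = a + b")
puts a  # => 8
```

Since the string is opaque to the compiler in general, it has to assume
every live variable may be touched this way.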

But, Tom has a good argument that eval is probably not used that often, at
least not in loops.  In addition, the parse cost of the eval may be
substantially higher, so methods that use eval may not benefit much from
optimizing the surrounding code anyway; throwing up our hands and doing the
simple thing as above (allocate a frame, load/store live vars. to memory)
might be good enough.


>
>> Now, consider this code snippet:
>> ------
>> def foo(n,x)
>>   proc do
>>     n+1
>>   end
>> end
>>
>> def bar(i)
>>   proc do
>>     t = foo(i, "hello")
>>     send("eval", "puts x, n", t)
>>   end
>> end
>>
>> delayed_eval_procs = (1..10).collect { |i| bar(i) }
>> ... go round the world, do things, and come back ...
>> delayed_eval_procs.each { |p| p.call }
>> ------
>>
>> This is a contrived example, but basically this means you have to keep
>> around frames for long times till they are GCed.  In this case
>> delayed_eval_procs keeps around a live ref to the 20 frames created by foo
>> and bar.
>
>
> However, the only case where you care about the backref information in
> frames (for instance), means that you only care about the LAST backref that
> is generated, which means that you only need one slot. Are you thinking
> otherwise? If so, why?
>

I should look up what backref is.  We may be talking about two different
things.  In this example, every execution of foo has to allocate a heap
frame to store the variables n and x, and every execution of bar has to
allocate a heap frame to store the variables i and t.  And, there will be 10
instances of each frame.
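
The frame-per-invocation point can be seen with a much smaller example (the
names here are mine, not from the snippet above):

```ruby
# Each call to make_adder must heap-allocate storage for its own `n`:
# the returned proc outlives the call's stack frame and still reads `n`.
def make_adder(n)
  proc { |x| n + x }
end

adders = (1..10).map { |i| make_adder(i) }
# Ten live procs, therefore ten live frames, each holding its own `n`.
adders[0].call(100)  # => 101
adders[9].call(100)  # => 110
```

As long as the adders array is reachable, none of those ten frames can be
collected.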

It is good to hear from everyone about common code patterns, what the common
scenarios are, and what needs to be targeted.  But the problem often is that
ensuring correctness for the 1% (or even 0.1%) uncommon case can
effectively block good performance for the 99% common case.  The specifics
will depend on the particular case being considered.

Fixnums are probably a good example.  In the absence of OSR or external
guarantees that fixnum methods are not modified, you are forced either to
not optimize fixnums to regular ints, or to introduce guards before most
calls to check for class mods.  We could compile optimistically, assuming
that fixnum.+ is never modified, with the strategy that if fixnum.+ is
indeed modified, we back off and deoptimize.  But this requires the ability
to do an on-stack replacement of currently executing code, and since you
don't control the JVM, you won't be able to do an OSR.  Barring other
tricks, this effectively kills optimistic compilation.  I am still hoping
some trick can be found that enables optimistic compilation without
requiring external (programmer) guarantees, but nothing has turned up so
far.

On the other hand, you could introduce guards after all method calls to
check that fixnum.+ has not been modified.  This is definitely an option,
but it is a lot of overhead (for numeric computations, relative to most
other languages) simply because of the possibility that someone somewhere
has decided that overriding fixnum.+ is a good thing!
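
And that possibility is real, not hypothetical paranoia: Ruby lets any
program reopen Integer (Fixnum in older rubies) and redefine + at runtime.
A sketch (the alias name and the global flag are made up for illustration):

```ruby
# Any Ruby program may redefine integer arithmetic at any time, which
# is why compiled code cannot treat small-integer `+` as a plain
# machine add without a guard (or a deoptimization mechanism).
$redefined_plus_called = false

class Integer
  alias_method :unmodified_plus, :+   # keep the original behind an alias
  def +(other)
    $redefined_plus_called = true     # observe that dispatch reached us
    unmodified_plus(other)            # delegate to the original behavior
  end
end

x = 2 + 3   # still 5, but it now goes through Ruby-level dispatch
```

After the redefinition every integer addition pays full method-dispatch
cost, which is exactly the deoptimized state the guards would fall back to.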

So, this is one example where correctness requirements for the uncommon case
get in the way of higher performance for the common case.  eval is another
example where the 1% case gets in the way, but Tom is right that parsing
overhead is probably the higher cost there anyway.  So, we should
investigate the different 99%-1% scenarios in greater detail, to figure out
what it takes to keep the 1% uncommon case from hurting performance in the
99% common case.

Note that I am using relatively loose language w.r.t. 'performance' -- many
uses of this word beg the question 'relative to what?'.  At some point, this
language also needs to be tightened up.

Subbu.
