Okay, last mail for the day :-)

> However -- as I have pointed out to Charlie a number of times -- in
> practice, classes are basically frozen after *some* time. In Rails,
> pretty much all classes reach their final stage at the end of the
> bootup phase. However, since JRuby only sees a parse phase and then a
> generic "runtime", it's not possible for it to determine when that has
> happened.
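As a side note, a hypothetical sketch (not from this thread): at the Ruby level, a runtime can at least observe class-modification events via the `method_added` callback, which is the raw signal any boot-phase heuristic would consume. The tracker module and globals below are invented names for illustration.

```ruby
# Hypothetical sketch: record a timestamp for every method definition,
# using the Module#method_added callback, which Ruby invokes on each def.
$mod_timestamps = []

module ModTracker
  def method_added(name)
    $mod_timestamps << Time.now.to_f
    super
  end
end

# Prepend into Module so the hook fires for every class/module.
Module.prepend(ModTracker)

class Example   # simulates app code being loaded during boot
  def a; end
  def b; end
end

puts $mod_timestamps.size   # => 2
```

A heuristic could then watch the rate of these timestamps and declare boot over when it flattens out.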
This may not be as hard. If the assumption is that the compiler is going to optimize for the case where all class mods. happen in some startup phase, then there are two approaches:

1. During the initial interpretation phase, track the profile of the number of class mods, implicitly plotting a curve of mods over time. When the slope of this curve starts flattening out, assume that you are getting out of the boot phase.

2. Use a variant of an exponential/random backoff technique: i.e. each time a class mod. is encountered, back off compilation for another X amount of time, where X is varied after each class mod. You could also require hitting at least N clean backoff phases (those that don't see any class mods) before starting compilation.

Note that these techniques will not work as well for cases where this code modification profile isn't met. Alternatively, you could develop different compilation strategies for different code modification profiles -- one for Rails, one for something else, etc. -- and use a command-line option to select the appropriate compilation strategy.

>> 5. Dynamic code gen: This is the various forms of eval. This means that
>> eval calls are hard boundaries for optimization since they can modify the
>> execution context of the currently executing code. There is no clear way I
>> can think of at this time of getting around the performance penalties
>> associated with it. But, I can imagine special case optimizations including
>> analyzing the target string, where it is known, and where the binding
>> context is local.
>
> This is extremely common, but mainly using the class_eval and instance_eval
> forms. These forms are EXACTLY equivalent to simply parsing and executing
> the code in the class or instance context.
> For instance:
>
>   class Yehuda
>   end
>
>   Yehuda.class_eval <<-RUBY
>     def omg
>       "OMG"
>     end
>   RUBY
>
> is exactly equivalent to:
>
>   class Yehuda
>     def omg
>       "OMG"
>     end
>   end
>
> As a result, I don't see why there are any special performance implications
> associated. There is the one-time cost of calculating the String, but then
> it should be identical to evaluating the code when requiring a file.

There is also regular eval, as in eval("a = x + y"). The reason this introduces a performance penalty is that before the eval you have to dump all live variables to memory, and after the eval you have to restore all live variables from memory. So, consider this:

  a = 5
  b = 3
  c = 10
  eval("x = a+b+c")
  y = a + b
  z = x + c

Normally, if you had x = a+b+c instead of the eval form, you would have constant-propagated and eliminated most of the instructions. But now, not only can you not do that, you actually have to store a, b, and c to memory before the eval, and then load them back, along with x, from memory after the eval.

But Tom has a good argument that eval is probably not used that often, at least not in loops. In addition, the parse cost of the eval may be substantially higher, so methods that use eval may not benefit much from optimizing the surrounding code anyway. Throwing up our hands and doing the simple thing as above (allocate a frame, load/store live vars to memory) might be good enough.

>> Now, consider this code snippet:
>> ------
>> def foo(n,x)
>>   proc do
>>     n+1
>>   end
>> end
>>
>> def bar(i)
>>   proc do
>>     t = foo(i, "hello")
>>     send("eval", "puts x, n", t)
>>   end
>> end
>>
>> delayed_eval_procs = (1..10).collect { |i| bar(i) }
>> ... go round the world, do things, and come back ...
>> delayed_eval_procs.each { |p| p.call }
>> ------
>>
>> This is a contrived example, but basically this means you have to keep
>> around frames for long times till they are GCed. In this case,
>> delayed_eval_procs keeps around a live ref to the 20 frames created by foo
>> and bar.
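For what it's worth, the frame-capture point reduces to a small runnable example (hypothetical, written for modern Ruby; the eval part is dropped to keep it self-contained): each call to foo gets its own frame, and the returned proc keeps that frame alive.

```ruby
# Each call to foo allocates a fresh frame holding n and x; the proc
# returned from it closes over that frame, so the frame stays live
# until the proc itself is garbage-collected.
def foo(n, x)
  proc { n + 1 }
end

delayed_procs = (1..10).map { |i| foo(i, "hello") }
# ... go round the world, do things, and come back ...
p delayed_procs.map(&:call)   # => [2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
```

Ten calls, ten distinct frames: each proc sees a different n, so none of the frames can be stack-allocated and popped on return.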
> However, the only case where you care about the backref information in
> frames (for instance) means that you only care about the LAST backref that
> is generated, which means that you only need one slot. Are you thinking
> otherwise? If so, why?

I should look up what a backref is; we may be talking about two different things. In this example, every execution of foo has to allocate a heap frame to store variables n and x, and every execution of bar has to create a heap frame to store variables i and t. And there will be 10 instances of each frame.

It is good to hear from everyone about common code patterns, what the common scenarios are, and what needs to be targeted. But the problem often is that ensuring correctness for the 1% (or even 0.1%) uncommon case might effectively block good performance for the 99% common case. The specifics will depend on the specific case being considered.

Fixnums are probably a good example. In the absence of on-stack replacement (OSR) or external guarantees that fixnum methods are not modified, you are forced either to not optimize fixnums down to regular ints, or to introduce guards before most calls to check for class mods.

We could compile optimistically, assuming that fixnum.+ is never modified, with the strategy that if fixnum.+ is indeed modified, we back off and deoptimize. But this requires the ability to do an on-stack replacement of currently executing code, and since you don't control the JVM, you won't be able to do an OSR. Barring other tricks, this effectively kills optimistic compilation. I am still hoping some trick can be found that enables optimistic compilation without requiring external (programmer) guarantees, but nothing has turned up so far.

On the other hand, you could introduce guards after all method calls to check that fixnum.+ is not modified.
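Here is a Ruby-level model of that guard (hypothetical: a real compiler would emit this check natively rather than call a Ruby method, and modern Ruby has folded Fixnum into Integer, so the sketch tests Integer#+; all names here are invented):

```ruby
# Guard: has Integer#+ been overridden, or does it still come from
# the core class? A compiled fast path would branch on this.
def integer_plus_unmodified?
  Integer.instance_method(:+).owner == Integer
end

def guarded_add(a, b)
  if integer_plus_unmodified?
    a + b           # fast path: could be a raw machine add
  else
    a.send(:+, b)   # slow path: full dynamic dispatch
  end
end

p guarded_add(2, 3)           # => 5
p integer_plus_unmodified?    # => true

# Someone somewhere decides overriding fixnum.+ is a good thing:
module NoisyPlus
  def +(other)
    super
  end
end
Integer.prepend(NoisyPlus)

p integer_plus_unmodified?    # => false (guard trips; deoptimize)
```

The guard itself is the overhead being discussed: it has to run on (potentially) every arithmetic operation, just in case.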
This is definitely an option, but it is a lot of overhead (for numeric computations, relative to most other languages) simply because of the possibility that someone somewhere has decided that overriding fixnum.+ is a good thing! So, this is one example where correctness requirements for the uncommon case get in the way of higher performance for the common case. eval is another example where the 1% case gets in the way, though Tom is right that parsing overhead is probably the higher cost there anyway.

So, we should investigate the different 99%-1% scenarios in greater detail to see what it takes to not let the 1% uncommon case hurt performance for the 99% common case.

Note that I am using relatively loose language w.r.t. 'performance' -- many uses of this word beg the question 'relative to what?'. At some point, this language also needs to be tightened up.

Subbu.