Subramanya Sastry wrote:
I may have been wrong. I had a chance to think through this a little
bit more.
Consider this ruby code:
i = 5
v1 = i + 1
some_random_method_call()
v2 = i + 1
In this code snippet, the second '+' might not get optimized because
'some_random_method_call' could monkeypatch Fixnum.+ in the course of
whatever it does. This means that calls form hard walls beyond which
you cannot hoist method and class modification guards (unless, of course,
you can precisely determine the set of methods any given call can modify
-- for example, that 'some_random_method_call' will not modify Fixnum.+).
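(For illustration, here is one hypothetical body for some_random_method_call
that would make the second '+' behave differently; the patched behavior is
invented purely to show the hazard:)

# Hypothetical: somewhere inside some_random_method_call, Fixnum#+ gets
# monkeypatched. After this runs, the second 'i + 1' must dispatch to the
# new definition, so the optimized addition emitted for the first one can
# no longer be reused without a guard.
def some_random_method_call
  Fixnum.class_eval do
    def +(other)
      self - other   # deliberately "wrong" to make the behavioral change visible
    end
  end
end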
This is certainly a challenge for entirely eliminating guards, and I
believe even the fastest Ruby implementations currently are unable to
remove those guards completely.
In this example, if we can at least prove that i, v1, and v2 will always
be Fixnum when Fixnum#+ has not been replaced, we can emit guarded
optimized versions alongside deoptimized versions. Because Fixnum#+
falls on the "rarely or never replaced" end of the spectrum, reducing +
operations to a simple type guard plus "Fixnum has been modified" guard
is probably the best we can do without code replacement. More on code
replacement thoughts later.
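As a rough sketch of the shape of that guarded code, written as plain Ruby
rather than the bytecode we would actually emit (the guard flag and helper
name here are stand-ins, not anything that exists today):

# Conceptual shape only; the real thing would be generated bytecode.
# FIXNUM_PLUS_MODIFIED stands in for the runtime's "Fixnum#+ has been
# replaced" flag, flipped when the class is patched.
FIXNUM_PLUS_MODIFIED = false

def guarded_plus_one(i)
  if !FIXNUM_PLUS_MODIFIED && i.kind_of?(Fixnum)
    i + 1          # fast path: type known, original Fixnum#+, reducible to a raw add
  else
    i.send(:+, 1)  # deoptimized path: full dynamic dispatch
  end
end

v2 = guarded_plus_one(5)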
So, this is somewhat similar to pointer alias analysis in C programs.
In C programs, the ability to pass around pointers and manipulate them
in an unrestricted fashion effectively restricts the kind of
optimizations that a compiler can do. Similarly, in Ruby programs, it
seems to me that the ability for code to arbitrarily modify code
effectively restricts the kind of optimizations that a compiler can do.
Open classes and eval and all of that are to Ruby what pointers are to
C. Potentially powerful for the programmer, but hell for the compiler.
More on this further below after discussion of the multi-threaded scenario.
It is true that we generally must assume code modification events will
have a large impact on core methods. This means we need to always guard,
even if such modifications are exceedingly rare. However, it may be
pessimistic to assume such guards will be unacceptably slow in
comparison to the rest of the system; that may or may not be the
case. Currently, all method calls in normal execution scenarios must
read a volatile field for the call site invalidation guard. Although
it's difficult to measure accurately, the volatile guard should in
theory be a very large expense, and John Rose said as much when I
described our current system to him. But it may be possible to make such
guards non-volatile if we have other mechanisms for triggering memory
synchronization that those non-volatile calls could see.
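To make the per-call cost concrete, the guard is roughly of this shape
(modeled in Ruby; the serial-number table is a stand-in, and in the real
runtime the equivalent of that token read is the volatile field in question):

# Ruby model of today's "passive" call site guard: every call compares a
# cached token against the receiver class's current serial number before
# trusting the cached method. CLASS_SERIALS stands in for the per-class
# modification counter.
CLASS_SERIALS = Hash.new(0)   # bumped whenever a class's methods change

class GuardedCallSite
  def initialize(method_name)
    @name          = method_name
    @cached_method = nil
    @cached_token  = nil
  end

  def call(receiver, *args)
    klass = receiver.class
    token = CLASS_SERIALS[klass]                    # the read we pay for on every call
    unless @cached_method && @cached_token == token
      @cached_method = klass.instance_method(@name) # re-link on a miss
      @cached_token  = token
    end
    @cached_method.bind(receiver).call(*args)
  end
end

Whether that per-call token read really has to be volatile is exactly the
open question here.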
One theory I had was that since we periodically ping another volatile
field for cross-thread events (kill, raise), there may be a way we can
rely on those events to trigger our memory sync. But I am unfamiliar
enough with the Java memory model that I do not know if such a
volatility-reducing mechanism would work or be reliable.
If thread 2 modifies Fixnum.+, you might expect the modification to be
reflected in thread 1 at some point. If you hoist method/class guards
all the way out of a hot loop and convert the i+1 to integer addition,
code modifications from thread 2 won't propagate to thread 1.
But, in this scenario, I would argue that any code that relies on
behavior like this is broken. This is effectively a race condition and
different ruby implementations will yield different behavior. In fact,
code modification by meta-programming is effectively modifying class
meta-data. So, concurrent code modification is not very different from
concurrent writes to program data. I am not sure whether the
meta-programming model treats modifications to open classes as changes to
program-visible meta-data. If it did, the programmer could synchronize on
that meta-data object before patching code. But I expect this is not the
case, because that would force every method call to acquire a lock on the
class meta-data.
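(Hypothetically, if that meta-data were exposed as a lockable object, a
well-behaved patch might look like the following; the lock constant is
invented, and nothing like it exists today:)

require 'thread'

# Invented for illustration: a lock guarding Fixnum's meta-data.
FIXNUM_METADATA_LOCK = Mutex.new

FIXNUM_METADATA_LOCK.synchronize do
  Fixnum.send(:alias_method, :original_plus, :+)
  Fixnum.send(:define_method, :+) do |other|
    original_plus(other)   # the "patched" body; delegates to the original here
  end
end

And, as noted, the reason this probably isn't the model is that readers --
i.e. every call -- would need the same lock to get any guarantee out of it.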
Again, there are some unknowns here, but the likelihood of method table
modifications happening across threads (or at least the likelihood of
those changes producing runtime behavioral effects across threads) is
probably very low. The majority of Ruby libraries and frameworks have
either been designed with single threads in mind--i.e. the expectation
is that a given thread would exit the library before a code modification
would happen, if it ever did--or have been designed with multi-threading
in mind--making code modifications at boot time or with an expectation
that only one thread would ever see them (like modifying a thread-local
object to add new methods or modules). Neither case anticipates vigorous,
or even common, cross-thread code modification events.
Where does this leave us? Consider a Rails app where there is an
UpdateCodeController with a method called load_new_code(C, m) which
basically updates running code with a new version of a method 'm' in
class 'C' (let's say a bug fix). This discussion of code modification
then boils down to asking the question: at what point does the new
method become available? In the absence of code modification
synchronization (on C's metadata object), the ruby implementation can
block on this request while continuing to run the old version of 'm'
until it is convenient to switch over to the new code!
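(A minimal sketch of what load_new_code amounts to, with the Rails plumbing
stripped away and the source of the new method body left abstract:)

# Sketch only: swap in a new implementation of method 'm' on class 'C' at
# runtime. Where 'new_source' comes from (an upload, a repository, etc.)
# is beside the point.
def load_new_code(klass, method_name, new_source)
  # 'method_name' is only used for reporting here; the def inside
  # new_source is what actually replaces the method.
  klass.class_eval(new_source)   # e.g. "def m(x)\n  x * 2  # bug fix\nend"
  puts "replaced #{klass}##{method_name}"
end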
This is obviously a contrived example, but the point of this is: If
there is no mechanism for the programmer to force code modifications to
be visible in a ruby implementation, the ruby implementation has
flexibility over how long it turns a blind eye to code modifications in
a thread other than the currently executing thread. So, what you need
to worry about is figuring out the hard boundaries for method and class
guards assuming a single-threaded program and optimizing for that scenario.
Am I missing something?
No, I think this is right on. And to cement this even further: we are
blazing a very new trail when it comes to parallel-executing Ruby, which
means we may be able to *set* expectations for cross-thread
code-modification behavior. A user moving their library to JRuby will
then (as now) have to pay a bit more attention to how that library
behaves in a (probably) more formal and (definitely) better-specified
threading environment.
Now, back to the single-threaded scenario and the earlier example:
i = 5
v1 = i + 1
some_random_method_call()
v2 = i + 1
As discussed earlier, the problem here is determining the set of methods
that a call can modify, and this is somewhat similar to the C pointer
aliasing problem. That by itself is "not a big deal" because unlike C
pointers, code modification is much rarer. So, in the ruby
implementation, you could optimistically / speculatively optimize a ton
of code assuming NO code modifications and move the burden of
invalidation from call sites (which are everywhere) to the centralized
class meta-data structures, i.e., you install listeners/traps/callbacks
(whatever you want to call them) on the class and when a method is
modified, you invalidate all optimized code that assumes that the method
is "static". This is non-trivial (will require on-stack replacement of
code), but at least conceivable. But, since you are compiling for the
JVM (or LLVM in the case of MacRuby), once you JIT, you effectively cede
control. You don't have a mechanism to invalidate JITted code. So, at
this time, I am not sure if you can really get around this (extremely
pessimistic) constraint. Only ruby implementations that can control the
compilation stack all the way to machine code (or have hooks to
invalidate / modify existing code) might be able to get around this. I
am curious how the MacRuby-LLVM combination is tackling this...
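(Ruby's own Module#method_added hook gives a feel for what such a trap could
look like at the language level; the invalidation side here is only a
placeholder for the real deoptimization work:)

# Rough model of "install a trap on the class": method_added fires whenever
# an instance method is (re)defined, so the runtime can notice modifications
# centrally instead of guarding at every call site.
module InvalidationTrap
  def method_added(name)
    invalidate_optimized_code_for(self, name)
  end

  def invalidate_optimized_code_for(klass, name)
    # placeholder: deoptimize everything compiled assuming klass#name was static
    warn "invalidating code that assumed #{klass}##{name} was static"
  end
end

class Fixnum
  extend InvalidationTrap
end

class Fixnum
  def double; self * 2; end   # trips the trap above
end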
As early as last fall, JRuby's call site invalidation was "active" as
you describe: a list of call sites attached to a method object was
physically flushed when changes likely to render such caching invalid
occurred in that method's "home" hierarchy. This approach was
originally abandoned due to the complexity of ensuring multiple threads
reaching the same call site at the same time would not accidentally miss
an invalidation event and cache bad code forever. This is also, in fact,
an open question about the invokedynamic active invalidation; when I and
others brought it up to John Rose, he recognized that most dynamic calls
would still need to have "passive" invalidation guards even after
linking. The original call site could still be actively invalidated, but
without introducing locks at call sites (sure performance death) there's
no simple way to avoid "just in case" passive guards as well.
There is a way for us to do active code replacement without modifying
on-stack code: lift replaceable segments to virtual calls.
If we assume there's a way to coarsen our memory synchronization without
visibly impacting cross-thread event propagation, then keeping a list of
call site *references* might enable active replacement. This is similar
to how the Microsoft DLR optimizes call paths. The general idea would be
that calls pass through a generated shim, one that initially just does a
slow and safe dynamic call. In response to repeated calls with the same
receiver and target method, we could replace these sites with
newly-generated stubs that include a fast (potentially *static-calling*)
path and a type guard leading to the slow, safe version. As additional
types come in, the entire stub would be replaced, either settling on a
handful of fast paths or eventually going "full dynamic" and reverting
back to the safest, slowest version.
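Modeled in Ruby (the real stubs would be generated JVM code, and the
thresholds and names here are invented), the progression from slow-and-safe
to guarded-fast to full-dynamic might look like:

# Model of the shim idea: the call site starts out fully dynamic, promotes
# itself to a guarded fast path after seeing the same receiver class
# repeatedly, and reverts to "full dynamic" for good if too many classes
# show up. PROMOTE_AFTER / MAX_PROFILES are invented thresholds.
class ShimCallSite
  PROMOTE_AFTER = 10
  MAX_PROFILES  = 4

  def initialize(method_name)
    @name        = method_name
    @counts      = Hash.new(0)
    @fast        = nil      # [expected_class, pre-linked method] once promoted
    @megamorphic = false
  end

  def call(receiver, *args)
    if @fast && receiver.class == @fast[0]        # type guard
      return @fast[1].bind(receiver).call(*args)  # fast, pre-linked path
    end
    dispatch_slow(receiver, *args)
  end

  private

  def dispatch_slow(receiver, *args)
    klass = receiver.class
    unless @megamorphic
      @counts[klass] += 1
      if @counts.size > MAX_PROFILES
        @megamorphic = true                             # settle on the safest, slowest version
        @fast = nil
      elsif @counts[klass] >= PROMOTE_AFTER
        @fast = [klass, klass.instance_method(@name)]   # "regenerate the stub"
      end
    end
    receiver.send(@name, *args)                         # slow, safe dynamic call
  end
end

Something like site = ShimCallSite.new(:to_s) followed by repeated
site.call(42) would end up on the guarded Fixnum path.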
The synchronization effects would be mitigated by always having call
site replacement/stub generation include a time-sensitive token as we do
today, but that token would only need to be checked when updating the
call site...not for every call. Active flushing would succeed in
replacing the call site only if, while building the stub, another thread
had not built a newer one; otherwise the flush would be abandoned in
favor of the newer call site version. The delay in propagating the updated
call site reference would be subject only to Java memory model/memory
synchronization requirements.
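A small Ruby model of that update rule (the token and the lock are
stand-ins; in the real system the token would be tied to the class's
modification serial and the lock to whatever memory-sync coarsening we
settle on):

require 'thread'

# Model of "check the token only when updating the call site": the call
# path just uses whichever stub reference is currently installed, with no
# token read; the update path compares tokens and abandons the flush if a
# newer stub has already been installed.
class ReplaceableCallSite
  attr_reader :token

  def initialize(initial_stub)
    @stub       = initial_stub   # any callable
    @token      = 0
    @update_mux = Mutex.new
  end

  def call(*args)
    @stub.call(*args)            # hot path: no per-call token check
  end

  # 'seen_token' is the token snapshotted before building the new stub.
  def try_replace(new_stub, seen_token)
    @update_mux.synchronize do
      return false unless seen_token == @token   # someone built a newer stub; abandon
      @stub  = new_stub
      @token += 1
      true
    end
  end
end

How promptly other threads see the new stub reference is then exactly the
Java memory model question above, but correctness no longer depends on a
volatile read per call.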
In order to mitigate the problem of a call site going polymorphic and
HotSpot deoptimizing the code that contains it, active invalidations
could optionally also re-compile that containing code. This would be a
rarer case subject to profiling, since updating that code would
necessarily trigger another invalidation event, which would cause other
methods to recompile, and so on. This sounds like on-stack replacement
would be necessary, since ideally we would want to be able to replace
existing bodies of code. But the cost here is not in having invalid code
bodies remain on the stack...the only cost is having poorly optimized
bodies remain on the stack. On-stack replacement would help us make
reoptimization propagate system-wide more quickly, but it is *not
necessary to guarantee correctness*.
There's a lot of "ifs" here though, especially around coarsening memory
synchronization. We may want to get together to whiteboard some of this out.
- Charlie