Subramanya Sastry wrote:
I may have been wrong. I had a chance to think through this a little bit more.

Consider this ruby code:

i = 5
v1 = i + 1
some_random_method_call()
v2 = i + 1

In this code snippet, the second '+' might not get optimized because 'some_random_method_call' could monkeypatch Fixnum#+ in the course of whatever it does. This means that calls form hard walls beyond which you cannot hoist method and class modification guards (unless, of course, you can precisely determine the set of methods any given call can modify -- for example, determine that 'some_random_method_call' will not modify Fixnum#+).

This is certainly a challenge for entirely eliminating guards, and I believe even the fastest Ruby implementations currently are unable to remove those guards completely.

In this example, if we can at least prove that i, v1, and v2 will always be Fixnums when Fixnum#+ has not been replaced, we can emit guarded, optimized versions alongside deoptimized versions. Because Fixnum#+ falls on the "rarely or never replaced" end of the spectrum, reducing + operations to a simple type guard plus a "Fixnum has been modified" guard is probably the best we can do without code replacement. More thoughts on code replacement later.
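
To make that concrete, here's a rough sketch in Java of what such a guarded '+' could compile down to. All the names here (fixnumPlusModified, dynamicCall, and so on) are invented for illustration; this is not our actual code:

public final class GuardedPlusExample {
    // Set to true by the runtime if anyone ever redefines Fixnum#+.
    static volatile boolean fixnumPlusModified = false;

    static Object plusOne(Object i) {
        // Type guard plus "Fixnum#+ has not been replaced" guard.
        if (i instanceof Long && !fixnumPlusModified) {
            return (Long) i + 1L;            // fast path: plain machine addition
        }
        return dynamicCall(i, "+", 1L);      // deoptimized path: full dynamic dispatch
    }

    // Stand-in for a real dynamic call through the method table.
    static Object dynamicCall(Object recv, String name, Object arg) {
        throw new UnsupportedOperationException("slow-path dispatch goes here");
    }
}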

So, this is somewhat similar to pointer alias analysis in C programs. In C programs, the ability to pass around pointers and manipulate them in an unrestricted fashion effectively restricts the kind of optimizations that a compiler can do. Similarly, in Ruby programs, it seems to me that the ability for code to arbitrarily modify code effectively restricts the kind of optimizations that a compiler can do. Open classes and eval and all of that are to Ruby what pointers are to C. Potentially powerful for the programmer, but hell for the compiler. More on this further below after discussion of the multi-threaded scenario.

It is true that we generally must assume code modification events can hit even core methods, which means we need to always guard, even if such modifications are exceedingly rare. However, it may be overly pessimistic to assume such guards will be unacceptably slow in comparison to the rest of the system. Currently, all method calls in normal execution scenarios must read a volatile field for the call site invalidation guard. Although it's difficult to measure accurately, the volatile read should in theory be a very large expense, and John Rose said as much when I described our current system to him. But it may be possible to make such guards non-volatile if we have other mechanisms for triggering memory synchronization whose effects those non-volatile reads could still observe.
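
For illustration, the per-call guard amounts to roughly the following -- class and field names invented, not our real ones:

class GuardedCallSite {
    static class MetaClass {
        // Bumped by the runtime whenever this class's method table changes.
        volatile long generation;
        Object lookup(String name) { return null; }   // method table lookup stand-in
    }

    long cachedGeneration = -1;   // generation we cached against
    Object cachedMethod;

    Object call(MetaClass klass, Object self, Object arg) {
        if (cachedGeneration == klass.generation) {   // volatile read on every call
            return invoke(cachedMethod, self, arg);   // fast, cached path
        }
        cachedMethod = klass.lookup("+");             // slow path: re-link and re-cache
        cachedGeneration = klass.generation;
        return invoke(cachedMethod, self, arg);
    }

    Object invoke(Object method, Object self, Object arg) {
        throw new UnsupportedOperationException("dispatch stand-in");
    }
}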

One theory I had was that since we periodically ping another volatile field for cross-thread events (kill, raise), there may be a way we can rely on those events to trigger our memory sync. But I am unfamiliar enough with the Java memory model that I do not know if such a volatility-reducing mechanism would work or be reliable.
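
Very roughly, the idea would be something like the sketch below (names invented). Whether the Java memory model actually guarantees the visibility I'm hoping for here is exactly the part I don't know:

class PiggybackSyncSketch {
    static long methodTableGeneration = 0;     // plain (non-volatile) field, read at call sites
    static volatile long eventCheckpoint = 0;  // already polled periodically for kill/raise

    // Runs in the thread doing the code modification.
    static void invalidate() {
        methodTableGeneration++;               // plain write...
        eventCheckpoint++;                     // ...published by this volatile write
    }                                          // (increment not atomic; illustration only)

    // Already executed periodically by every thread at safe points.
    static void pollCrossThreadEvents() {
        long seen = eventCheckpoint;           // volatile read: handles kill/raise, and
                                               // (in theory) also makes the earlier plain
                                               // write to methodTableGeneration visible here
    }
}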

If thread 2 modifies Fixnum#+, you might expect the modification to be reflected in thread 1 at some point. If you hoist method/class guards all the way outside the loop and convert the i+1 to integer addition, code modifications from thread 2 won't propagate to thread 1. But in this scenario, I would argue that any code that relies on behavior like this is broken. This is effectively a race condition, and different Ruby implementations will yield different behavior. In fact, code modification by meta-programming is effectively modification of class meta-data, so concurrent code modification is not very different from concurrent writes to program data. I am not sure whether the meta-programming model treats modifications to open classes as changes to a program-visible meta-data object. If it did, the programmer could then synchronize on that meta-data object before patching code. But I expect this is not the case, because that would force every method call to acquire a lock on the class meta-data.

Again, there are some unknowns here, but the likelihood of method table modifications happening across threads (or at least the likelihood of those changes producing runtime behavioral effects across threads) is probably very low. The majority of Ruby libraries and frameworks have been designed either with single threads in mind--i.e. the expectation is that a given thread would exit the library before a code modification happened, if it ever did--or with multi-threading in mind, making code modifications at boot time or with the expectation that only one thread would ever see them (like modifying a thread-local object to add new methods or modules). Neither case carries much expectation of vigorous or even common cross-thread code modification events.

Where does this leave us? Consider a Rails app with an UpdateCodeController whose method load_new_code(C, m) basically updates running code with a new version of a method 'm' in class 'C' (let's say a bug fix). This discussion of code modification then boils down to asking the question: at what point does the new method become available? In the absence of code modification synchronization (on C's meta-data object), the Ruby implementation can block on this request while continuing to run the old version of 'm' until it is convenient to switch over to the new code!

This is obviously a contrived example, but the point is this: if there is no mechanism for the programmer to force code modifications to be visible, a Ruby implementation has flexibility over how long it turns a blind eye to code modifications made in a thread other than the currently executing one. So, what you need to worry about is figuring out the hard boundaries for method and class guards assuming a single-threaded program, and optimizing for that scenario.

Am I missing something?

No, I think this is right on. And to cement this even further: we are blazing a very new trail when it comes to parallel-executing Ruby, which means we may be able to *set* expectations for cross-thread code-modification behavior. A user moving their library to JRuby will then (as now) have to pay a bit more attention to how that library behaves in a (probably) more formal and (definitely) better-specified threading environment.

Now, back to the single-threaded scenario and the earlier example:

i = 5
v1 = i + 1
some_random_method_call()
v2 = i + 1

As discussed earlier, the problem here is determining the set of methods that a call can modify, and this is somewhat similar to the C pointer aliasing problem. That by itself is "not a big deal" because, unlike C pointer manipulation, code modification is much rarer. So, in the Ruby implementation, you could optimistically/speculatively optimize a ton of code assuming NO code modifications and move the burden of invalidation from call sites (which are everywhere) to the centralized class meta-data structures, i.e. you install listeners/traps/callbacks (whatever you want to call them) on the class, and when a method is modified, you invalidate all optimized code that assumes that method is "static". This is non-trivial (it will require on-stack replacement of code), but at least conceivable. But since you are compiling for the JVM (or LLVM in the case of MacRuby), once you JIT, you effectively cede control: you don't have a mechanism to invalidate JITted code. So, at this time, I am not sure you can really get around this (extremely pessimistic) constraint. Only Ruby implementations that control the compilation stack all the way down to machine code (or have hooks to invalidate/modify existing code) might be able to get around it. I am curious how the MacRuby-LLVM combination is tackling this...

As recently as last fall, JRuby's call site invalidation was "active" as you describe: a list of call sites attached to a method object was physically flushed when changes occurred in that method's "home" hierarchy that were likely to render such caching invalid. This approach was abandoned due to the complexity of ensuring that multiple threads reaching the same call site at the same time would not accidentally miss an invalidation event and cache bad code forever. This is also, in fact, an open question about invokedynamic's active invalidation; when I and others brought it up to John Rose, he recognized that most dynamic calls would still need "passive" invalidation guards even after linking. The original call site could still be actively invalidated, but without introducing locks at call sites (sure performance death) there's no simple way to avoid "just in case" passive guards as well.
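
Conceptually, that combination--active flushing of an attached call site list, plus the remaining passive "just in case" check--looks something like the sketch below. All names are invented and the real code was more involved; note the lock here sits only on the slow re-link path, not on every call:

import java.util.ArrayList;
import java.util.List;

class ActiveInvalidationSketch {
    static class CallSite {
        volatile Object cachedMethod;          // null means "must re-link"
    }

    static class MetaClassEntry {
        final List<CallSite> dependents = new ArrayList<CallSite>();
        Object method;

        // Active flush: physically clear every call site that cached this method.
        synchronized void redefine(Object newMethod) {
            method = newMethod;
            for (CallSite site : dependents) {
                site.cachedMethod = null;
            }
            dependents.clear();
        }

        // Slow path: re-link under the class lock so a concurrent redefine
        // can't be missed and bad code cached forever.
        synchronized Object link(CallSite site) {
            site.cachedMethod = method;
            dependents.add(site);
            return method;
        }
    }

    static Object call(CallSite site, MetaClassEntry entry, Object self) {
        Object m = site.cachedMethod;          // the "just in case" passive guard
        if (m == null) {
            m = entry.link(site);
        }
        return invoke(m, self);
    }

    static Object invoke(Object method, Object self) {
        throw new UnsupportedOperationException("dispatch stand-in");
    }
}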

There is a way for us to do active code replacement without modifying on-stack code: lift replaceable segments to virtual calls.

If we assume there's a way to coarsen our memory synchronization without visibly impacting cross-thread event propagation, then keeping a list of call site *references* might enable active replacement. This is similar to how the Microsoft DLR optimizes call paths. The general idea would be that calls pass through a generated shim, one that initially just does a slow and safe dynamic call. In response to repeated calls with the same receiver and target method, we could replace these sites with newly-generated stubs that include a fast (potentially *static-calling*) path and a type guard leading to the slow, safe version. As additional types come in, the entire stub would be replaced, either settling on a handful of fast paths or eventually going "full dynamic" and reverting to the safest, slowest version.
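
In sketch form (Java, invented names), the shape of those stubs would be something like:

class StubSketch {
    interface Stub {
        Object call(Object receiver, Object arg);
    }

    // The initial shim: always safe, always slow.
    static final Stub SLOW = new Stub() {
        public Object call(Object receiver, Object arg) {
            return slowDynamicDispatch(receiver, "+", arg);
        }
    };

    // A generated monomorphic stub for a Fixnum-receiver '+': type guard, then a
    // direct (potentially static-callable, inlinable) path, falling back to the
    // slow, safe version for anything else.
    static final Stub FIXNUM_PLUS = new Stub() {
        public Object call(Object receiver, Object arg) {
            if (receiver instanceof Long && arg instanceof Long) {
                return (Long) receiver + (Long) arg;   // fast path
            }
            return SLOW.call(receiver, arg);           // bail out to the full dynamic call
        }
    };

    static Object slowDynamicDispatch(Object recv, String name, Object arg) {
        throw new UnsupportedOperationException("full dynamic dispatch stand-in");
    }
}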

The synchronization effects would be mitigated by always having call site replacement/stub generation include a time-sensitive token, as we do today, but that token would only need to be checked when updating the call site...not for every call. Active flushing would succeed in replacing the call site only if, while building the stub, another thread had not already installed a newer one, in which case the flush would be abandoned in favor of the newer call site version. The delay in propagating the updated call site reference would be subject only to Java memory model/memory synchronization requirements.
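
Sketched out (again with invented names), the replacement machinery might look like:

import java.util.concurrent.atomic.AtomicReference;

class ReplaceableCallSite {
    interface Stub {
        Object call(Object receiver, Object arg);
    }

    static final class VersionedStub {
        final Stub stub;
        final long token;   // the time-sensitive token we already use today
        VersionedStub(Stub stub, long token) { this.stub = stub; this.token = token; }
    }

    // The safe, slow starting point.
    static final Stub SLOW = new Stub() {
        public Object call(Object receiver, Object arg) {
            throw new UnsupportedOperationException("full dynamic dispatch stand-in");
        }
    };

    private final AtomicReference<VersionedStub> current =
        new AtomicReference<VersionedStub>(new VersionedStub(SLOW, 0L));

    Object call(Object receiver, Object arg) {
        // Ordinary calls just follow the reference; no token check on this path.
        return current.get().stub.call(receiver, arg);
    }

    // Active flush / re-specialization: succeeds only if no newer stub was
    // installed while this one was being built; otherwise it is abandoned.
    boolean tryReplace(VersionedStub observed, Stub newStub, long newToken) {
        if (newToken <= observed.token) {
            return false;   // someone already installed a newer stub
        }
        return current.compareAndSet(observed, new VersionedStub(newStub, newToken));
    }
}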

In order to mitigate the problem of a call site going polymorphic and HotSpot deoptimizing the code that contains it, active invalidations could optionally also re-compile that containing code. This would be a rarer case subject to profiling, since updating that code would necessarily trigger another invalidation event, which would cause other methods to recompile, and so on. It may sound like on-stack replacement would be necessary here, since ideally we would want to be able to replace existing bodies of code. But the cost is not in having invalid code bodies remain on the stack...the only cost is having poorly optimized bodies remain on the stack. On-stack replacement would help reoptimization propagate system-wide more quickly, but it is *not necessary to guarantee correctness*.

There's a lot of "ifs" here though, especially around coarsening memory synchronization. We may want to get together to whiteboard some of this out.

- Charlie

