Subramanya Sastry wrote:
I may have been wrong. I had a chance to think through this a little
bit more.
Consider this ruby code:
i = 5
v1 = i + 1
some_random_method_call()
v2 = i + 1
In this code snippet, the second '+' might not get optimized because
'some_random_method_call' could monkeypatch Fixnum.+ in the course of
whatever it does. This means that calls form hard walls beyond which
you cannot hoist method and class modification guards (unless, of course,
you can precisely determine the set of methods any given call can modify
-- for example, that 'some_random_method_call' will not modify Fixnum.+).
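(For illustration, here is one hypothetical body for some_random_method_call
that would make the second '+' behave differently; the patched behavior is
invented purely to show the hazard:)

# Hypothetical: somewhere inside some_random_method_call, Fixnum#+ gets
# monkeypatched. After this runs, the second 'i + 1' must dispatch to the
# new definition, so the optimized addition emitted for the first one can
# no longer be reused without a guard.
def some_random_method_call
  Fixnum.class_eval do
    def +(other)
      self - other   # deliberately "wrong" to make the behavioral change visible
    end
  end
end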
This is certainly a challenge for entirely eliminating guards, and I
believe even the fastest Ruby implementations currently are unable to
remove those guards completely.
In this example, if we can at least prove that i, v1, and v2 will always
be Fixnum when Fixnum#+ has not been replaced, we can emit guarded
optimized versions alongside deoptimized versions. Because Fixnum#+
falls on the "rarely or never replaced" end of the spectrum, reducing +
operations to a simple type guard plus "Fixnum has been modified" guard
is probably the best we can do without code replacement. More on code
replacement thoughts later.
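As a rough sketch of the shape of that guarded code, written as plain Ruby
rather than the bytecode we would actually emit (the guard flag and helper
name here are stand-ins, not anything that exists today):

# Conceptual shape only; the real thing would be generated bytecode.
# FIXNUM_PLUS_MODIFIED stands in for the runtime's "Fixnum#+ has been
# replaced" flag, flipped when the class is patched.
FIXNUM_PLUS_MODIFIED = false

def guarded_plus_one(i)
  if !FIXNUM_PLUS_MODIFIED && i.kind_of?(Fixnum)
    i + 1          # fast path: type known, original Fixnum#+, reducible to a raw add
  else
    i.send(:+, 1)  # deoptimized path: full dynamic dispatch
  end
end

v2 = guarded_plus_one(5)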
So, this is somewhat similar to pointer alias analysis in C programs.
In C programs, the ability to pass around pointers and manipulate them
in an unrestricted fashion effectively restricts the kind of
optimizations that a compiler can do. Similarly, in Ruby programs, it
seems to me that the ability for code to arbitrarily modify code
effectively restricts the kind of optimizations that a compiler can do.
Open classes and eval and all of that are to Ruby what pointers are to
C. Potentially powerful for the programmer, but hell for the compiler.
More on this further below after discussion of the multi-threaded scenario.
It is true that we generally must assume code modification events will
have a large impact on core methods. This means we need to always guard,
even if such modifications are exceedingly rare. However, it may be
pessimistic to assume such guards will be unacceptably slow in
comparison to the rest of the system; that may or may not be the
case. Currently, all method calls in normal execution scenarios must
read a volatile field for the call site invalidation guard. Although
it's difficult to measure accurately, the volatile guard should in
theory be a very large expense, and John Rose said as much when I
described our current system to him. But it may be possible to make such
guards non-volatile if we have other mechanisms for triggering memory
synchronization that those non-volatile calls could see.
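To make the per-call cost concrete, the guard is roughly of this shape
(modeled in Ruby; the serial-number table is a stand-in, and in the real
runtime the equivalent of that token read is the volatile field in question):

# Ruby model of today's "passive" call site guard: every call compares a
# cached token against the receiver class's current serial number before
# trusting the cached method. CLASS_SERIALS stands in for the per-class
# modification counter.
CLASS_SERIALS = Hash.new(0)   # bumped whenever a class's methods change

class GuardedCallSite
  def initialize(method_name)
    @name          = method_name
    @cached_method = nil
    @cached_token  = nil
  end

  def call(receiver, *args)
    klass = receiver.class
    token = CLASS_SERIALS[klass]                    # the read we pay for on every call
    unless @cached_method && @cached_token == token
      @cached_method = klass.instance_method(@name) # re-link on a miss
      @cached_token  = token
    end
    @cached_method.bind(receiver).call(*args)
  end
end

Whether that per-call token read really has to be volatile is exactly the
open question here.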
One theory I had was that since we periodically ping another volatile
field for cross-thread events (kill, raise), there may be a way we can
rely on those events to trigger our memory sync. But I am unfamiliar
enough with the Java memory model that I do not know if such a
volatility-reducing mechanism would work or be reliable.
If thread 2 modifies Fixnum.+, you might expect the modification to be
reflected in thread 1 at some point. If you hoist method/class guards
all the way out of a hot loop and convert the i+1 to integer addition,
code modifications from thread 2 won't propagate to thread 1.
But, in this scenario, I would argue that any code that relies on
behavior like this is broken. This is effectively a race condition and
different ruby implementations will yield different behavior. In fact,
code modification by meta-programming is effectively modifying class
meta-data. So, concurrent code modification is not very different from
concurrent writes to program data. I am not sure whether the
meta-programming model treats modifications to open classes as changes to
program-visible meta-data. If it did, the programmer could synchronize on
that meta-data object before patching code. But I expect this is not the
case, because that would force every method call to acquire a lock on the
class meta-data.
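(Hypothetically, if that meta-data were exposed as a lockable object, a
well-behaved patch might look like the following; the lock constant is
invented, and nothing like it exists today:)

require 'thread'

# Invented for illustration: a lock guarding Fixnum's meta-data.
FIXNUM_METADATA_LOCK = Mutex.new

FIXNUM_METADATA_LOCK.synchronize do
  Fixnum.send(:alias_method, :original_plus, :+)
  Fixnum.send(:define_method, :+) do |other|
    original_plus(other)   # the "patched" body; delegates to the original here
  end
end

And, as noted, the reason this probably isn't the model is that readers --
i.e. every call -- would need the same lock to get any guarantee out of it.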
Again, there are some unknowns here, but the likelihood of method table
modifications happening across threads (or at least the likelihood of
those changes producing runtime behavioral effects across threads) is
probably very low. The majority of Ruby libraries and frameworks have
either been designed with single threads in mind--i.e. the expectation
is that a given thread would exit the library before a code modification
would happen, if it ever did--or have been designed with multi-threading
in mind--making code modifications at boot time or with an expectation
that only one thread would ever see them (like modifying a thread-local
object to add new methods or modules). Neither case anticipates vigorous,
or even common, cross-thread code modification events.
Where does this leave us? Consider a Rails app where there is an
UpdateCodeController with a method called load_new_code(C, m) which
basically updates running code with a new version of a method 'm' in
class 'C' (let's say a bug fix). This discussion of code modification
then boils down to asking the question: at what point does the new
method become available? In the absence of code modification
synchronization (on C's metadata object), the ruby implementation can
block on this request while continuing to run the old version of 'm'
until it is convenient to switch over to the new code!
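(A minimal sketch of what load_new_code amounts to, with the Rails plumbing
stripped away and the source of the new method body left abstract:)

# Sketch only: swap in a new implementation of method 'm' on class 'C' at
# runtime. Where 'new_source' comes from (an upload, a repository, etc.)
# is beside the point.
def load_new_code(klass, method_name, new_source)
  # 'method_name' is only used for reporting here; the def inside
  # new_source is what actually replaces the method.
  klass.class_eval(new_source)   # e.g. "def m(x)\n  x * 2  # bug fix\nend"
  puts "replaced #{klass}##{method_name}"
end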
This is obviously a contrived example, but the point of this is: If
there is no mechanism for the programmer to force code modifications to
be visible in a ruby implementation, the ruby implementation has
flexibility over how long it turns a blind eye to code modifications in
a thread other than the currently executing thread. So, what you need
to worry about is figuring out the hard boundaries for method and class
guards assuming a single-threaded program and optimizing for that scenario.
Am I missing something?
No, I think this is right on. And to cement this even further: we are
blazing a very new trail when it comes to parallel-executing Ruby, which
means we may be able to *set* expectations for cross-thread
code-modification behavior. A user moving their library to JRuby will
then (as now) have to pay a bit more attention to how that library
behaves in a (probably) more formal and (definitely) better-specified
threading environment.
Now, back to the single-threaded scenario and the earlier example:
i = 5
v1 = i + 1
some_random_method_call()
v2 = i + 1
As discussed earlier, the problem here is determining the set of methods
that a call can modify, and this is somewhat similar to the C pointer
aliasing problem. That by itself is "not a big deal" because unlike C
pointers, code modification is much rarer. So, in the ruby
implementation, you could optimistically / speculatively optimize a ton
of code assuming NO code modifications and move the burden of
invalidation from call sites (which are everywhere) to the centralized
class meta-data structures, i.e., you install listeners/traps/callbacks
(whatever you want to call them) on the class and when a method is
modified, you invalidate all optimized code that assumes that the method
is "static". This is non-trivial (will require on-stack replacement of
code), but at least conceivable. But, since you are compiling for the
JVM (or LLVM in the case of MacRuby), once you JIT, you effectively cede
control. You don't have a mechanism to invalidate JITted code. So, at
this time, I am not sure if you can really get around this (extremely
pessimistic) constraint. Only ruby implementations that can control the
compilation stack all the way to machine code (or have hooks to
invalidate / modify existing code) might be able to get around this. I
am curious how the MacRuby-LLVM combination is tackling this...
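(Ruby's own Module#method_added hook gives a feel for what such a trap could
look like at the language level; the invalidation side here is only a
placeholder for the real deoptimization work:)

# Rough model of "install a trap on the class": method_added fires whenever
# an instance method is (re)defined, so the runtime can notice modifications
# centrally instead of guarding at every call site.
module InvalidationTrap
  def method_added(name)
    invalidate_optimized_code_for(self, name)
  end

  def invalidate_optimized_code_for(klass, name)
    # placeholder: deoptimize everything compiled assuming klass#name was static
    warn "invalidating code that assumed #{klass}##{name} was static"
  end
end

class Fixnum
  extend InvalidationTrap
end

class Fixnum
  def double; self * 2; end   # trips the trap above
end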
As early as last fall, JRuby's call site invalidation was "active" as
you describe: a list of call sites attached to a method object was
physically flushed when changes likely to render such caching invalid
occurred in that method's "home" hierarchy. This approach was
originally abandoned due to the complexity of ensuring multiple threads
reaching the same call site at the same time would not accidentally miss
an invalidation event and cache bad code forever. This is also, in fact,
an open question about the invokedynamic active invalidation; when I and
others brought it up to John Rose, he recognized that most dynamic calls
would still need to have "passive" invalidation guards even after
linking. The original call site could still be actively invalidated, but
without introducing locks at call sites (sure performance death) there's
no simple way to avoid "just in case" passive guards as well.
There is a way for us to do active code replacement without modifying
on-stack code: lift replaceable segments to virtual calls.
If we assume there's a way to coarsen our memory synchronization without
visibly impacting cross-thread event propagation, then keeping a list of
call site *references* might enable active replacement. This is similar
to how the Microsoft DLR optimizes call paths. The general idea would be
that calls pass through a generated shim, one that initially just does a
slow and safe dynamic call. In response to repeated calls with the same
receiver and target method, we could replace these sites with
newly-generated stubs that include a fast (potentially *static-calling*)
path and a type guard leading to the slow, safe version. As additional
types come in, the entire stub would be replaced, either settling on a
handful of fast paths or eventually going "full dynamic" and reverting
back to the safest, slowest version.
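Modeled in Ruby (the real stubs would be generated JVM code, and the
thresholds and names here are invented), the progression from slow-and-safe
to guarded-fast to full-dynamic might look like:

# Model of the shim idea: the call site starts out fully dynamic, promotes
# itself to a guarded fast path after seeing the same receiver class
# repeatedly, and reverts to "full dynamic" for good if too many classes
# show up. PROMOTE_AFTER / MAX_PROFILES are invented thresholds.
class ShimCallSite
  PROMOTE_AFTER = 10
  MAX_PROFILES  = 4

  def initialize(method_name)
    @name        = method_name
    @counts      = Hash.new(0)
    @fast        = nil      # [expected_class, pre-linked method] once promoted
    @megamorphic = false
  end

  def call(receiver, *args)
    if @fast && receiver.class == @fast[0]        # type guard
      return @fast[1].bind(receiver).call(*args)  # fast, pre-linked path
    end
    dispatch_slow(receiver, *args)
  end

  private

  def dispatch_slow(receiver, *args)
    klass = receiver.class
    unless @megamorphic
      @counts[klass] += 1
      if @counts.size > MAX_PROFILES
        @megamorphic = true                             # settle on the safest, slowest version
        @fast = nil
      elsif @counts[klass] >= PROMOTE_AFTER
        @fast = [klass, klass.instance_method(@name)]   # "regenerate the stub"
      end
    end
    receiver.send(@name, *args)                         # slow, safe dynamic call
  end
end

Something like site = ShimCallSite.new(:to_s) followed by repeated
site.call(42) would end up on the guarded Fixnum path.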
The synchronization effects would be mitigated by always having call
site replacement/stub generation include a time-sensitive token as we do
today, but that token would only need to be checked when updating the
call site...not for every call. Active flushing would succeed in
replacing the call site only if, while building the stub, another thread
had not built a newer one; otherwise the flush would be abandoned in
favor of the newer call site version. The delay in propagating the updated
call site reference would be subject only to Java memory model/memory
synchronization requirements.
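A small Ruby model of that update rule (the token and the lock are
stand-ins; in the real system the token would be tied to the class's
modification serial and the lock to whatever memory-sync coarsening we
settle on):

require 'thread'

# Model of "check the token only when updating the call site": the call
# path just uses whichever stub reference is currently installed, with no
# token read; the update path compares tokens and abandons the flush if a
# newer stub has already been installed.
class ReplaceableCallSite
  attr_reader :token

  def initialize(initial_stub)
    @stub       = initial_stub   # any callable
    @token      = 0
    @update_mux = Mutex.new
  end

  def call(*args)
    @stub.call(*args)            # hot path: no per-call token check
  end

  # 'seen_token' is the token snapshotted before building the new stub.
  def try_replace(new_stub, seen_token)
    @update_mux.synchronize do
      return false unless seen_token == @token   # someone built a newer stub; abandon
      @stub  = new_stub
      @token += 1
      true
    end
  end
end

How promptly other threads see the new stub reference is then exactly the
Java memory model question above, but correctness no longer depends on a
volatile read per call.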
In order to mitigate the problem of a call site going polymorphic and
HotSpot deoptimizing the code that contains it, active invalidations
could optionally also re-compile that containing code. This would be a
rarer case subject to profiling, since updating that code would
necessarily trigger another invalidation event, which would cause other
methods to recompile, and so on. This sounds like on-stack replacement
would be necessary, since ideally we would want to be able to replace
existing bodies of code. But the cost here is not in having invalid code
bodies remain on the stack...the only cost is having poorly optimized
bodies remain on the stack. On-stack replacement would help us make
reoptimization propagate system-wide more quickly, but it is *not
necessary to guarantee correctness*.
There's a lot of "ifs" here though, especially around coarsening memory
synchronization. We may want to get together to whiteboard some of this out.
- Charlie