On Tue, 2005-09-20 at 08:40 -0400, Geir Magnusson Jr. wrote:
> On Sep 20, 2005, at 12:26 AM, Robin Garner wrote:
> > I think it's important not to take Tim's point about performance too
> > lightly here. There are some key interfaces between components that
> > can't afford the overhead of a function call, let alone an indirect
> > call via a function pointer.
> >
> > Three instances that come to mind are:
> > - Allocation. For best performance, the common case of a new() needs
> > to be inlined directly into the compiled code.
> > - Write barrier (and read barrier if we need one). The common case
> > of a write barrier should be a handful of instructions.
> > - Yield points. Again should inline down to a couple of instructions.
> > I'd be interested in any approaches you may have thought of for these
> > interfaces.
> Are these things the components would worry about, or can an
> optimizer deal with these "later" if possible?
I believe these are so performance-critical that a design needs to be in place
up front. If Harmony is to be competitive in performance terms with the
current production VMs, we need to make sure that allocation, barriers,
locking, etc. are as fast as possible. A design that adds an extra memory
reference to one of these operations would not be competitive.
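To make the bar concrete, here is a minimal C sketch of the kind of fast paths
meant above: a bump-pointer allocation and a generational write barrier, each a
few instructions before bailing into an out-of-line slow path. All names are
illustrative and the slow paths are stubbed; this is not Harmony code.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical nursery state; in a real VM these would be per-thread
 * or per-processor, and the fast paths inlined into compiled code. */
static uintptr_t bump_ptr, bump_limit;     /* allocation area        */
static uintptr_t nursery_lo, nursery_hi;   /* nursery address range  */
static int remembered;                     /* stub remembered set    */

/* Out-of-line slow paths, stubbed purely for illustration. */
static void *alloc_slow(size_t bytes) { return malloc(bytes); }
static void remember_slot(void **slot) { (void)slot; remembered++; }

static void *alloc_fast(size_t bytes) {
    uintptr_t result = bump_ptr;
    uintptr_t next = result + bytes;
    if (next > bump_limit)          /* rare case: refill or GC       */
        return alloc_slow(bytes);
    bump_ptr = next;                /* common case: a pointer bump   */
    return (void *)result;
}

static void write_barrier(void **slot, void *target) {
    uintptr_t s = (uintptr_t)slot, t = (uintptr_t)target;
    if (t >= nursery_lo && t < nursery_hi &&    /* target is young   */
        (s < nursery_lo || s >= nursery_hi))    /* source is not     */
        remember_slot(slot);        /* rare: log old-to-young pointer */
    *slot = target;                 /* the store itself              */
}
```

The common case of each is a couple of compares and a store; an extra memory
indirection (e.g. through a function pointer) roughly doubles the cost.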
> Now, I'm really just making this up as I go along... what some people
> call "thinking out loud", but I won't grace this with the term
> "thinking". I've never written or been inside VM internals, so I'm
> inventing out of whole cloth here...
> In earlier discussions of componentization, I kept imagining a model
> where we have a defined set of capabilities that a component could
> optionally implement. There would be a required set, as well as
> optional ones. (This thinking is inspired by the old MSFT COM
> infrastructure...)
> In the case of a memory manager, a required one would be the
> "interface" containing the function pointers for memory management.
> An optional one would be "NativeInlineConversion" or something, where
> an optimizer could find a new(): if it doesn't support the native
> inlining, it uses the function call into the component, and if it
> does, it asks the component for the bit of compiled code to use.
I think this is probably doable, although for some things it would be
reasonable to require the component to provide the inlineable code. The
'native code' passed back would have to be one level of compiler IR
(intermediate representation). Unfortunately this has the side effect of
making the compiler's IR public, breaking the modularity of the compiler(s)
and to an extent distributing chunks of the compiler into other components.
This is much less of a problem when doing Java in Java, but that's another
thread.
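The capability model described above could be sketched roughly like this, in
the COM style: a required table of function pointers plus a query for optional
capabilities. Everything here (the names, the signatures, the idea of handing
back a blob of IR) is hypothetical illustration, not a proposed API.

```c
#include <stddef.h>
#include <string.h>

/* Required entry points every memory manager must provide. */
typedef struct {
    void *(*alloc)(size_t bytes);
    void  (*release)(void *obj);
} MemoryManagerVtbl;

/* Hypothetical optional capability: hand the JIT a chunk of compiler
 * IR to splice in place of the out-of-line alloc call. */
typedef struct {
    const void *(*inline_alloc_ir)(size_t *ir_len);
} NativeInlineConversionVtbl;

typedef struct {
    MemoryManagerVtbl required;
    /* Returns the named optional vtable, or NULL if unsupported. */
    const void *(*query_capability)(const char *name);
} MemoryManager;

/* A JIT would use it like this: prefer the inlineable IR when the
 * component offers it, otherwise fall back to a plain call. */
static const void *plan_allocation(MemoryManager *mm, size_t *ir_len) {
    const NativeInlineConversionVtbl *cap =
        mm->query_capability("NativeInlineConversion");
    if (cap != NULL)
        return cap->inline_alloc_ir(ir_len);  /* splice IR inline  */
    *ir_len = 0;
    return NULL;                              /* emit a call instead */
}
```

The cost of the query is paid once at compile time, which is exactly why the
scheme only works if the IR contract is public.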
Actually, the more I think about this, the less I see the value in designing a
component structure that allows run-time configurability. I believe
compile-time configurability is the thing to aim for. Achieving good levels of
abstraction without sacrificing performance is a difficult enough job at
compile time in any case.
Consider for example the design of the object model, and say that you want to
support:
1) A real-time oriented incremental garbage collector
2) A mark-sweep collector for memory constrained environments
3) A generational reference counting collector as the general purpose
high performance configuration.
Collector 1) (if, say, you used a Brooks-style barrier) requires an additional
word in the object header to store a forwarding pointer for all objects, alive
and dead. Collector 2) requires only one bit in the object header (for a mark
bit) and could conceivably steal a low-order bit from the class pointer (TIB
in JikesRVM terms). Collector 3) requires an extra word (or at least several
extra bits) in mature objects, and gets better performance if nursery objects
don't need that extra word.
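For instance, the bit-stealing trick available to collector 2) looks roughly
like this: a sketch assuming TIBs are word-aligned, so the low-order bit of
the class pointer is always zero and can carry the mark. The names and header
layout are illustrative only.

```c
#include <stdint.h>

#define MARK_BIT ((uintptr_t)1)

/* One-word header: TIB pointer with the mark bit stolen from bit 0. */
typedef struct { uintptr_t tib_word; } ObjectHeader;

static void *get_tib(const ObjectHeader *h) {
    return (void *)(h->tib_word & ~MARK_BIT);  /* strip the stolen bit */
}
static int is_marked(const ObjectHeader *h) {
    return (int)(h->tib_word & MARK_BIT);
}
static void set_mark(ObjectHeader *h)   { h->tib_word |= MARK_BIT; }
static void clear_mark(ObjectHeader *h) { h->tib_word &= ~MARK_BIT; }
```

Every TIB access now pays a mask instruction; whether that is acceptable is
exactly the kind of tradeoff that differs between the three collectors.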
The object header also needs to store metadata associated with locking, address-based hashing and the TIB (pointer to per-class information).
There is a complex tradeoff between the size of the header word and the
encoding of information within it (see, for example,
http://www.research.ibm.com/people/d/dgrove/papers/ecoop02.html), and
different layouts are possible depending on the implementations selected.
Much of the code in the runtime system needs to know how the object headers are
laid out, and to access the metadata critical to them very rapidly. For
example, a thin lock takes 5-10 instructions to acquire (in the frequent case).
Adding (for example) a table lookup to find where in the object header the
system had dynamically encoded the lock field would be disastrous for locking
performance.
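By contrast, when the lock field's position and encoding are fixed at compile
time, the uncontended acquire really is just a compare-and-swap. A sketch,
with an illustrative encoding (0 = unlocked, otherwise the owner's thread id)
and a stubbed contention path; this is not the Jikes RVM thin-lock code.

```c
#include <stdatomic.h>
#include <stdint.h>

/* One-word lock field at a compile-time-known offset in the header. */
typedef struct { _Atomic uintptr_t lock_word; } ObjectHeader;

/* Stub slow path: real code would spin, inflate to a fat lock, etc.
 * Here it just reports failure, for illustration. */
static int lock_slow(ObjectHeader *h, uintptr_t tid) {
    (void)h; (void)tid;
    return 0;
}

static int thin_lock(ObjectHeader *h, uintptr_t tid) {
    uintptr_t expect = 0;                /* common case: unlocked    */
    if (atomic_compare_exchange_strong(&h->lock_word, &expect, tid))
        return 1;                        /* a handful of instructions */
    return lock_slow(h, tid);            /* rare: contention path    */
}

static void thin_unlock(ObjectHeader *h, uintptr_t tid) {
    uintptr_t expect = tid;
    /* Common case: we own it and it was never inflated. */
    atomic_compare_exchange_strong(&h->lock_word, &expect, (uintptr_t)0);
}
```

A table lookup to find the lock field would sit squarely on this path, which
is why the layout has to be settled before runtime.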
So at least for modules that are critical to the performance of compiled code,
I believe runtime configurability will be dogged by performance problems.
Compilers, classloaders etc - fine, but the core runtime components - no.
cheers
Robin