On Tue, 2005-09-20 at 08:40 -0400, Geir Magnusson Jr. wrote:

On Sep 20, 2005, at 12:26 AM, Robin Garner wrote:

> I think it's important not to take Tim's point about performance too
> lightly here.  There are some key interfaces between components that
> can't afford the overhead of a function call, let alone an indirect
> call via a function pointer.
>
> Three instances that come to mind are:
> - Allocation.  For best performance, the common case of a new() needs
>   to be inlined directly into the compiled code.
> - Write barrier (and read barrier if we need one).  The common case
>   of a write barrier should be a handful of instructions.
> - Yield points.  Again should inline down to a couple of instructions.

> I'd be interested in any approaches you may have thought of for these
> interfaces.

Are these things the components would worry about, or can an optimizer deal with these "later" if possible?

I believe these are so performance-critical that a design needs to be in place 
up front.  If Harmony is to be competitive in performance terms with the 
current production VMs, we need to make sure that allocation, barriers, locking 
etc. are as fast as possible.  A design that adds an
extra memory reference to one of these operations would not be competitive.
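To illustrate why (this is a sketch only, with invented names, not a proposed Harmony API): the fast path a compiler would inline at each new() site for a bump-pointer allocator is just a compare, an add and a store.  Anything more than that, per allocation, shows up in benchmarks.

```java
// Hypothetical sketch of a bump-pointer allocation fast path, the kind of
// sequence a compiler inlines directly at each new() site.  All names here
// (Allocator, refillAndAllocate) are illustrative, not Harmony APIs.
public class Allocator {
    private long cursor;      // next free address in the thread-local buffer
    private final long limit; // end of the buffer

    public Allocator(long start, long limit) {
        this.cursor = start;
        this.limit = limit;
    }

    // Fast path: a compare, an add, and a store -- a handful of
    // instructions once inlined.  No function call, no indirect call.
    public long allocate(int bytes) {
        long result = cursor;
        long newCursor = result + bytes;
        if (newCursor > limit) {
            return refillAndAllocate(bytes); // rare slow path, out of line
        }
        cursor = newCursor;
        return result;
    }

    // Slow path: would call out of line into the collector for a fresh
    // buffer.  Not modelled in this sketch.
    private long refillAndAllocate(int bytes) {
        throw new OutOfMemoryError("slow path not modelled in this sketch");
    }
}
```

The point is that the slow path can be a (possibly indirect) function call into the component, but the fast path cannot.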


Now, I'm really just making this up as I go along... what some people call "thinking out loud", but I won't grace this with the term "thinking". I've never written or been inside VM internals, so I'm inventing out of whole cloth here...

In earlier discussions of componentization, I kept imagining a model where we have a defined set of capabilities that a component could optionally implement. There would be a required set, as well as optional ones. (This thinking is inspired by the old MSFT COM infrastructure...)

In the case of a memory manager, a required one would be the "interface" containing the function pointers for memory management.

An optional one would be "NativeInlineConversion" or something: an optimizer could find a new() and, if the component doesn't support native inlining, use the function call into the component; if it does, ask the component for the bit of compiled code to use.

I think this is probably doable, although for some things it would be 
reasonable to require the component to provide the inlineable code.  The 
'native code' passed back would have to be at one level of compiler IR 
(intermediate representation).  Unfortunately this has the side effect of
making the compiler's IR public, breaking the modularity of the compiler(s) and, 
to an extent, distributing chunks of the compiler into other components.  This 
is much less of a problem when doing Java in Java, but that's another thread.
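To make the capability-query idea concrete, a rough sketch along COM lines might look like this.  All the names here are invented for illustration, and the IR type is deliberately left opaque:

```java
// Sketch of the optional-capability idea (inspired by COM's
// QueryInterface).  Every name here is invented for this example.
interface Component {
    // Returns the requested capability, or null if unsupported.
    <T> T getCapability(Class<T> capability);
}

interface MemoryManager extends Component {
    long alloc(int bytes); // required: plain function-call allocation
}

interface NativeInlineConversion {
    // Optional: hand the optimizer a fragment of compiler IR to splice
    // in place of the alloc() call.  Opaque here; in reality this is
    // where the compiler's IR leaks into the component interface.
    Object inlineSequenceFor(String operation);
}

// A minimal collector that supports only the required interface.
class SimpleMM implements MemoryManager {
    public long alloc(int bytes) { return 0; }
    public <T> T getCapability(Class<T> capability) {
        return null; // no native inlining support
    }
}
```

The optimizer would call getCapability(NativeInlineConversion.class); on null it emits a call to alloc(), otherwise it splices the returned IR fragment.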


Actually, the more I think about this, the less I see the value in designing a 
component structure that allows run-time configurability.  I believe 
compile-time configurability is the thing to aim for.  Achieving good levels of 
abstraction without sacrificing performance is a difficult enough job at 
compile time in any case.

Consider, for example, the design of the object model, and say that you want to 
support:
1) A real-time oriented incremental garbage collector
2) A mark-sweep collector for memory-constrained environments
3) A generational reference-counting collector as the general-purpose
high-performance configuration.

Collector 1 (if, say, you used a Brooks-style barrier) requires an additional 
word in the object header to store a forwarding pointer for all objects, alive 
and dead.  Collector 2 requires only one bit in the object header (for a mark 
bit) and could conceivably steal a low-order bit from the class pointer (TIB in 
JikesRVM terms).  Collector 3 requires an extra word (or at least several extra 
bits) in mature objects, and gets better performance if nursery objects don't 
need that extra word.
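This is exactly the kind of decision that compile-time configurability handles well: with the collector chosen at build time, the header offsets become constants that fold straight into the compiled barrier and lock sequences.  A hedged sketch, with invented names and an invented layout:

```java
// Sketch: compile-time selection of an object-header layout.  With the
// collector fixed at build time, every offset is a constant that folds
// into the inlined fast paths -- no table lookup at runtime.  The
// layout and names are illustrative only.
final class HeaderLayout {
    // Chosen at build time; a real build would select one configuration.
    static final boolean BROOKS_BARRIER = false;   // collector (1)
    static final boolean REFCOUNT_WORD  = true;    // collector (3)

    static final int TIB_OFFSET    = 0;  // class pointer
    static final int STATUS_OFFSET = 4;  // lock bits, hash state, GC bits
    static final int FORWARDING_OFFSET =
        BROOKS_BARRIER ? 8 : -1;         // only exists for collector (1)
    static final int RC_OFFSET =
        REFCOUNT_WORD ? (BROOKS_BARRIER ? 12 : 8) : -1;

    static final int HEADER_BYTES =
        8 + (BROOKS_BARRIER ? 4 : 0) + (REFCOUNT_WORD ? 4 : 0);
}
```

With this configuration the header is 12 bytes and there is no forwarding-pointer slot; flip the build-time flags and the constants, and all code compiled against them, change accordingly.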

The object header also needs to store metadata associated with locking, address-based hashing and the TIB (pointer to per-class information).
There is a complex tradeoff between the size and encoding of information in the 
header word (see, for example, 
http://www.research.ibm.com/people/d/dgrove/papers/ecoop02.html), and 
different layouts are possible depending on the implementations selected.

Much of the code in the runtime system needs to know how the object headers are 
laid out, and to access the metadata critical to it very rapidly.  For 
example, a thin lock takes 5-10 instructions to acquire (in the frequent case). 
Adding (for example) a table lookup to find where in the object header the 
system had dynamically encoded the lock field would be disastrous for locking 
performance.
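For those who haven't seen one, the thin-lock fast path looks roughly like this.  The lock word sits at a fixed, statically known header offset, so acquisition is one compare-and-swap plus a couple of tests.  The bit encoding here (owner thread id in the high bits, recursion count in the low bits) is illustrative, not the Harmony design:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a thin-lock fast path.  Because the lock word's location is
// a compile-time constant, the whole frequent case is a handful of
// instructions.  Encoding is illustrative only.
class ThinLock {
    static final int COUNT_BITS = 8;           // low bits: recursion count
    static final int TID_SHIFT  = COUNT_BITS;  // high bits: owner thread id

    final AtomicInteger lockWord = new AtomicInteger(0); // 0 == unlocked

    // Returns true if the frequent (uncontended or recursive) case won.
    boolean acquire(int threadId) {
        int old = lockWord.get();
        if (old == 0) {
            // Unlocked: claim it with a single compare-and-swap.
            return lockWord.compareAndSet(0, threadId << TID_SHIFT);
        }
        if ((old >>> TID_SHIFT) == threadId) {
            // Recursive acquisition by the owner: bump the count.
            return lockWord.compareAndSet(old, old + 1);
        }
        // Contended: would inflate to a fat lock (not modelled here).
        return false;
    }
}
```

Note the slow path (inflation to a fat lock, count overflow) is deliberately left out; the point is only that the frequent case depends on a constant field offset.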

So at least for modules that are critical to the performance of compiled code, 
I believe runtime configurability will be dogged with performance problems.  
Compilers, classloaders etc - fine, but the core runtime components - no.

cheers
Robin

