Re: [arch] VMCore / Component Model

2005-09-22 Thread Robin Garner

On Tue, 2005-09-20 at 08:40 -0400, Geir Magnusson Jr. wrote:


On Sep 20, 2005, at 12:26 AM, Robin Garner wrote:

 


 I think it's important not to take Tim's point about performance too
 lightly here.  There are some key interfaces between components that
 can't afford the overhead of a function call, let alone an indirect
 call via a function pointer.

 Three instances that come to mind are:
 - Allocation.  For best performance, the common case of a new() needs
   to be inlined directly into the compiled code.
 - Write barrier (and read barrier if we need one).  The common case
   of a write barrier should be a handful of instructions.
 - Yield points.  Again should inline down to a couple of instructions.
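
 As a rough illustration of why the common case of a new() is small
 enough to inline, here is a minimal bump-pointer sketch. The names
 (BumpAllocator, alloc, allocSlow) are hypothetical, addresses are
 modelled as plain long values, and a real VM would have the compiler
 emit the fast path directly into the compiled code rather than call a
 method.

```java
/** Hypothetical bump-pointer allocator; addresses are simulated as longs. */
final class BumpAllocator {
    private static final int ALIGNMENT = 8; // assumed object alignment in bytes

    private long cursor;       // next free address
    private final long limit;  // end of the current allocation region
    private int slowPathCalls; // how often the fast path failed

    BumpAllocator(long start, long limit) {
        this.cursor = start;
        this.limit = limit;
    }

    /** Fast path: round up, add, compare, store. */
    long alloc(int bytes) {
        long size = (bytes + ALIGNMENT - 1) & ~(long) (ALIGNMENT - 1);
        long result = cursor;
        long next = result + size;
        if (next > limit) {
            return allocSlow(bytes); // rare case, kept out of line
        }
        cursor = next;
        return result;
    }

    /** Slow-path placeholder: a real VM would fetch a new region or trigger a GC. */
    private long allocSlow(int bytes) {
        slowPathCalls++;
        return 0L; // 0 stands for "allocation failed" in this sketch
    }

    int slowPathCalls() {
        return slowPathCalls;
    }
}
```

 Only the body of alloc() up to the limit check is what would need to be
 inlined; the slow path can stay behind an ordinary call.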
   



 


 I'd be interested in any approaches you may have thought of for these
 interfaces.
   



Are these things the components would worry about, or can an  
optimizer deal with these later if possible?
 



I believe these are so performance critical that a design needs to be in place 
up front.  If Harmony is to be competitive in performance terms with the 
current production VMs, we need to make sure that allocation, barriers, locking 
etc are as fast as possible.  A design that adds an
extra memory reference to one of these operations would not be competitive.


Now, I'm really just making this up as I go along... what some people  
call thinking out loud, but I won't grace this with the term  
thinking.  I've never written or been inside VM internals, so I'm  
inventing out of whole cloth here...


In earlier discussions of componentization, I kept imagining a model  
where we have a defined set of capabilities that a component could  
optionally implement.  There would be a required set, as well as  
optional ones.  (This thinking is inspired by the old MSFT COM  
infrastructure...)


In the case of a memory manager, a required one would be the  
interface containing the function pointers for memory management.


An optional one would be NativeInlineConversion or something, where  
an optimizer could find a new() and, if the component doesn't support  
native inlining, use the function call into the component, and if it  
does, ask the component for the bit of compiled code to use.
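
A minimal sketch of that required/optional capability idea, written in
Java for brevity. Everything here is an assumption made for
illustration: the Component.query lookup (loosely modelled on COM's
QueryInterface), the interface names, and the use of a String as a
stand-in for whatever code or IR the component would hand back.

```java
import java.util.Optional;

/** Required capability: every memory manager must provide this. */
interface MemoryManager {
    long allocate(int bytes);
}

/** Optional capability (hypothetical): supplies an inlinable allocation sequence. */
interface NativeInlineConversion {
    String inlineAllocSequence(); // stand-in for compiler IR or machine code
}

/** COM-style lookup: ask a component whether it supports a capability. */
interface Component {
    <T> Optional<T> query(Class<T> capability);
}

final class SimpleMemoryManager implements Component, MemoryManager {
    private long next = 4096; // simulated address of the next free byte

    @Override
    public long allocate(int bytes) {
        long result = next;
        next += bytes;
        return result;
    }

    @Override
    public <T> Optional<T> query(Class<T> capability) {
        // Supports exactly the interfaces this class implements.
        return capability.isInstance(this)
                ? Optional.of(capability.cast(this))
                : Optional.empty();
    }
}

final class Optimizer {
    /** Describes how a new() would be compiled against the given component. */
    static String planAllocation(Component memoryManager) {
        return memoryManager.query(NativeInlineConversion.class)
                .map(c -> "inline: " + c.inlineAllocSequence())
                .orElse("call: MemoryManager.allocate");
    }
}
```

With SimpleMemoryManager, which does not implement the optional
interface, Optimizer.planAllocation falls back to the plain call.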
 



I think this is probably do-able, although for some things it would be 
reasonable to require the component to provide the inlineable code.  The 
'native code' passed back would have to be one level of compiler IR (internal 
representation).  Unfortunately this has the side effect of
making the compiler's IR public, breaking the modularity of the compiler(s) and 
to an extent distributing chunks of the compiler into other components.  This 
is much less of a problem when doing Java in Java, but that's another thread.


Actually, the more I think about this, the less I see the value in designing a 
component structure that allows run-time configurability.  I believe 
compile-time configurability is the thing to aim for.  Achieving good levels of 
abstraction without sacrificing performance is a difficult enough job at 
compile time in any case.

Consider for example the design of the object model, and say that you want to 
support:
1) A real-time oriented incremental garbage collector
2) A mark-sweep collector for memory constrained environments
3) A generational reference counting collector as the general purpose
high performance configuration.

Collector 1) (if say you used a Brooks-style barrier) requires an additional 
word in the object header, to store a forwarding pointer for all objects, alive 
and dead.  2) requires only 1 bit in the object header (for a mark bit) and 
could conceivably steal a low-order bit from the class pointer (TIB in JikesRVM 
terms).  3) requires an extra word (or at least several extra bits) in mature 
objects, and gets better performance if nursery objects don't need that extra 
word.
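
To illustrate case 2), here is a sketch of stealing the low-order bit of
the class pointer as a mark bit. The class name and the alignment
assumption are mine, and the header is modelled as a single long rather
than real object memory.

```java
/** Hypothetical header word for a mark-sweep configuration. */
final class MarkSweepHeader {
    // Class-pointer (TIB) addresses are assumed to be at least 2-byte
    // aligned, so bit 0 of the pointer is always zero and can be reused.
    private static final long MARK_BIT = 1L;

    private MarkSweepHeader() {}

    static long setMark(long header) {
        return header | MARK_BIT;
    }

    static long clearMark(long header) {
        return header & ~MARK_BIT;
    }

    static boolean isMarked(long header) {
        return (header & MARK_BIT) != 0;
    }

    /** Recovers the class pointer by masking the mark bit off. */
    static long classPointer(long header) {
        return header & ~MARK_BIT;
    }
}
```

The price of saving the header space is the extra mask on every
class-pointer load, which is the kind of tradeoff in question here.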

The object header also needs to store metadata associated with locking, address-based hashing and the TIB (pointer to per-class information).  


There is a complex tradeoff between size and encoding of information in the 
header word (see, for example, 
http://www.research.ibm.com/people/d/dgrove/papers/ecoop02.html), and 
different layouts are possible depending on the implementations 
selected.

Much of the code in the runtime system needs to know how the object headers are 
laid out, and to access the metadata critical to them very rapidly.  For 
example, a thin lock takes 5-10 instructions to acquire (in the frequent case). 
 Adding (for example) a table lookup to find where in the object header the 
system had dynamically encoded the lock field would be disastrous for locking 
performance.
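
A sketch of what the compile-time version looks like, assuming a
made-up two-word header and a heap simulated as an array of 32-bit
words. The names are hypothetical, and recursion and contention
handling are left out entirely; the point is only that the lock word
sits at a constant offset the compiler can fold into the instruction.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Compile-time object model choice: offsets are constants, not lookups. */
final class ObjectModel {
    static final int HEADER_WORDS = 2;    // assumed layout: [TIB, status word]
    static final int LOCK_WORD_INDEX = 1; // lock bits live in the status word

    private ObjectModel() {}
}

/** Hypothetical thin lock over a simulated heap of 32-bit words. */
final class ThinLock {
    private static final int UNLOCKED = 0; // thread ids are assumed non-zero

    private ThinLock() {}

    /** Frequent case: one compare-and-swap at a constant offset. */
    static boolean tryLock(AtomicIntegerArray heap, int objectIndex, int threadId) {
        return heap.compareAndSet(
                objectIndex + ObjectModel.LOCK_WORD_INDEX, UNLOCKED, threadId);
    }

    /** Releases the lock only if threadId currently owns it. */
    static boolean unlock(AtomicIntegerArray heap, int objectIndex, int threadId) {
        return heap.compareAndSet(
                objectIndex + ObjectModel.LOCK_WORD_INDEX, threadId, UNLOCKED);
    }
}
```

If LOCK_WORD_INDEX had to be read from a per-configuration table at run
time, every tryLock would pay an extra dependent load before the
compare-and-swap.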

So at least for modules that are critical to the performance of compiled code, 
I believe runtime configurability will be dogged with performance problems.  
Compilers, classloaders etc - fine, but the core runtime components - no.

cheers
Robin




Re: [arch] Interpreter vs. JIT for Harmony VM

2005-09-22 Thread Geir Magnusson Jr.


On Sep 21, 2005, at 11:11 AM, Tom Tromey wrote:


Geir == Geir Magnusson [EMAIL PROTECTED] writes:





On the other hand, a fast code-generating JIT can call runtime
helpers and native methods without additional glue code whereas an
interpreter has to have special glue code to make it work in a JIT
environment.



Geir I believe you, but I don't understand this.  Can you explain in more
Geir detail?

It is about handling calling conventions.



[SNIP]

Thanks





Our experience is that a fast, zero optimizing JIT can yield low-
enough response time. So, I think at least Harmony has the option
of having a decent system without an interpreter. Thoughts?



Geir Basic thought is yes, I always figured we'd have this pluggable, with
Geir an interpreter for ease of porting, and then platform-specific JIT.


It seems to me that there's a design question here.  For instance, if
you want to eventually take interpreted code and compile it (when it
is hot), for full pluggability your JIT(s) and your interpreter need
to agree on some set of bookkeeping details in order to make this
possible.  OTOH, you could make other decisions that make this problem
go away, for instance having a single choice of execution engine up
front; so the fast JIT and the optimizing JIT are just part of the
same code base and only need to talk to each other, and can be built
in an ad hoc way.


Personally I'd be just as happy if we only had a JIT.  There are
already plenty of interpreters out there.


But I would think that we'd want both, right?   An interpreter that  
builds on anything to ensure wide platform portability, with the  
ability to augment with a JIT for those platforms for which people  
are interested in creating a JIT...


geir



Tom




--
Geir Magnusson Jr  +1-203-665-6437
[EMAIL PROTECTED]




Re: [Arch] Suggestion to prioritize JVMTI over JVMPI and JVMDI

2005-09-22 Thread Geir Magnusson Jr.


On Sep 21, 2005, at 7:55 PM, Elford, Chris L wrote:


Hi all,

JVMTI is on track to replace the older JVMPI and JVMDI interfaces. J2SE
5 supports JVMTI and JVMPI/JVMDI but future followons to J2SE are
expected to remove support for the older interfaces. Tools vendors seem
to be in the process of transitioning to the JVMTI interface. It does
not really make sense to invest too much effort in the Harmony project
supporting the JVMPI interface. It would be much more effective to
invest the effort making the JVMTI implementation more complete so that
it includes more of the optional functionality of JVMTI.

I suggest that we concentrate our debug/tools interface work in Harmony
on making JVMTI work really well and let JVMPI and JVMDI fall away.


My lack of knowledge of the specific implementations notwithstanding :)  
I think it makes sense to prioritize, so that if timing works out and we  
decide to just jump ahead to J2SE 6 rather than implement J2SE 5, we  
can ignore them (assuming they are dropped.)


However, if we do J2SE 5, as is the current goal, we would have to  
implement them, right?


--
Geir Magnusson Jr  +1-203-665-6437
[EMAIL PROTECTED]




Re: [arch] Interpreter vs. JIT for Harmony VM

2005-09-22 Thread Geir Magnusson Jr.


On Sep 21, 2005, at 11:38 PM, Frederick C Druseikis wrote:



Hello list,
Long time reader, first time writer.


Welcome



On Wed, 21 Sep 2005 14:30:10 -0300
Rodrigo Kumpera [EMAIL PROTECTED] wrote:



Having a mixed JITed-interpreted environment makes things harder.
Writing a baseline, single pass JITer is easy, but there is A LOT
more stuff to make a port than just the code execution part.



Agreed.
I'd like to amplify on that point about the more stuff.  I think my
conclusion is that if you try to live without it you'll end up creating
the guts of the interpreter anyway.


[SNIP]



My sense is that a small interpreter with a pluggable JIT is a pragmatic
approach.  Which means to me that the central question is one of what is
the interface relationship between the Interpreter and the JIT? What
part of the interpreter makes the decision about what methods should be
JITed?  I see the interpreter collecting data, some pluggable interface
making policy decisions about what to JIT, and one of them calling the
JITer when it's time.
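
One possible shape for that split, as a minimal Java sketch. All names
(CompilationPolicy, JitCompiler, ProfilingInterpreter) are hypothetical,
methods are identified by plain strings, and the only data collected is
an invocation count; bytecode interpretation itself is omitted.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Pluggable policy: decides when a method is hot enough to compile. */
interface CompilationPolicy {
    boolean shouldCompile(String method, int invocationCount);
}

/** Pluggable JIT: the interpreter only needs this one entry point. */
interface JitCompiler {
    void compile(String method);
}

/** Hypothetical interpreter front end that collects invocation counts. */
final class ProfilingInterpreter {
    private final Map<String, Integer> counts = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();
    private final CompilationPolicy policy;
    private final JitCompiler jit;

    ProfilingInterpreter(CompilationPolicy policy, JitCompiler jit) {
        this.policy = policy;
        this.jit = jit;
    }

    void invoke(String method) {
        if (compiled.contains(method)) {
            return; // a real VM would jump to the compiled code here
        }
        int count = counts.merge(method, 1, Integer::sum); // collect data
        if (policy.shouldCompile(method, count)) {         // policy decides
            jit.compile(method);                           // interpreter calls the JIT
            compiled.add(method);
        }
        // otherwise the bytecodes would be interpreted (omitted)
    }

    boolean isCompiled(String method) {
        return compiled.contains(method);
    }
}
```

A threshold policy is then a one-liner such as (m, n) -> n >= 3, and
swapping in a different policy or JIT does not touch the interpreter.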


Yes - this is what I was trying to get at, and will note additionally  
that a clean portable interpreter helps us with the portability goal.


So, are there examples out there of what this interface may look  
like?  (Maybe a topic for a new thread on [arch] Interpreter/JIT  
interface rather than vs... )


geir

--
Geir Magnusson Jr  +1-203-665-6437
[EMAIL PROTECTED]




Re: [arch] Interpreter vs. JIT for Harmony VM

2005-09-22 Thread Michael Hind
Santiago Gala [EMAIL PROTECTED] wrote:

 Tomcat+Jetspeed runs (qualitatively) faster using an Optimized JikesRVM
 +classpath version in my TiBook than using IBM-jdk-1.4.2, but it
 requires 200 M heap, while IBM jdk runs it in 100 Megs.

Hi Santiago, glad to hear you are exploring this path.

One word of caution when comparing heapsizes of Jikes RVM and another JVM. 
 Since Jikes RVM is written in Java, all of the VM data and code is part 
of the same heap as the application, i.e., part of the 200MB you mention 
above.  In a C/C++ implementation this VM data/code would be part of the C 
heap, i.e., not part of the 100MB heap above. 

Thus, these comparisons really aren't apples-to-apples.

Mike
-
Michael Hind, Manager, Dynamic Optimization Group 
IBM Watson Research Center
http://www.research.ibm.com/people/h/hind



Re: [arch] Interpreter vs. JIT for Harmony VM

2005-09-22 Thread will pugh



On Wed, 21-09-2005 at 08:29 -0700, will pugh wrote:

I think having a FastJIT and forgoing the interpreter is a pretty 
elegant solution, however, there are a few things that may come out 
of this:


 1)  Implementing JVMTI will probably be more difficult than doing 
a straight interpreter
 2)  The FastJIT needs to be Fast!  Otherwise, you run the risk of 
people not wanting to use it for IDEs and Apps because the startup 
time is too slow.



I would have thought that implementing JVMTI for SlowJIT-ted code 
would have been about as difficult as for the FastJIT-ted code ?  Or 
are we to assume that tools will only be used at the lowest level of 
optimization ?


This is a good point.  I might have been a bit overly broad.  There are 
a lot of pieces of JVMTI, and not all of them are affected by the 
compile vs. interpret decision; furthermore, given that you are 
compiling the source, some of them are not affected by the 
no-optimization vs. heavy-optimization decision.


The cool thing about JVMTI is that it works by the agents asking for 
capabilities that you can then use for determining in the VM what kinds 
of optimizations to do.
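
As a sketch of that idea: the VM looks at the set of capabilities the
agents asked for and picks the most aggressive execution mode that can
still honour them. This is a Java model for illustration only, not the
JVMTI API itself (which is a C interface); the enum, the planner class
and the exact groupings are assumptions, loosely following the lists
further down in this message.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

/** A few JVMTI capability names, modelled as a Java enum for illustration. */
enum Capability {
    CAN_TAG_OBJECTS,
    CAN_GET_LINE_NUMBERS,
    CAN_ACCESS_LOCAL_VARIABLES,
    CAN_GENERATE_BREAKPOINT_EVENTS,
    CAN_GENERATE_SINGLE_STEP_EVENTS
}

enum ExecutionMode { INTERPRETER, NON_OPTIMIZED_JIT, OPTIMIZING_JIT }

/** Hypothetical planner that maps requested capabilities to an execution mode. */
final class CapabilityPlanner {
    // Assumed groupings: capabilities that are easiest in an interpreter...
    private static final Set<Capability> EASIEST_INTERPRETED = EnumSet.of(
            Capability.CAN_GENERATE_BREAKPOINT_EVENTS,
            Capability.CAN_GENERATE_SINGLE_STEP_EVENTS);
    // ...and capabilities that are hard to honour in optimized code.
    private static final Set<Capability> HARD_WHEN_OPTIMIZED = EnumSet.of(
            Capability.CAN_GET_LINE_NUMBERS,
            Capability.CAN_ACCESS_LOCAL_VARIABLES);

    private CapabilityPlanner() {}

    static ExecutionMode choose(Set<Capability> requested) {
        if (!Collections.disjoint(requested, EASIEST_INTERPRETED)) {
            return ExecutionMode.INTERPRETER;
        }
        if (!Collections.disjoint(requested, HARD_WHEN_OPTIMIZED)) {
            return ExecutionMode.NON_OPTIMIZED_JIT;
        }
        return ExecutionMode.OPTIMIZING_JIT; // only orthogonal capabilities requested
    }
}
```

An agent that only asks for CAN_TAG_OBJECTS leaves the VM free to
optimize fully, while one asking for single-step events does not.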


I think that for a first release, getting the Debugging portions of 
JVMTI working well (and hopefully early) is important.  This is for a 
few reasons:
   1)  Potential users are going to feel more comfortable working on a 
rough VM that they can debug their problems on rather than one they can't
   2)  I suspect we would get better bug reports out of it, because 
users can look deeper into their bugs
   3)  I suspect that there are a few special objects we could add in 
that could be accessed via a debugger to give more internal VM state 
information to either Harmony Developers or Harmony Users.  You could 
imagine a HarmonyVM object that can dump or report vm state that is 
useful in debugging VM problems.

  HarmonyVM.showReferencesToObject(fooObject)

That being said, I've sorted the JVMTI capabilities into groups: ones 
which I think are orthogonal to the issue, ones that would be 
significantly easier in an interpreter than in compiled code, and ones 
that I think would be more difficult in optimized code than in 
non-optimized code.


This is a sort of off the top of my head analysis.  Some things are in 
both lists because they are easier to do in a compiled world than an 
uncompiled world, but can also get really hard when trying to do them in 
a highly-optimized world.  If I have more time, today, I can try writing 
up notes for how I think each of these would be implemented and folks 
can tell me if they think I'm making some of them harder than need be.


EASIER IN INTERPRETER THAN COMPILED:

can_generate_field_modification_events
can_generate_field_access_events
can_pop_frame
can_signal_thread
can_generate_single_step_events
can_generate_exception_events
can_redefine_any_class
can_generate_breakpoint_events



EASIER IN NON-OPTIMIZED CODE THAN OPTIMIZED:

can_generate_field_modification_events
can_generate_field_access_events
can_access_local_variables
can_pop_frame
can_redefine_classes
can_get_line_numbers
can_generate_frame_pop_events
can_generate_method_entry_events
can_generate_method_exit_events
can_generate_breakpoint_events



ORTHOGONAL

can_tag_objects
can_get_bytecodes
can_get_synthetic_attribute
can_get_owned_monitor_info
can_get_current_contended_monitor
can_get_monitor_info
can_get_source_file_name
can_signal_thread
can_get_source_debug_extension
can_maintain_original_method_order
can_suspend
can_get_current_thread_cpu_time
can_get_thread_cpu_time
can_generate_all_class_hook_events
can_generate_compiled_method_load_events
can_generate_monitor_events
can_generate_vm_object_alloc_events
can_generate_native_method_bind_events
can_generate_garbage_collection_events
can_generate_object_free_events