Re: [arch] VMCore / Component Model
On Tue, 2005-09-20 at 08:40 -0400, Geir Magnusson Jr. wrote:

On Sep 20, 2005, at 12:26 AM, Robin Garner wrote:

I think it's important not to take Tim's point about performance too lightly here. There are some key interfaces between components that can't afford the overhead of a function call, let alone an indirect call via a function pointer. Three instances that come to mind are:

- Allocation. For best performance, the common case of a new() needs to be inlined directly into the compiled code.
- Write barrier (and read barrier if we need one). The common case of a write barrier should be a handful of instructions.
- Yield points. Again, should inline down to a couple of instructions.

I'd be interested in any approaches you may have thought of for these interfaces.

Are these things the components would worry about, or can an optimizer deal with these later if possible?

I believe these are so performance critical that a design needs to be in place up front. If Harmony is to be competitive in performance terms with the current production VMs, we need to make sure that allocation, barriers, locking etc. are as fast as possible. A design that adds an extra memory reference to one of these operations would not be competitive.

Now, I'm really just making this up as I go along... what some people call thinking out loud, but I won't grace this with the term thinking. I've never written or been inside VM internals, so I'm inventing out of whole cloth here...

In earlier discussions of componentization, I kept imagining a model where we have a defined set of capabilities that a component could optionally implement. There would be a required set, as well as optional ones. (This thinking is inspired by the old MSFT COM infrastructure...) In the case of a memory manager, a required one would be the interface containing the function pointers for memory management.
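To make Robin's first bullet concrete, here is a minimal sketch of the kind of bump-pointer allocation fast path that a JIT would inline at each new() site. All names (AllocBuffer, alloc_fast, alloc_slow_path) are illustrative, not from any actual Harmony design; the point is that the common case is just an add, a compare, and a store, with only the rare overflow case calling out into the memory-manager component.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread allocation buffer. */
typedef struct {
    uintptr_t cursor;  /* next free byte in the thread-local buffer */
    uintptr_t limit;   /* end of the buffer */
} AllocBuffer;

void *alloc_slow_path(AllocBuffer *buf, size_t bytes);

/* Fast path: this is the handful of instructions a JIT would inline
 * directly at each new() site -- one add, one compare, one store. */
static inline void *alloc_fast(AllocBuffer *buf, size_t bytes) {
    uintptr_t result = buf->cursor;
    uintptr_t next = result + bytes;
    if (next > buf->limit)              /* rare case: buffer exhausted */
        return alloc_slow_path(buf, bytes);
    buf->cursor = next;                 /* common case: bump the pointer */
    return (void *)result;
}

/* Slow path stub: in a real VM this would call into the memory-manager
 * component to refill the buffer or trigger a collection. */
static unsigned char heap[1 << 16];
void *alloc_slow_path(AllocBuffer *buf, size_t bytes) {
    buf->cursor = (uintptr_t)heap;
    buf->limit = buf->cursor + sizeof heap;
    void *p = (void *)buf->cursor;
    buf->cursor += bytes;
    return p;
}
```

A design that routed the common case through a function pointer would add a call and an indirect load to every allocation, which is exactly the overhead Robin argues is unaffordable.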
An optional one would be NativeInlineConversion or something, where an optimizer could find a new(), and if the component doesn't support native inlining, use the function call into the component; if it does, ask the component for the bit of compiled code to use.

I think this is probably do-able, although for some things it would be reasonable to require the component to provide the inlineable code. The 'native code' passed back would have to be one level of compiler IR (internal representation). Unfortunately this has the side effect of making the compiler's IR public, breaking the modularity of the compiler(s) and, to an extent, distributing chunks of the compiler into other components. This is much less of a problem when doing Java in Java, but that's another thread.

Actually, the more I think about this, the less I see the value in designing a component structure that allows run-time configurability. I believe compile-time configurability is the thing to aim for. Achieving good levels of abstraction without sacrificing performance is a difficult enough job at compile time in any case.

Consider, for example, the design of the object model, and say that you want to support:

1) A real-time oriented incremental garbage collector
2) A mark-sweep collector for memory-constrained environments
3) A generational reference-counting collector as the general-purpose high-performance configuration

Collector 1) (if, say, you used a Brooks-style barrier) requires an additional word in the object header, to store a forwarding pointer for all objects, alive and dead. 2) requires only one bit in the object header (for a mark bit) and could conceivably steal a low-order bit from the class pointer (TIB in JikesRVM terms). 3) requires an extra word (or at least several extra bits) in mature objects, and gets better performance if nursery objects don't need that extra word.
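The compile-time-configurability idea can be sketched in a few lines. This is purely illustrative (the collector names, macros, and field layouts are invented for the example): exactly one collector is selected when the VM is built, so every component sees a single fixed header layout with zero runtime dispatch.

```c
#include <stdint.h>

#define COLLECTOR_MARK_SWEEP 1   /* assumption: a build-time config flag */

#if defined(COLLECTOR_BROOKS_RT)
/* 1) Real-time incremental: extra word for a Brooks forwarding pointer,
 * present in every object, alive or dead. */
typedef struct ObjectHeader {
    struct ObjectHeader *forwarding;   /* always valid, possibly self */
    uintptr_t tib;                     /* class-info (TIB) pointer */
    uintptr_t status;                  /* lock + hash bits */
} ObjectHeader;
#elif defined(COLLECTOR_MARK_SWEEP)
/* 2) Mark-sweep: steal one low-order bit of the TIB pointer as the mark
 * bit -- TIBs are word-aligned, so that bit is otherwise always zero. */
#define MARK_BIT ((uintptr_t)1)
typedef struct {
    uintptr_t tib_and_mark;            /* TIB pointer | mark bit */
    uintptr_t status;                  /* lock + hash bits */
} ObjectHeader;
static inline int  is_marked(const ObjectHeader *h) { return (int)(h->tib_and_mark & MARK_BIT); }
static inline void set_mark(ObjectHeader *h)        { h->tib_and_mark |= MARK_BIT; }
static inline uintptr_t get_tib(const ObjectHeader *h) { return h->tib_and_mark & ~MARK_BIT; }
#endif
```

Because the layout is resolved by the preprocessor, the compiled barrier and allocation sequences hard-code the field offsets, which is the property Robin is arguing runtime configurability would destroy.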
The object header also needs to store metadata associated with locking, address-based hashing and the TIB (pointer to per-class information). There is a complex tradeoff between size and encoding of information in the header word (see, for example, http://www.research.ibm.com/people/d/dgrove/papers/ecoop02.html), and different layouts are possible depending on the implementations selected.

Much of the code in the runtime system needs to know how the object headers are laid out, and to access the metadata critical to it very rapidly. For example, a thin lock takes 5-10 instructions to acquire (in the frequent case). Adding (for example) a table lookup to find where in the object header the system had dynamically encoded the lock field would be disastrous for locking performance.

So at least for modules that are critical to the performance of compiled code, I believe runtime configurability will be dogged with performance problems. Compilers, classloaders etc. - fine, but the core runtime components - no.

cheers
Robin
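For a sense of why the lock field's position must be known at compile time, here is a hedged sketch of a thin-lock fast path in C11 atomics. The word layout (owner thread id shifted into the upper bits, zero meaning unlocked) is a simplification of real thin-lock schemes, not Harmony's actual design; the key point is that the field offset and bit positions are compile-time constants, so acquisition is one compare-and-swap plus a couple of setup instructions, with no table lookup.

```c
#include <stdatomic.h>
#include <stdint.h>

#define OWNER_SHIFT 16   /* illustrative: thread id above recursion count */

typedef struct {
    _Atomic uintptr_t lock_word;   /* 0 == unlocked and unhashed */
} ThinLock;

/* Fast path: unlocked -> locked by `tid` with a single CAS.
 * Returns 1 on success; a real VM would fall back to a slow path
 * (lock inflation / contention handling) on failure. */
static inline int thin_lock_try_acquire(ThinLock *l, uintptr_t tid) {
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(&l->lock_word, &expected,
                                          tid << OWNER_SHIFT);
}

/* Frequent case of release: owned exactly once, just store zero. */
static inline void thin_lock_release(ThinLock *l, uintptr_t tid) {
    (void)tid;   /* a real VM would verify ownership first */
    atomic_store(&l->lock_word, (uintptr_t)0);
}
```

If the lock field's location were decided at runtime, every acquire would first have to load it from a per-class or per-configuration table, adding a dependent memory reference to this hot path.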
Re: [arch] Interpreter vs. JIT for Harmony VM
On Sep 21, 2005, at 11:11 AM, Tom Tromey wrote:

"Geir" == Geir Magnusson [EMAIL PROTECTED] writes:

On the other hand, a fast code-generating JIT can call runtime helpers and native methods without additional glue code, whereas an interpreter has to have special glue code to make it work in a JIT environment.

Geir: I believe you, but I don't understand this. Can you explain in more detail?

It is about handling calling conventions. [SNIP]

Thanks.

Our experience is that a fast, zero-optimizing JIT can yield low-enough response time. So I think Harmony at least has the option of having a decent system without an interpreter. Thoughts?

Geir: Basic thought is yes - I always figured we'd have this pluggable, with an interpreter for ease of porting, and then a platform-specific JIT.

It seems to me that there's a design question here. For instance, if you want to eventually take interpreted code and compile it (when it is hot), for full pluggability your JIT(s) and your interpreter need to agree on some set of bookkeeping details in order to make this possible. OTOH, you could make other decisions that make this problem go away - for instance, having a single choice of execution engine up front, so the fast JIT and the optimizing JIT are just part of the same code base, only need to talk to each other, and can be built in an ad hoc way. Personally I'd be just as happy if we only had a JIT. There are already plenty of interpreters out there.

But I would think that we'd want both, right? An interpreter that builds anywhere to ensure wide platform portability, with the ability to augment with a JIT for those platforms for which people are interested in creating a JIT...

geir

Tom

--
Geir Magnusson Jr +1-203-665-6437 [EMAIL PROTECTED]
Re: [Arch] Suggestion to prioritize JVMTI over JVMPI and JVMDI
On Sep 21, 2005, at 7:55 PM, Elford, Chris L wrote:

Hi all,

JVMTI is on track to replace the older JVMPI and JVMDI interfaces. J2SE 5 supports JVMTI and JVMPI/JVMDI, but future follow-ons to J2SE are expected to remove support for the older interfaces. Tools vendors seem to be in the process of transitioning to the JVMTI interface. It does not really make sense to invest too much effort in the Harmony project supporting the JVMPI interface. It would be much more effective to invest the effort in making the JVMTI implementation more complete, so that it includes more of the optional functionality of JVMTI. I suggest that we concentrate our debug/tools interface work in Harmony on making JVMTI work really well, and let JVMPI and JVMDI fall away.

My knowledge of specific implementation notwithstanding :) I think it makes sense to prioritize, so in the event timing works out that we decide to just jump ahead to J2SE 6 rather than implement J2SE 5, we can ignore them (assuming they are dropped). However, if we do J2SE 5, as is the current goal, we would have to implement them, right?

--
Geir Magnusson Jr +1-203-665-6437 [EMAIL PROTECTED]
Re: [arch] Interpreter vs. JIT for Harmony VM
On Sep 21, 2005, at 11:38 PM, Frederick C Druseikis wrote:

Hello list,

Long time reader, first time writer.

Welcome!

On Wed, 21 Sep 2005 14:30:10 -0300, Rodrigo Kumpera [EMAIL PROTECTED] wrote:

Having a mixed JITed-interpreted environment makes things harder. Writing a baseline, single-pass JITer is easy, but there is A LOT more stuff to a port than just the code-execution part.

Agreed. I'd like to amplify that point about the "more stuff". I think my conclusion is that if you try to live without it, you'll end up creating the guts of the interpreter anyway. [SNIP]

My sense is that a small interpreter with a pluggable JIT is a pragmatic approach. Which means to me that the central question is: what is the interface relationship between the interpreter and the JIT? What part of the interpreter makes the decision about which methods should be JITed? I see the interpreter collecting data, some pluggable interface making policy decisions about what to JIT, and one of them calling the JITer when it's time.

Yes - this is what I was trying to get at, and I will note additionally that a clean, portable interpreter helps us with the portability goal. So, are there examples out there of what this interface may look like? (Maybe a topic for a new thread on [arch] Interpreter/JIT interface rather than vs...)

geir

--
Geir Magnusson Jr +1-203-665-6437 [EMAIL PROTECTED]
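The division of labor described above - interpreter collects data, a pluggable policy decides, something invokes the JIT - can be sketched as a small interface. Everything here (the struct names, the threshold policy) is hypothetical, meant only to show the shape such an interpreter/JIT boundary might take.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-method bookkeeping shared by the interpreter and the JIT. */
typedef struct Method {
    const char *name;
    uint32_t invocations;      /* profile data the interpreter collects */
    void *compiled_entry;      /* NULL until the JIT produces code */
} Method;

/* The pluggable pieces: a policy and a compiler, behind function pointers. */
typedef struct {
    int   (*should_compile)(const Method *m);   /* compile this method now? */
    void *(*compile)(Method *m);                /* returns an entry point */
} JitInterface;

/* Called by the interpreter on each method entry: bump the counter,
 * consult the policy, and hand hot methods to the JIT. */
void on_method_entry(Method *m, const JitInterface *jit) {
    m->invocations++;
    if (m->compiled_entry == NULL && jit->should_compile(m))
        m->compiled_entry = jit->compile(m);
}

/* Example policy: a simple invocation-count threshold. */
static int threshold_policy(const Method *m) { return m->invocations >= 3; }

/* Stand-in compiler for the sketch: returns a dummy non-NULL entry. */
static void *dummy_compile(Method *m) { (void)m; return (void *)0x1; }
```

Note that the interpreter never sees the JIT's internals - it only agrees on the Method bookkeeping struct, which is the "set of bookkeeping details" Tom Tromey mentioned the two engines must share.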
Re: [arch] Interpreter vs. JIT for Harmony VM
Santiago Gala [EMAIL PROTECTED] wrote:

Tomcat+Jetspeed runs (qualitatively) faster using an optimized JikesRVM+classpath version on my TiBook than using IBM-jdk-1.4.2, but it requires a 200MB heap, while the IBM JDK runs it in 100MB.

Hi Santiago, glad to hear you are exploring this path. One word of caution when comparing heap sizes of Jikes RVM and another JVM: since Jikes RVM is written in Java, all of the VM data and code is part of the same heap as the application, i.e., part of the 200MB you mention above. In a C/C++ implementation this VM data/code would be part of the C heap, i.e., not part of the 100MB heap above. Thus, these comparisons really aren't apples-to-apples.

Mike

-
Michael Hind, Manager, Dynamic Optimization Group
IBM Watson Research Center
http://www.research.ibm.com/people/h/hind
Re: [arch] Interpreter vs. JIT for Harmony VM
On Wed, 2005-09-21 at 08:29 -0700, will pugh wrote:

I think having a FastJIT and forgoing the interpreter is a pretty elegant solution; however, there are a few things that may come out of this:

1) Implementing JVMTI will probably be more difficult than doing a straight interpreter.
2) The FastJIT needs to be fast! Otherwise, you run the risk of people not wanting to use it for IDEs and apps because the startup time is too slow.

I would have thought that implementing JVMTI for SlowJIT-ted code would have been about as difficult as for the FastJIT-ted code? Or are we to assume that tools will only be used at the lowest level of optimization?

This is a good point. I might have been a bit overly broad. There are a lot of pieces of JVMTI, and not all of them are affected by the compile-vs-interpret decision; furthermore, given that you are compiling the source, some of them are not affected by no-optimization vs. heavy optimization. The cool thing about JVMTI is that it works by the agents asking for capabilities, which the VM can then use to determine what kinds of optimizations to do.

I think that for a first release, getting the debugging portions of JVMTI working well (and hopefully early) is important. This is for a few reasons:

1) Potential users are going to feel more comfortable working on a rough VM that they can debug their problems on, rather than one they can't.
2) I suspect we would get better bug reports out of it, because users can look deeper into their bugs.
3) I suspect that there are a few special objects we could add that could be accessed via a debugger to give more internal VM state information to either Harmony developers or Harmony users. You could imagine a HarmonyVM object that can dump or report VM state that is useful in debugging VM problems.
HarmonyVm.showReferencesToObject(fooObject)

That being said, I've grouped the JVMTI capabilities into groups: those I think are orthogonal to the issue, those that would be significantly easier on an interpreter vs. compiled code, and then further into what I think would be more difficult in optimized code vs. non-optimized. This is a sort of off-the-top-of-my-head analysis. Some things are in both lists because they are easier to do in a compiled world than an uncompiled world, but can also get really hard when trying to do them in a highly optimized world. If I have more time today, I can try writing up notes for how I think each of these would be implemented, and folks can tell me if they think I'm making some of them harder than they need be.

EASIER IN INTERPRETER THAN COMPILED:
can_generate_field_modification_events
can_generate_field_access_events
can_pop_frame
can_signal_thread
can_generate_single_step_events
can_generate_exception_events
can_redefine_any_class
can_generate_breakpoint_events

EASIER IN NON-OPTIMIZED CODE THAN OPTIMIZED:
can_generate_field_modification_events
can_generate_field_access_events
can_access_local_variables
can_pop_frame
can_redefine_classes
can_get_line_numbers
can_generate_frame_pop_events
can_generate_method_entry_events
can_generate_method_exit_events
can_generate_breakpoint_events

ORTHOGONAL:
can_tag_objects
can_get_bytecodes
can_get_synthetic_attribute
can_get_owned_monitor_info
can_get_current_contended_monitor
can_get_monitor_info
can_get_source_file_name
can_signal_thread
can_get_source_debug_extension
can_maintain_original_method_order
can_suspend
can_get_current_thread_cpu_time
can_get_thread_cpu_time
can_generate_all_class_hook_events
can_generate_compiled_method_load_events
can_generate_monitor_events
can_generate_vm_object_alloc_events
can_generate_native_method_bind_events
can_generate_garbage_collection_events
can_generate_object_free_events
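Will's observation - that the VM can inspect which capabilities an agent requested and pick an optimization strategy accordingly - can be sketched as follows. This is not the JVMTI API itself (the real one uses a jvmtiCapabilities struct and AddCapabilities); the struct and decision function here are invented to illustrate the idea, using a few capabilities from the groups above.

```c
/* Hypothetical model of agent-requested capabilities; only a few of
 * the real JVMTI capability names are represented. */
typedef struct {
    int can_generate_single_step_events;   /* "easier in interpreter" */
    int can_access_local_variables;        /* "easier non-optimized"  */
    int can_pop_frame;                     /* "easier non-optimized"  */
    int can_tag_objects;                   /* "orthogonal": no effect */
} AgentCapabilities;

enum CompileMode { OPTIMIZED, NON_OPTIMIZED };

/* If any capability from the "easier in non-optimized code" group is
 * requested, compile without optimization so frames and locals stay
 * inspectable; otherwise the VM is free to optimize. */
enum CompileMode choose_mode(const AgentCapabilities *caps) {
    if (caps->can_generate_single_step_events ||
        caps->can_access_local_variables ||
        caps->can_pop_frame)
        return NON_OPTIMIZED;
    return OPTIMIZED;
}
```

A real implementation would make this decision per method and could recompile when an agent adds capabilities late, but the shape of the check is the same: capabilities in, compilation strategy out.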