Re: Harmony Project Structure Attempt

Steve Blackburn Tue, 17 May 2005 20:08:12 -0700

I appeal to everyone to get the facts on the Java-in-Java issue.

It is an important issue and despite a number of comprehensive posts to the list from a variety of writers, people still perpetuate ideas which have been debunked on the list.

Listreader wrote:

A lot of people have expressed interest in a JVM written in Java provided performance is adequate. There has been substantial evidence presented to show that it can be.
I simply collected what I felt was the opinion of the majority of posts
on the list thus far. Personally, I care little about the exact language
as long as it is relevant and optimal for writing the compiler.
Since Java lacks low-level memory management capabilities, and the JVM obviously needs to deal with these issues, I would be somewhat hesitant to write the full JVM in Java myself.

Please read through the earlier posts.

1. Java-in-Java VMs deal with low-level memory management through a small set of type safe extensions, supported by the compiler, which allow typed access to memory in a variety of ways.

http://jikesrvm.sourceforge.net/api/org/vmmagic/unboxed/package-summary.html

2. Having written an extensive high-performance memory management toolkit, which includes a wide range of GC algorithms and which can outperform the standard glibc malloc(), I can assure you (as I and others have stated in previous posts) that a Java-in-Java VM is not encumbered by any such limitations.

3. The Java-in-Java VM has some performance *advantages* through the lack of impedance mismatch between the supported language and the implementation language (this is one of the reasons for 2. above).

4. There are a number of successful instances of this technology, including OVM, Jikes RVM, and Bartok (a C# in C# VM at MSR which can match the MS product VM on some benchmarks, despite having vastly fewer resources applied to it).
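
To make point 1 a little more concrete, here is a minimal sketch of what "typed access to memory" can look like from the programmer's side. It is an illustration only: RawMemory, Address and TypedAddressDemo are hypothetical names, not the org.vmmagic.unboxed API, and a plain byte array stands in for real memory. In an actual Java-in-Java VM it is the compiler support mentioned in point 1 that would remove the object wrapper; in this sketch Address is an ordinary Java object.

```java
// Hypothetical sketch; these classes are NOT the org.vmmagic.unboxed API.

/** A byte array standing in for a region of raw memory (an assumption of this sketch). */
final class RawMemory {
    private final byte[] bytes;

    RawMemory(int sizeInBytes) {
        bytes = new byte[sizeInBytes];
    }

    /** Reads a big-endian 32-bit value starting at the given byte offset. */
    int loadInt(int offset) {
        return ((bytes[offset] & 0xFF) << 24)
             | ((bytes[offset + 1] & 0xFF) << 16)
             | ((bytes[offset + 2] & 0xFF) << 8)
             | (bytes[offset + 3] & 0xFF);
    }

    /** Writes a big-endian 32-bit value starting at the given byte offset. */
    void storeInt(int offset, int value) {
        bytes[offset] = (byte) (value >>> 24);
        bytes[offset + 1] = (byte) (value >>> 16);
        bytes[offset + 2] = (byte) (value >>> 8);
        bytes[offset + 3] = (byte) value;
    }
}

/** A typed handle on a memory location: only address-like operations are exposed. */
final class Address {
    private final RawMemory memory;
    private final int offset;

    Address(RawMemory memory, int offset) {
        this.memory = memory;
        this.offset = offset;
    }

    /** Returns the address the given number of bytes past this one. */
    Address plus(int bytes) {
        return new Address(memory, offset + bytes);
    }

    /** Returns the distance in bytes from the other address to this one. */
    int diff(Address other) {
        return offset - other.offset;
    }

    int loadInt() {
        return memory.loadInt(offset);
    }

    void store(int value) {
        memory.storeInt(offset, value);
    }
}

final class TypedAddressDemo {
    public static void main(String[] args) {
        RawMemory memory = new RawMemory(16);
        Address base = new Address(memory, 0);
        Address field = base.plus(4);   // address arithmetic stays within the Address type
        field.store(0x12345678);
        System.out.println(Integer.toHexString(field.loadInt()));
        System.out.println(field.diff(base));
    }
}
```

The point of the idiom is that an Address cannot be confused with an int or with an ordinary object reference, so the type checker still helps even in code that manipulates memory directly.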

If the community decides that it would be more helpful to build the VM in C++ that's fine. I for one hope that the component-based architecture will support a variety of implementation languages.

But let's not make such a choice from a position of ignorance.

--Steve

Hi Dmitry,
<constructive_interest>
[...]
First one is of the chicken-vs-egg variety -- as the GC algorithm written in Java executes, won't it generate garbage of its own, and won't it then need to be stopped and the tiny little "real" garbage collector run to clean up after it? I can only see two alternatives -- either it is going to clean up its own garbage, which would be truly fantastical... Or it will somehow not generate any garbage, which I think is not realistic for a Java program...
This is a very important issue.
The short answer is as follows:
   a) Within the GC code itself we don't really use Java, we use
      a special subset of Java and a few extensions.
   b) We never call new() within the GC at runtime
   c) We try not to collect ourselves
You will find the long answer buried in the source code and a somewhat out of date paper:

http://cvs.sourceforge.net/viewcvs.py/jikesrvm/MMTk/
http://jikesrvm.sourceforge.net/api/org/vmmagic/unboxed/package-summary.html

http://jikesrvm.sourceforge.net/api/org/vmmagic/pragma/package-summary.html
http://cs.anu.edu.au/~Steve.Blackburn/pubs/abstracts.html#mmtk-icse-2004
I'll try to give a more succinct answer here:
As for a), we essentially apply a few design patterns and idioms for correctness and performance (more on performance later). We don't use patterns that depend on allocating instances. In fact the only instances we create are per-thread metadata instances which drive the GC. These are allocated only when new threads are instantiated (actually these are per POSIX thread; Jikes RVM uses an M:N threading model).

As for b), there is not much call for dynamic memory management within a GC. The exceptions are (i) short-lived metadata such as work queues, and (ii) per-object metadata such as that associated with free lists, mark bits, etc. We solve this by explicitly managing these special cases from within our own framework. We have a queue mechanism that works off raw memory and a mechanism for associating metadata with allocated space. The details are beyond the scope of this email.
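
As a rough illustration of the two mechanisms just mentioned, here is a minimal sketch of a work queue and a side table of mark bits that reserve all their storage up front and never call new while in use. This is not MMTk code: RawWorkQueue and SideMarkBits are hypothetical names, a long[] stands in for a region of raw memory, and addresses are modelled as plain long values with 0 treated as "no address".

```java
// Hypothetical sketch; not MMTk's queue or metadata implementation.

/** Fixed-capacity circular queue of addresses; storage is reserved once, up front. */
final class RawWorkQueue {
    private final long[] slots;  // stands in for raw memory (an assumption of this sketch)
    private int head;            // index of the next entry to dequeue
    private int tail;            // index of the next free slot
    private int count;           // number of entries currently queued

    RawWorkQueue(int capacity) {
        slots = new long[capacity];
    }

    /** Adds an address; returns false rather than growing when the queue is full. */
    boolean enqueue(long address) {
        if (count == slots.length) {
            return false;
        }
        slots[tail] = address;
        tail = (tail + 1) % slots.length;
        count++;
        return true;
    }

    /** Removes and returns the oldest address, or 0 if the queue is empty. */
    long dequeue() {
        if (count == 0) {
            return 0L;
        }
        long address = slots[head];
        head = (head + 1) % slots.length;
        count--;
        return address;
    }

    boolean isEmpty() {
        return count == 0;
    }
}

/** One mark bit per granule of a contiguous region, held outside the objects. */
final class SideMarkBits {
    private final long base;        // first address covered by the table
    private final int logGranule;   // log2 of the bytes covered by each bit
    private final long[] bits;

    SideMarkBits(long base, long extentBytes, int logGranule) {
        this.base = base;
        this.logGranule = logGranule;
        long granules = extentBytes >>> logGranule;
        bits = new long[(int) ((granules + 63) >>> 6)];
    }

    /** Sets the mark bit for the address; returns true if this call set it. */
    boolean testAndMark(long address) {
        long index = (address - base) >>> logGranule;
        int word = (int) (index >>> 6);
        long mask = 1L << (index & 63);
        boolean wasMarked = (bits[word] & mask) != 0;
        bits[word] |= mask;
        return !wasMarked;
    }

    boolean isMarked(long address) {
        long index = (address - base) >>> logGranule;
        return (bits[(int) (index >>> 6)] & (1L << (index & 63))) != 0;
    }
}
```

A tracing loop built on these would dequeue an address, call testAndMark on each object it refers to, and enqueue only those for which testAndMark returns true. A real collector also has to handle a full queue (here enqueue simply reports failure) and concurrent access, both of which this sketch ignores.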

Actually c) is one of the hardest parts. It is essential that heap objects associated with the VM and the GC are not inadvertently collected. This requires some very careful thought (remembering that the compiler will place our *code* into the heap too!).

As to whether this is feasible, it's been done at least three times over. First in the original Jalapeno, then in GCTk (developed while I was at UMass) and now MMTk. Right now I am working with my students here to make the MMTk design even cleaner while not sacrificing performance---fun!

So, can it perform? Well, it is very hard to do apples-to-apples comparisons, but as a milestone we measure the performance of our raw mechanisms against C implementations, and we do very well (by this I mean we can beat glibc's malloc for allocation performance, but this claim needs to be covered with caveats because it is very hard to make fair comparisons). So the raw mechanisms perform well. Beyond that, the software engineering benefits of Java come to the fore: our capacity to implement a toolkit, and thus to have a choice of many different GC algorithms, gives us a real advantage (the GC mechanism/algorithm distinction was the subject of a previous thread).

I've glossed over a huge amount of important stuff (like how we get raw memory from the OS, how we introduce type safe pointers and object references, etc.).

To summarize (and to get to the question already) - the point is that language shapes thought. In other words, a program designed in Java will naturally tend to be slower than a program designed in C, simply because Java most naturally expresses slower designs than C does. And the question is - does this agree with anyone else's experiences? And if it does, is it a valid argument against using Java for the design and implementation of the VM?
OK so there is already at least one response to this, but let me add my experience.

I am very focused on performance. The approach Perry Cheng and I took when writing the code for MMTk was very much that premature optimization is indeed the root of all evil. Moreover, we placed enormous faith in the optimizing compiler. The philosophy was to assume the optimizing compiler was smart enough to optimize around our coding abstractions, and then to do careful performance analysis after the fact and see where we were being let down. In some cases the compiler was improved to deal with our approach, other times we modified our approach.

Over time we learned certain idioms which on one hand meant we tended to get reasonable performance first shot, but on the other may have undermined the natural Java style we started with.

While I understand what you mean when you say: "a program designed in Java will naturally tend to be slower than a program designed in C", addressing that concern is one of the most important challenges of language implementation, and is why Java performance has improved so greatly over the past five years.

</constructive_interest>
Cheers,
--Steve
