Hi Dmitry,

<constructive_interest>

[...]

First one is of the chicken-vs-egg variety -- as the GC algorithm written in Java executes, won't it generate garbage of its own, and won't it then need to be stopped and the tiny little "real" garbage collector run to clean up after it? I can only see two alternatives -- either it is going to cleanup its own garbage, which would be truly fantastacal... Or it will somehow not generate any garbage, which I think is not realistic for a Java program...

This is a very important issue.

The short answer is as follows:
   a) Within the GC code itself we don't really use Java, we use
      a special subset of Java and a few extensions.
   b) We never call new() within the GC at runtime
   c) We try not to collect ourselves ;-)

You will find the long answer buried in the source code and a somewhat out of date paper:

http://cvs.sourceforge.net/viewcvs.py/jikesrvm/MMTk/
http://jikesrvm.sourceforge.net/api/org/vmmagic/unboxed/package-summary.html
http://jikesrvm.sourceforge.net/api/org/vmmagic/pragma/package-summary.html
http://cs.anu.edu.au/~Steve.Blackburn/pubs/abstracts.html#mmtk-icse-2004

I'll try to give a more succinct answer here:

As for a), we essentially apply a few design patterns and idioms for correctness and performance (more on performance later). We don't use patterns that depend on allocating instances. In fact the only instances we create are per-thread metadata instances which drive the GC. These are allocated only when new threads are instantiated (actually these are per posix thread, Jikes RVM uses an N-M threading model).

As for b), there is not much call for dynamic memory management within a GC. The exceptions are a) short-lived metadata such as work queues, and b) per-object metadata such as that associated with free lists and mark bits etc etc. We solve this by explicitly managing these special cases from within our own framework. We have a queue mechanism that works off raw memory and a mechanism for associating metadata with allocated space. The details are beyond the scope of this email.

Actually c) is one of the hardest parts. It is essential that heap objects associated with the VM and the GC are not inadvertently collected. This requires some very careful thought (remembering that the compiler will place our *code* into the heap too!).

As to whether this is feasible, its been done at least three times over. First in the original Jalapeno, then in GCTk (developed while I was at UMass) and now MMTk. Right now I am working with my students here to push the MMTk design even cleaner while not sacrificing performance---fun!

So, can it perform? Well it is very hard to do apples to apples comparisons, but we measure the performance of our raw mechanisms with C implementations as a milestone and we do very well (by this I mean we can beat glibc's malloc for allocation performance, but this claim needs to be covered with caveats because it is very hard to make fair comparisons). So the raw mechanisms perform well. But then the software engineering benefits of Java come to the fore and our capacity to implement a toolkit and thus have a choice of many different GC algorithms gives us a real advantage (the GC mechanism/algorithm thing was the subject of a previous thread).

I've glossed over a huge amount of important stuff (like how we get raw memory from the OS, how we introduce type safe pointers and object references, etc etc).

To summarize (and to get to the question already) - the point is that language shapes thought. In other words, a program designed in Java will naturally tend to be slower then a program designed in C, simply because Java most naturally expresses slower designs then C does. And the question is - does this agree with anyone elses experiences? And if it does, is it a valid argument against using Java for the design and implementation of the VM?

OK so there is already at least one response to this, but let me add my experience.


I am very focused on performance. The approach Perry Cheng and I took when writing the code for MMTk was very much that premature optimization is indeed the root of all evil. Moreover, we placed enormous faith in the optimizing compiler. The philosophy was to assume the optimizing compiler was smart enough to optimize around our coding abstractions, and then to do careful performance analysis after the fact and see where we were being let down. In some cases the compiler was improved to deal with our approach, other times we modified our approach.

Over time we learned certain idioms which on one hand meant we tended to get reasonable performance first shot, but on the other may have undermined the natural Java style we started with.

While I understand what you mean when you say: "a program designed in Java will naturally tend to be slower then a program designed in C", addressing that concern is one of the most important challenges of language implementation, and is why Java performance has improved so greatly over the past five years.

</constructive_interest>


Cheers,

--Steve

Reply via email to