Current status
- Robin Garner's mmtk_20061016.zip snapshot of MMTk source now runs
the simple "user-level" MarkSweep.java test on DRLVM svn HEAD of
10/20/2006. The mods to the MMTk porting layer to support the above have
been committed to DRLVM HEAD. The next steps for the MMTk/DRLVM port are as
follows. Comments and suggestions are welcome. It would be much
appreciated if Steve Blackburn and Robin Garner would reply to the questions
below directed to the "MMTk guys".
The following plan is roughly in the order this work should be done.
- Currently MMTk/DRLVM runs as a "user-level" app. That is, the MMTk
port allocates a 450MB array from DRLVM's underlying GC. Currently
only GCV4.0 supports 450MB arrays. A vmmagic "Address" object is
created that points to the base of the 450MB array. MMTk is "booted"
with the "Address" of the 450MB object. A simple MMTk exerciser was
written in Java/vmmagic. It calls MMTk alloc() repeatedly. When MMTk
runs out of memory, it will GC its "user-level" heap (the 450MB array).
The underlying GCV4 is oblivious to what MMTk is doing with this 450MB
array.
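For readers new to the "user-level" arrangement, here is a toy model of
the idea. It is only a sketch: UserLevelHeap, alloc() and collect() are
hypothetical names, a plain byte[] stands in for the 450MB array and the
vmmagic "Address", and the "collection" simply treats everything as
garbage, which no real MMTk plan does.

```java
// Toy model of a "user-level" heap: one large array obtained from the
// host collector, carved up by a bump-pointer allocator.
// All names are hypothetical; this is not MMTk or DRLVM code.
final class UserLevelHeap {
    private final byte[] backing;   // stands in for the 450MB array
    private int cursor = 0;         // offset of the next free byte
    private int collections = 0;    // number of "collections" so far

    UserLevelHeap(int sizeInBytes) {
        backing = new byte[sizeInBytes];
    }

    /** Returns the offset of a fresh block of the requested size. */
    int alloc(int bytes) {
        if (bytes <= 0 || bytes > backing.length) {
            throw new IllegalArgumentException("bad request: " + bytes);
        }
        // Out of space: collect the user-level heap before retrying.
        if (bytes > backing.length - cursor) {
            collect();
        }
        int result = cursor;
        cursor += bytes;
        return result;
    }

    // Placeholder for a real collection: treat every object as dead.
    private void collect() {
        java.util.Arrays.fill(backing, (byte) 0);
        cursor = 0;
        collections++;
    }

    int collections() {
        return collections;
    }
}
```

The host GC only ever sees one live byte[]; everything that happens
inside it is invisible to the host, which is the sense in which GCV4 is
"oblivious" above.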
- Project 1
- Move the existing MMTk port to GCV5. We need to remove the
dependency on GCV4.
- MMTk SemiSpace, GenMS and CopyMS collectors worked before the
upgrade to the latest MMTk sources and the current DRLVM svn HEAD.
- Project 2
- Get SemiSpace, GenMS and CopyMS collectors working again. Basically
this means running TestSemiSpace.java, TestGenMS.java and
TestCopyMS.java exerciser programs and debugging the problems. Known
issues that need fixing:
- Java write barrier in Jitrino.JET needs to be debugged
- MMTk needs the two LSBs of one of the object header
bytes. This was somehow broken during the commit of
BBC.patch. It needs to be fixed. There is email on this
topic in harmony-dev.
- To reduce confusion, all of MMTk java source code needs to be
jitted before any MMTk methods are called.
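To make the header-bit item above concrete, here is a minimal sketch of
what "owning the two LSBs of a header byte" means. HeaderBits and its
methods are hypothetical names, and which header byte DRLVM would
dedicate to MMTk is not decided here. The point is only that the VM must
never disturb these two bits and MMTk must never disturb the other six.

```java
// Sketch of reserving the two least significant bits of a header byte.
// Hypothetical helper; not DRLVM or MMTk code.
final class HeaderBits {
    static final int GC_MASK = 0x3;  // the two LSBs

    private HeaderBits() {}

    /** Reads the two GC bits from a header byte. */
    static int getGcBits(byte header) {
        return header & GC_MASK;
    }

    /** Replaces the GC bits and leaves the other six bits untouched. */
    static byte setGcBits(byte header, int gcBits) {
        return (byte) ((header & ~GC_MASK) | (gcBits & GC_MASK));
    }
}
```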
- Project 3
- Modify the DRLVM classloader to force all MMTk classes to be
loaded and all methods JITed before any classlib Java code is executed.
- The next step is to integrate MMTk in the early DRLVM boot
process. The goal is to make sure all code the JIT generates
will allocate out of the MMTk heap. This is a "chicken and egg"
kind of problem since no JITed code can execute until DRLVM has
a GC that is ready to support object allocation. Most likely we will
use the
MMTk notion of "ImmortalSpace" for early object allocation. Objects
in ImmortalSpace are never collected, never moved. At this
stage of MMTk/DRLVM porting, the cost of dead uncollected objects wasting
ImmortalSpace memory is not a concern.
- The existing MMTk/DRLVM porting layer is in rough shape and is
incomplete. Now would be a good time to take another pass at refining
the porting layer. Below are notes on fixing the porting layer which
is located at:
drlvm/trunk/vm/MMTki/ext/vm/HarmonyDRLVM/org/apache/HarmonyDRLVM/mm/mmtk
- Project 4
- Assert class
- This class currently causes a Null Pointer Exception to
force a stack trace in each and every method. Unfortunately
this means MMTk always crashes even in situations where execution is
supposed to continue. Some of the APIs are supposed to print
the stack trace and then continue execution. This needs to be
fixed.
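One possible direction for the Assert fix, as a sketch only: a stack
trace can be captured from a Throwable that is constructed but never
thrown, so the "print and continue" APIs do not need to crash. TraceOnly,
dumpStack() and check() are hypothetical names, not the org.mmtk.vm.Assert
API, and this assumes enough of classlib is usable at the point of the call.

```java
// Sketch: obtain a stack trace without throwing, so execution can continue.
// Hypothetical class; not the MMTk Assert API.
final class TraceOnly {
    private TraceOnly() {}

    /** Prints the current call stack to stderr and returns normally. */
    static void dumpStack(String message) {
        // The Throwable is only used as a stack-trace carrier; it is never thrown.
        new Throwable(message).printStackTrace();
    }

    /** Prints the stack and throws only when the condition is false. */
    static void check(boolean condition, String message) {
        if (!condition) {
            dumpStack(message);
            throw new IllegalStateException(message);
        }
    }
}
```

With this split, only the methods that are really meant to stop
execution throw; the diagnostic ones return to the caller.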
- Barriers class
- performWriteInBarrier() – MMTk expects to perform both
the write of an object slot as well as the write barrier.
The existing contract between Jitrino, VM and the GC
expects that the JIT will write to the object slot and then
separately call the write barrier API (I think). In any case,
if the JIT is still writing to the object slot, it needs to be
changed to let the MMTk inlined support function do it.
- performWriteInBarrierAtomic() – this needs to be
implemented (it's not really needed until much later when
multithread support is turned on).
- setArrayNoBarrier() – need to write some simple vmmagic
code that will set an element of an array of objects without
triggering a write barrier. This support is required by MMTk
internal functions to prevent endless write barrier recursion.
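The contract described for the Barriers class can be sketched as
follows. This is an illustration under assumptions, not the MMTk API:
BarrierSketch, writeWithBarrier() and the rememberedSlots list are
hypothetical, and a plain Java array store stands in for the vmmagic
store that the real setArrayNoBarrier() would need (in a real VM an
ordinary array store would itself trigger the barrier).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a barrier that owns the store: the caller (standing in for
// JIT-generated code) does not write the slot itself.
// Hypothetical names throughout; not the MMTk Barriers API.
final class BarrierSketch {
    /** Indices of slots that were written through the barrier. */
    final List<Integer> rememberedSlots = new ArrayList<>();

    /** Performs the barrier bookkeeping and the reference store together. */
    void writeWithBarrier(Object[] holder, int index, Object value) {
        rememberedSlots.add(index);   // barrier bookkeeping
        holder[index] = value;        // the store itself, done by the barrier
    }

    /** Plain store with no bookkeeping, for the collector's own structures. */
    static void setArrayNoBarrier(Object[] dst, int index, Object value) {
        dst[index] = value;
    }
}
```

If the barrier's own bookkeeping went through writeWithBarrier(), each
store would trigger another barrier, which is the recursion that
setArrayNoBarrier() is meant to prevent.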
- Collection Class
- triggerCollection() method needs to be connected to the
Java API that forces a GC (this is low priority)
- prepareMutator() method probably needs to be integrated
with the back-branch polling mechanism. Also need to confirm
the requirement that a thread suspend request does indeed
force the target thread to be suspended at a GC safepoint.
(MMTk guys, can you confirm this?)
- prepareCollector() method – It's not clear MMTk/DRLVM
needs to do anything. (MMTk folks, please comment on what the
VM is supposed to do!)
- rendezvous() method is currently "hacked" to support only
a single-threaded Java app. This needs to be fixed. It's
not critical until we need to support multithread GC apps.
- scheduleFinalizerThread() – do nothing at this stage. (It
will need to be fixed when MMTk/DRLVM is capable of running
workloads that need finalizers.)
- Lock class
- This looks complete. (Can MMTk folks take a look and
confirm?)
- Memory Class
- This looks complete except for the large 450MB byte array
that is allocated from the existing DRLVM GC. This "hack"
will need to be removed (see below) to integrate MMTk into
the early stage DRLVM boot process. (Can MMTk folks confirm
this analysis?)
- ObjectModel Class
- Interestingly, many methods are not called by any of the
initial GC algorithms targeted (MarkSweep, SemiSpace,
GenMS and CopyMS). These methods currently will execute a
"VM.assertions._assert(false);". The plan is to implement
these methods when the assert()s are hit. Most likely
this will happen when additional GC algorithms are tried.
- copy() implementation needs to be completed.
- getObjectType() returns an object of type MMType. Currently
there is a very simple cache of MMType objects. We need
to confirm this approach is functionally correct (MMTk guys, please
comment). Then determine if a simple cache is good enough
to bring up workloads such as DaCapo and SPECjbb. A
design issue that needs to be resolved – what part of the
MMTk heap should MMType objects be allocated from? Maybe
ImmortalSpace (MMTk guys, is this correct?)
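For discussion, the "very simple cache" idea looks roughly like the
sketch below. TypeInfoCache and TypeInfo are hypothetical stand-ins
(TypeInfo plays the role of MMType), and the HashMap is an assumption
made for illustration. A real implementation inside the collector could
not allocate from the heap it is collecting, which is exactly the
open ImmortalSpace question above.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-class cache of type descriptors.
// TypeInfo stands in for MMType; both names are hypothetical.
final class TypeInfoCache {
    static final class TypeInfo {
        final Class<?> type;
        final boolean isArray;

        TypeInfo(Class<?> type) {
            this.type = type;
            this.isArray = type.isArray();
        }
    }

    private final Map<Class<?>, TypeInfo> cache = new HashMap<>();

    /** Returns the cached descriptor for the object's class, creating it on first use. */
    TypeInfo getObjectType(Object obj) {
        return cache.computeIfAbsent(obj.getClass(), TypeInfo::new);
    }

    int size() {
        return cache.size();
    }
}
```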
- Options class
- This prints out MMTk options and needs finishing
(low priority)
- ReferenceGlue class
- This manages SoftReference, WeakReference and
PhantomReference. Implementation of this class can
wait until advanced workloads require this support (probably 2007).
- Scanning class
- Most of the methods are never called by any of the
initial GC algorithms we are bringing up. (MMTk
guys, does this seem correct?)
- computeAllRoots() needs to be integrated with
DRLVM root set enumeration code. (NOTE: this might
actually impact DRLVM's JIT/VM/GC interface.)
- Selected {CollectorContext, MutatorContext, Plan,
PlanConstraints}
- This is a simple wrapper layer. It looks to be
completely implemented (MMTk guys, is this correct?)
- Statistics class
- Need to port this when performance becomes an
issue (probably 2007)
- Strings class
- The current implementation does a
System.out.println(). This works fine when the GC
has enough space to allocate objects as println() executes.
The corner case where the GC runs out of space for a new
object while attempting to println() GC diagnostics has not
been thought out.
Maybe the MMTk guys have advice on this one.
- SynchronizedCounter class
- Need to add critical sections to this code. (not
really needed until we bring up multithread GC apps)
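As a sketch of what "adding critical sections" could amount to, an
atomic integer gives the same effect without explicit locks.
CounterSketch and its method names are illustrative and not taken from
the MMTk SynchronizedCounter source. It also assumes
java.util.concurrent is usable from the porting layer, which has not
been checked.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of a thread-safe counter. Hypothetical class; the method names
// are illustrative, not the MMTk SynchronizedCounter API.
final class CounterSketch {
    private final AtomicInteger value = new AtomicInteger(0);

    /** Atomically adds one and returns the value held before the increment. */
    int increment() {
        return value.getAndIncrement();
    }

    /** Reads the current value without changing it. */
    int peek() {
        return value.get();
    }

    /** Resets to zero and returns the value held before the reset. */
    int reset() {
        return value.getAndSet(0);
    }
}
```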
- DRLVM modifications needed to support MMTk
- Need to figure out how to attach both CollectorContext
and MutatorContext objects to the DRLVM internal Java thread data
structure. Also, when the Java thread exits, the CollectorContext
and MutatorContext reference pointers need to be set to NULL.
- GCSPY – this "should just work". It's probably best to wait
until after we go multithread to try to bring up GCSPY.
- Project 5
- Debug and verify JIT support for MMTk's "Uninterruptible"
class. This basically means that the JIT needs to not insert GC
polling calls when JITing an MMTk class that extends "Uninterruptible".
This project depends on VM and JIT support for Back-branch
polling. It probably does not need to be fully developed and
debugged until we try to run multithread Java apps. The reason
is that it takes two or more running Java threads to create
a condition where one thread wants to arbitrarily suspend the
other Java threads at GC safepoints.
--
Weldon Washburn
Intel Middleware Products Division