Re: Some questions about the architecture

Robin Garner Thu, 20 Oct 2005 13:08:35 -0700

> Robin, Rodrigo,
>
> Perhaps the two of you could get your heads together
> on GC issues?  I think both of you have been thinking
> along related lines on the structure of GC for this JVM.
> What do you think?

I think the current challenge is to get the GC people and the VM people
thinking along the same lines when it comes to GC issues.  I think we're
both coming from the same place.

> Further comments follow...
>
> -----Original Message-----
> From: Rodrigo Kumpera <[EMAIL PROTECTED]>
> Sent: Oct 19, 2005 4:49 PM
> To: harmony-dev@incubator.apache.org
> Subject: Re: Some questions about the architecture
>
> On 10/19/05, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]> wrote:
>>
>>
>> -----Original Message-----
>> From: Rodrigo Kumpera <[EMAIL PROTECTED]>
>> Sent: Oct 19, 2005 1:49 PM
>> To: harmony-dev@incubator.apache.org, Apache Harmony Bootstrap JVM
>> <[EMAIL PROTECTED]>
>> Subject: Re: Some questions about the architecture
>>
>> On 10/19/05, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]> wrote:
>> >
> ...snip...
>>
>> Notice that in 'jvm/src/jvmcfg.h' there is a JVMCFG_GC_THREAD
>> that is used in jvm_run() as a regular thread like any other.
>> It calls gc_run() on a scheduled basis.  Also, any time an object
>> finalize() is done, gc_run() is possible.  Yes, I treat GC as a
>> stop-the-world process, but here is the key:  Due to the lack
>> of asynchronous native POSIX threads, there are no safe points
>> required.  The only thread is the SIGALRM target that sets the
>> volatile boolean in timeslice_tick() for use by opcode_run() to
>> test.  <b>This is the _only_ formally asynchronous data structure in
>> the whole machine.</b>  (Bold if you use an HTML browser, otherwise
>> clutter meant for emphasis.)  Objects that contain no references can
>> be GC'd since they are merely table entries.  Depending on how the
>> GC algorithm is done, gc_run() may or may not even need to look
>> at a particular object.
>>
>> Notice also that classes are treated in the same way by the GC API.
>> If a class is no longer referenced by any objects, it may be GC'd also.
>> First, its intrinsic class object must be GC'd, then the class itself.
>> This may take more than one pass of gc_run() to make it happen.

There's a major misconception here.  As I was describing it to someone a
while ago, conceptually a garbage collected heap is actually simpler than
an explicitly managed heap.  The standard heap has 'malloc' and 'free'.  A
managed heap (with GC) just has 'malloc'.

In practice it's more complex but the principle is the same.  From the
interpreter's point of view, you just allocate.  Forever.  Reclaiming free
space is the GC's problem, because it's the only part of the VM that can
know when something is dead.  Things die when (or soon after) all
references to them die.

GC is triggered in two cases: 1) the user code calls System.gc().  2) the
heap fills up (for some suitable definition of 'fills up').  There is
never any need for the VM code to call the garbage collector.

A consequence is that every call to 'new' needs to be a GC safe point.  If
the heap is full, there's no way to keep executing until a timer event
triggers.
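
To make that concrete, here is a minimal sketch of an allocator that treats
every allocation as a potential collection point.  All names (sketch_alloc,
sketch_collect, HEAP_BYTES) are made up for illustration and are not from
the bootstrap JVM sources, and the "collector" just resets the heap where a
real one would trace from the roots.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not taken from the Harmony sources. */
#define HEAP_BYTES 1024

static uint64_t heap_space[HEAP_BYTES / 8]; /* 8-byte aligned backing store */
static size_t   heap_top = 0;               /* bump pointer, in bytes */
static int      gc_count = 0;               /* collections run so far */

/* Stand-in collector: assumes everything is dead and resets the heap.
 * A real collector would trace from the roots and keep live objects. */
static void sketch_collect(void)
{
    gc_count++;
    heap_top = 0;
}

/* The interpreter only ever allocates.  When the request does not fit,
 * the collector runs right here, so the call is a GC safe point. */
static void *sketch_alloc(size_t nbytes)
{
    if (nbytes > HEAP_BYTES)
        return NULL;                        /* can never fit */
    nbytes = (nbytes + 7u) & ~(size_t)7u;   /* round up to 8 bytes */

    if (heap_top + nbytes > HEAP_BYTES) {
        sketch_collect();                   /* heap full: collect now */
        assert(heap_top + nbytes <= HEAP_BYTES);
    }

    void *p = (uint8_t *)heap_space + heap_top;
    heap_top += nbytes;
    return p;
}
```

The point is that nothing outside the allocator decides when to collect.
The interpreter never calls the collector and never waits for a timer.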

What the VM needs to do is to provide services that allow the GC to do its
job.  These are at core:
- A way to allocate bulk memory (e.g. mmap)
- A way to enumerate roots (this is where stack scanning happens)
- A scheduling mechanism (especially for parallel GC)
- A way to enumerate the pointers in an object
- Notification (which the GC can ignore) for pointer read and write
operations (read and write barriers)
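
As a rough illustration of the list above (not the interface I intend to
propose), the services could be written down as two tables of function
pointers.  Every type and function name here is hypothetical.  The toy
root enumerator at the end exists only so the sketch can be exercised.

```c
#include <assert.h>
#include <stddef.h>

typedef void *objref;                            /* hypothetical reference type */
typedef void (*slot_visitor)(objref *slot, void *ctx);

/* Services the VM offers to the GC (all names are assumptions). */
struct vm_services {
    void *(*alloc_bulk)(size_t nbytes);          /* e.g. backed by mmap */
    void  (*enumerate_roots)(slot_visitor visit, void *ctx);  /* stack scan */
    void  (*schedule)(void (*task)(void *), void *arg);       /* GC threads */
    void  (*enumerate_pointers)(objref obj, slot_visitor visit, void *ctx);
};

/* Notifications the VM sends to the GC.  A collector that does not
 * need a barrier leaves the hook NULL, i.e. it ignores the event. */
struct gc_hooks {
    void (*read_barrier)(objref *slot);
    void (*write_barrier)(objref *slot, objref new_value);
};

/* Toy VM side: two root slots, standing in for stacks and statics. */
static objref toy_roots[2];

static void toy_enumerate_roots(slot_visitor visit, void *ctx)
{
    for (size_t i = 0; i < 2; i++)
        visit(&toy_roots[i], ctx);
}

/* Toy GC side: count the roots the VM reports. */
static void count_slot(objref *slot, void *ctx)
{
    (void)slot;
    (*(size_t *)ctx)++;
}

static size_t gc_count_roots(const struct vm_services *vm)
{
    size_t n = 0;
    assert(vm->enumerate_roots != NULL);
    vm->enumerate_roots(count_slot, &n);
    return n;
}
```

The GC never needs to know how the VM lays out its stacks or objects.  It
only walks whatever slots the visitors hand it.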

Understanding this will go a long way to getting past the disconnect we
currently have over GC issues.  When I propose the new gc interfaces, this
should become more concrete.

> That depends on the GC implementation.  Look at 'jvm/src/gc_stub.c'
> for the stub reference implementation.  To see the mechanics of
> how to fit it into the compile environment, look at the GC and heap
> setup in 'config.sh' and at 'jvm/src/heap.h' for how multiple heap
> implementations get configured in.

As mentioned before, the heap *is* the GC.

> The GC interface API that I defined may or may not be adequate
> for everything.  I basically set it up so that any time an object
> reference was added or deleted, I called a GC function.

So is this a write barrier?  I.e., are these functions called for every
PUTFIELD, PUTSTATIC and AASTORE bytecode?
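
To be clear about what I mean by a write barrier, here is a minimal sketch
of a reference store that notifies the GC before it writes.  The names
(vm_putfield, gc_write_barrier) and the two-field object layout are
hypothetical.  I am not claiming this is how the existing GC API works.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical object layout: two reference fields and nothing else. */
typedef struct object object;
struct object {
    object *field[2];
};

static size_t barrier_hits = 0;   /* how many reference stores were seen */

/* GC-side hook.  This one only counts.  A generational collector
 * might instead record 'slot' in a remembered set. */
static void gc_write_barrier(object *holder, object **slot, object *new_value)
{
    (void)holder;
    (void)slot;
    (void)new_value;
    barrier_hits++;
}

/* What the interpreter would do for a reference-typed PUTFIELD:
 * tell the GC about the store, then perform it. */
static void vm_putfield(object *holder, size_t index, object *value)
{
    assert(index < 2);
    gc_write_barrier(holder, &holder->field[index], value);
    holder->field[index] = value;
}
```

PUTSTATIC and AASTORE would go through the same hook with a different
slot address.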

>                                                     The same goes for
> class loading and unloading.  For local variables on the JVM stack for
> each thread, the GC functions are slightly different than for fields in
> an object, but the principle is the same.

You can write the interface so that the GC needs to know when a new class
is loaded (or not, but IMO it's a good design).  As far as the GC is
concerned, a class is alive as long as there are objects of that type in
the heap.  If the class data structures are actually in the heap, this
becomes easy, but if you want to keep them on the VM side of the fence,
you could potentially hijack the weak reference mechanism to get notified
when the last object dies.

Regards,
Robin
