Gah! Dan (or should we call you AHBJ? :) Can you turn on a standard quoting mechanism in your mailer to make threads easier to follow?

I keep running aground with this "-----" thing....

geir


On Oct 21, 2005, at 5:14 AM, Apache Harmony Bootstrap JVM wrote:



-----Original Message-----
From: Robin Garner <[EMAIL PROTECTED]>
Sent: Oct 20, 2005 3:08 PM
To: Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]>
Cc: harmony-dev@incubator.apache.org
Subject: Re: Some questions about the architecture


Robin, Rodrigo,

Perhaps the two of you could get your heads together
on GC issues?  I think both of you have been thinking
along related lines on the structure of GC for this JVM.
What do you think?


I think the current challenge is to get the GC people and the VM people thinking along the same lines when it comes to GC issues. I think we're
both coming from the same place.

---

Probably!

---


Further comments follow...

-----Original Message-----
From: Rodrigo Kumpera <[EMAIL PROTECTED]>
Sent: Oct 19, 2005 4:49 PM
To: harmony-dev@incubator.apache.org
Subject: Re: Some questions about the architecture

On 10/19/05, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]> wrote:



-----Original Message-----
From: Rodrigo Kumpera <[EMAIL PROTECTED]>
Sent: Oct 19, 2005 1:49 PM
To: harmony-dev@incubator.apache.org, Apache Harmony Bootstrap JVM
<[EMAIL PROTECTED]>
Subject: Re: Some questions about the architecture

On 10/19/05, Apache Harmony Bootstrap JVM <[EMAIL PROTECTED]> wrote:



...snip...


Notice that in 'jvm/src/jvmcfg.h' there is a JVMCFG_GC_THREAD
that is used in jvm_run() as a regular thread like any other.
It calls gc_run() on a scheduled basis.  Also, any time an object
finalize() is done, gc_run() is possible.  Yes, I treat GC as a
stop-the-world process, but here is the key:  Due to the lack
of asynchronous native POSIX threads, there are no safe points
required.  The only thread is the SIGALRM target that sets the
volatile boolean in timeslice_tick() for use by opcode_run() to
test.  <b>This is the _only_ formally asynchronous data structure in
the whole machine.</b>  (Bold if you use an HTML browser, otherwise
clutter meant for emphasis.)  Objects that contain no references can
be GC'd since they are merely table entries.  Depending on how the
GC algorithm is done, gc_run() may or may not even need to look
at a particular object.
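
A minimal C sketch of the time-slice flag described above. The names timeslice_expired, on_tick(), tick_install() and run_until_tick() are hypothetical stand-ins, not the actual code behind timeslice_tick() or opcode_run(); the only thing taken from the text is the idea of a SIGALRM handler setting a volatile flag that the interpreter loop polls.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical stand-in for the flag that timeslice_tick() sets.
 * sig_atomic_t is the type the C standard allows a handler to write. */
static volatile sig_atomic_t timeslice_expired = 0;

/* Signal handler: the only asynchronous writer in this sketch. */
static void on_tick(int signo)
{
    (void)signo;
    timeslice_expired = 1;
}

/* Install the handler for SIGALRM.  Returns 0 on success, -1 on error. */
static int tick_install(void)
{
    return signal(SIGALRM, on_tick) == SIG_ERR ? -1 : 0;
}

/* Inner loop: "execute" opcodes until the flag is seen or the budget
 * is used up.  Returns the number of opcodes executed. */
static long run_until_tick(long budget)
{
    long executed = 0;

    assert(budget >= 0);
    while (executed < budget && !timeslice_expired) {
        executed++;   /* one bytecode would be interpreted here */
    }
    return executed;
}
```

Because the flag is only ever tested between opcodes on the interpreter's own thread, the outer loop can clear it and switch Java threads at a point where no opcode is half finished.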

Notice also that classes are treated in the same way by the GC API.
If a class is no longer referenced by any objects, it may be GC'd also.
First, its intrinsic class object must be GC'd, then the class itself.
This may take more than one pass of gc_run() to make it happen.


There's a major misconception here. As I was describing it to someone a while ago, conceptually a garbage collected heap is actually simpler than an explicitly managed heap. The standard heap has 'malloc' and 'free'. A
managed heap (with GC) just has 'malloc'.

In practice it's more complex but the principle is the same.  From the
interpreter's point of view, you just allocate. Forever. Reclaiming free space is the GC's problem, because it's the only part of the VM that can
know when something is dead.  Things die when (or soon after) all
references to them die.

---

This design _only_ uses "heap.h" and friends for management of
internal JVM data structures, and _never_ repeat _never_ is available
or visible or controllable directly or indirectly by the effects of Java
bytecodes with the exception of the functions object_instance_new()
and object_instance_delete(), and then only for array objects and an
array of 'jvalue' for the fields in a class (one for static fields, the other
for instance fields), which then go to the heap for their data storage
only (a 'jlong' array of 10 elements gets 10x8, or 80 bytes, the rest
coming out of the object table, or say 5 fields, so 5 x 8, or 40 bytes).
I have separated GC out as a _completely_ different issue that relates
_only_ to the effects of Java bytecodes.

In other words, I have two separate memory management domains
so that, no matter what sort of GC is used by the virtual machine,
the real-machine code that implements it is not affected by it at all.
In this way, GC can have neither a positive nor a negative effect
on the real-machine implementation of the JVM.  This was by design,
and whether or not this was a good design choice, I think that is for
more experienced JVM architects than myself to decide.

By virtue of having GC control when object references are reclaimed,
the array and field storage ultimately falls under its control instead
of being explicitly managed.  object_instance_new() and
object_instance_delete() are then controlled by 'new' and the GC,
respectively.

*** WHAT SAY YOU JVM EXPERTS LURKING OUT THERE?  I KNOW
      YOU'RE READING THIS!  Please speak up!  I would like to hear
      what your experience has been so we can create the best
      solution to the issue of real and virtual machine memory
      management. ***

Now to be fair with a complete disclosure at this time, my objects are
allocated from the 'pjvm->object[]' array of 'robject', a static array
with a fixed maximum size.  The same goes for classes with the
'pjvm->class[]' array of 'rclass'.  The OBJECT() and CLASS() macros can
be adjusted to reflect any different allocation mechanism that might be
chosen for any implementation, either now or in the future, hopefully
making this JVM _extremely_ modular.  (See also 'README' for a section
on "Subsystem component abstraction".)
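
A rough C sketch of a fixed-size object table reached through an OBJECT() macro. Only 'pjvm', 'object[]' and OBJECT() come from the description above. The struct layouts, MAX_OBJECTS, object_slot_alloc() and object_slot_free() are assumptions made for illustration and are not the real 'robject' definition.

```c
#include <assert.h>

#define MAX_OBJECTS 8          /* assumed table size, tiny on purpose */

/* Assumed, much reduced stand-in for 'robject'. */
typedef struct {
    int in_use;                /* nonzero while the slot holds an object */
} robject_sketch;

/* Assumed stand-in for the JVM state that 'pjvm' points to. */
typedef struct {
    robject_sketch object[MAX_OBJECTS];
} jvm_sketch;

static jvm_sketch jvm_storage;            /* one allocation for the table */
static jvm_sketch *pjvm = &jvm_storage;

/* All table access goes through the macro, so a different allocation
 * scheme only needs a different macro body. */
#define OBJECT(idx) (pjvm->object[(idx)])

/* Find a free slot.  Returns its index, or -1 if the table is full. */
static int object_slot_alloc(void)
{
    for (int i = 0; i < MAX_OBJECTS; i++) {
        if (!OBJECT(i).in_use) {
            OBJECT(i).in_use = 1;
            return i;
        }
    }
    return -1;
}

/* Mark a slot as free again. */
static void object_slot_free(int idx)
{
    assert(idx >= 0 && idx < MAX_OBJECTS);
    OBJECT(idx).in_use = 0;
}
```

With this shape, running out of table slots is the only allocation failure the table itself can produce, which is the fixed-maximum property mentioned above.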

Keep in mind that this JVM was _not_ designed with blinding speed in mind for
its first cut, but with the Henry Ford approach:

    1.  Sweat blood and create a Model "A".
    2.  Sell enough to make it worth the while.
    3.  Work on improvements and create a Model "B".
    4.  Go from one failure to the next with no loss of enthusiasm
        (Quote from Mr. Ford)
    5.  Get down to the Model "K", which had some significant success.
    6.  Keep working until you build the Model "T", which sold by the
        million.

I guess I'd like us to get the Model "A" out the door even as we look toward improvements such as are being suggested from a number of folks. If we need to adjust the heap and GC models, sure, we can do it. And perhaps a new and better GC interface would be appropriate (As Robin pointed out to me off the list). As he also pointed out, now is the time to make an API change
like this before we get deeply into the project as a group.

I would like to see what this JVM has going for it with its design in
its Model "A" state, whether or not we adjust the GC interface
paradigm.  Part of the reason I didn't supply a GC is that (1) it is a
crucial element, (2) I've never done one, and (3) there are people like
Robin who have written honours theses on GC and are therefore much more
qualified.

---


GC is triggered in two cases: 1) the user code calls System.gc(). 2) the
heap fills up (for some suitable definition of 'fills up').  There is
never any need for the VM code to call the garbage collector.

A consequence is that every call to 'new' needs to be a gc safe point. If
the heap is full, there's no way to keep executing until a timer event
triggers.
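
A small C sketch of why allocation has to be a GC safe point. The names gc_alloc() and gc_collect() are hypothetical, and the "collector" here is a placeholder that simply treats everything as dead. The point shown is only the control flow: when the heap cannot satisfy a request, the allocator itself must run the collector before it can return.

```c
#include <assert.h>
#include <stddef.h>

#define HEAP_BYTES 64u         /* assumed heap size, tiny on purpose */

static unsigned char heap[HEAP_BYTES];
static size_t heap_top = 0;    /* next free byte (bump pointer) */
static int gc_runs = 0;        /* how often the collector was entered */

/* Placeholder collector: pretends every object is dead and resets the
 * heap.  A real collector would trace from the roots instead. */
static void gc_collect(void)
{
    gc_runs++;
    heap_top = 0;
}

/* Allocate nbytes (alignment is ignored in this sketch).  There is no
 * matching 'free': reclaiming space is entirely the collector's job. */
static void *gc_alloc(size_t nbytes)
{
    if (nbytes > HEAP_BYTES) {
        return NULL;           /* can never fit, even in an empty heap */
    }
    if (HEAP_BYTES - heap_top < nbytes) {
        gc_collect();          /* safe point: collect before continuing */
    }
    assert(HEAP_BYTES - heap_top >= nbytes);

    void *p = &heap[heap_top];
    heap_top += nbytes;
    return p;
}
```

The interpreter only ever calls gc_alloc(); it never decides when collection happens.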

What the VM needs to do is to provide services that allow the GC to do its
job.  These are at core:
- A way to allocate bulk memory (eg mmap)
- A way to enumerate roots (this is where stack scanning happens)
- A scheduling mechanism (especially for parallel GC)
- A way to enumerate the pointers in an object
- Notification (which the GC can ignore) for pointer read and write
operations (read and write barriers)
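
The services listed above could be sketched in C as a table of function pointers that the VM hands to the GC. Every name here (vm_services, slot_visitor, the toy_* helpers, count_live_roots()) is made up for illustration and is not a proposed Harmony interface; the scheduling service is left out to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>

/* Callback the GC passes in; it is called once per reference slot. */
typedef void (*slot_visitor)(void **slot, void *ctx);

/* Hypothetical VM-to-GC service table. */
typedef struct {
    void *(*alloc_bulk)(size_t nbytes);                     /* e.g. mmap */
    void (*enumerate_roots)(slot_visitor visit, void *ctx); /* stacks etc. */
    void (*enumerate_pointers)(void *obj, slot_visitor visit, void *ctx);
    void (*write_barrier)(void **slot, void *new_value);    /* may be NULL */
    void (*read_barrier)(void **slot);                      /* may be NULL */
} vm_services;

/* Toy VM side: three root slots standing in for stack scanning. */
static void *toy_roots[3];

static void toy_enumerate_roots(slot_visitor visit, void *ctx)
{
    for (size_t i = 0; i < 3; i++) {
        visit(&toy_roots[i], ctx);
    }
}

/* Toy GC side: count the roots that currently hold a reference. */
static void count_live(void **slot, void *ctx)
{
    if (*slot != NULL) {
        (*(int *)ctx)++;
    }
}

static int count_live_roots(const vm_services *vm)
{
    int n = 0;

    assert(vm != NULL && vm->enumerate_roots != NULL);
    vm->enumerate_roots(count_live, &n);
    return n;
}
```

The GC never needs to know how the VM lays out its stacks; it only sees slots through the visitor.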

Understanding this will go a long way to getting past the disconnect we currently have over GC issues. When I propose the new gc interfaces, this
should become more concrete.


That depends on the GC implementation.  Look at 'jvm/src/gc_stub.c'
for the stub reference implementation.  To see the mechanics of
how to fit it into the compile environment, look at the GC and heap
setup in 'config.sh' and at 'jvm/src/heap.h' for how multiple heap
implementations get configured in.


As mentioned before, the heap *is* the GC.

---

I think we are using the terms "heap" and "GC" with slightly different
definitions. My definitions are stated above, where I think you are using
the terms synonymously.

Also, I have GC set up to meet the two conditions you state. But a 'new' event never needs a GC safe point in this implementation because of the
outer/inner loop implementation on the _same_ real-machine thread, as
described in other posts to this list.

---


The GC interface API that I defined may or may not be adequate
for everything.  I basically set it up so that any time an object
reference was added or deleted, I called a GC function.


So is this a write barrier ?  IE, are these functions called for every
PUTFIELD, PUTSTATIC and AASTORE bytecode ?

---

No. There are no barriers of any kind except the mutex mechanism for the time slice thread. The outer/inner loop interpreter structure precludes the
need for it.

Notice that the implementation will determine whether this is an efficient way to do it or not, especially since I distinguish between fields and local variables.
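
One possible reading of "a GC function called whenever an object reference is added or deleted", sketched in C. The names gc_note_ref_added(), gc_note_ref_removed() and field_store() are hypothetical and are not taken from 'gc_stub.c'. Simple per-object counters are used here only to make the notifications observable; a real GC could ignore them or record something else entirely.

```c
#include <assert.h>

#define MAX_OBJECTS 8          /* assumed table size */
#define NULL_REF   (-1)        /* assumed encoding of a null reference */

/* Per-object count of references the GC has been told about. */
static int refcount[MAX_OBJECTS];

/* Notification: a reference to 'obj' now exists somewhere. */
static void gc_note_ref_added(int obj)
{
    assert(obj >= 0 && obj < MAX_OBJECTS);
    refcount[obj]++;
}

/* Notification: a reference to 'obj' has gone away. */
static void gc_note_ref_removed(int obj)
{
    assert(obj >= 0 && obj < MAX_OBJECTS);
    assert(refcount[obj] > 0);
    refcount[obj]--;
}

/* Store a reference into a field slot, telling the GC about the new
 * reference first and then about the one being overwritten. */
static void field_store(int *slot, int new_ref)
{
    int old_ref = *slot;

    if (new_ref != NULL_REF) {
        gc_note_ref_added(new_ref);
    }
    if (old_ref != NULL_REF) {
        gc_note_ref_removed(old_ref);
    }
    *slot = new_ref;
}
```

A separate pair of hooks for local variables would follow the same pattern, which matches the distinction between fields and locals made above.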

---


The same goes for class loading and unloading.  For local variables on
the JVM stack for each thread, the GC functions are slightly different
than for fields in an object, but the principle is the same.


You can write the interface so that the GC needs to know when a new class
is loaded (or not, but IMO it's a good design).  As far as the GC is
concerned, a class is alive as long as there are objects of that type in
the heap.  If the class data structures are actually in the heap, this
becomes easy, but if you want to keep them on the VM side of the fence, you could potentially hijack the weak reference mechanism to get notified
when the last object dies.

---

I suspected that this might be a good idea for classes.  Thanks for
the confirmation.

There should not be any reason to hijack the weak reference
mechanism with this GC interface design as the GC mechanism
is notified when a class is deleted, which can _only_ occur when
there are no references to it.

Maybe I should state something that I consider to be of value to this
JVM design.  Both Robin and Rodrigo have been sniffing around the
edges of it in their critique.  And they both have some _good_ points
about what I have put together.  And I am learning quite a bit as I
think about their issues.

I think this JVM design has some strong intrinsic features in that I
explicitly do _not_ depend on a lot of heap allocation for my major
runtime structures, that is, for those that exist over a significant part
of the life of the JVM, namely the thread, class, and object tables.
In their place, a somewhat static malloc-type allocation (huh?) is done
for THREAD(), CLASS() and OBJECT() structures-- meaning that I do
a single heap allocation for the whole of each table and keep it until
the JVM shuts down.  (These table designs, of course, may be
changed as necessary.)  I do _nothing_ fancy in the way of
managing memory.  Period.  And Intentionally.  This attitude probably
comes from my experience in real-time embedded systems where the
resources are limited and non-extensible.  By applying what I consider
to be _extremely_ conservative memory management tactics, I think
that this design will have some inherent reliability and speed built into it that may not be obvious upon first blush. With that said, I am very
interested in the numerous ideas for architectural changes and
improvements, and I look forward to Robin's forthcoming suggestions
for a new API for the GC mechanism.

---


Regards,
Robin






Dan Lydick


--
Geir Magnusson Jr                                  +1-203-665-6437
[EMAIL PROTECTED]

