On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Rana Dasgupta wrote:
> Robin,
>    The basic difference of this with Etienne's method is that the flag
is
> on the vtable, instead of the CL instance. Do I understand correctly ?
The
> GC perf impact is therefore reduced because you need to lookup
> object->vtable instead of object->class->CLinstancewhen tracing the
heap.
> The space overhead is correspondingly slightly higher. Also the GC
impact
> may look lower because there are a couple of pseudo GC cycles to reset
the
> vtables and sweep the vtables.
>
> Thanks,
> Rana

The relevant part of Etienne's design I believe is this:

> 7- Each class loader structure maintains a set of boolean flags, one
>  flag per "non-nursery" garbage collected area (even when thread-local
>  heaps are used).  The flag is set when an instance of a class loaded by
>  this class leader is moved into the related GC-area.  The flag is unset
>  when the GC-area is emptied, or (optionally) when it can be determined
>  that no instance of a class loaded by this class loader remains in the
>  GC-area.  This is best implemented as follows: a) use an unconditional
>  write of "true" in the flag every time an object is moved into the
>  GC-area by the garbage collector, b) unset the related flag in "all"
>  class loader structures just before collecting a GC-area, then setting
>  the flag back when an object survives in the area.

My design differs in several key ways from this:
1. There is no requirement for a per non-nursery area flag

2. The mark byte/word is set unconditionally whenever an object is
visited by the GC, not when an object is moved into a particular mature
space.  This may be the same for some GCs, but not all.

3. The mark byte/word is an unconditional write - Etienne's proposal
would use a load/mask/write sequence.  This is performance critical.
4. My memory of x86 assembler is a little rusty, but I believe a
constant store can be done without requiring a register for the value to
be written, where as or-ing a bit value into a word requires a temporary
register or two.
5. In a parallel GC, setting bits in a mask requires a synchronized
update.  My design doesn't.

The point is an unconditional store to a structure you are already
accessing is very cheap, whereas register spills, loads and synchronized
updates are expensive.


I might be missing something here.  But my take is that Robin's design is
really the best one.  Its probably in the noise but it might be possible to
even reduce the overhead of clearing the vtable "mark" by using a epoch
number instead of a boolean.  The idea is that after every major GC,
increment the value used for the mark.  When sweeping the vtables, the stale
mark values are the unreachable classes.

cheers

> On 10/31/06, Robin Garner <[EMAIL PROTECTED]> wrote:
>>
>> Actually, just thinking about how I would implement this in JikesRVM, I
>> would use the reachability based algorithm, but piggyback on the
>> existing GC mechanisms:
>>
>> - Allocate a byte (or word) in each vtable for the purpose of tracking
>> class reachability.
>> - Periodically, at a time when no GC is running (even the most
>> aggressive concurrent GC algorithms have these, I believe), zero this
>> flag across all vtables.  This is the beginning of a class-unloading
>> epoch.
>> - During each GC, when the GC is fetching the GC map for an object,
>> *unconditionally* write a value to the class reachability byte.  It may
>> make sense for this byte to be in either the first cache-line of the
>> vtable, or the cache line that points to the GC map - just make sure
the
>> mark operation doesn't in general fetch an additional cache line.
>> - At a point in the sufficiently far future, when all reachable objects
>> are known to have been traced by the GC, sweep the vtables and check
the
>> reachability of the classloaders.
>>
>> The features of this approach are:
>>
>> - Minimal additional work at GC time.  The additional write will cause
>> some additional memory traffic, but a) it's to memory that is already
>> guaranteed to be in L1 cache, and b) it's an unconditional independent
>> write, and c) multiple writes to common classes will be absorbed by the
>> write buffer.
>>
>> - Space cost of at most 1 word per vtable.
>>
>> - This works whether vtables are objects or VM structures
>>
>> - If the relationship between a class and a vtable is not 1:1, this
only
>> need affect the periodic sweep process, which should be infrequent and
>> small compared to a GC.
>>
>> - shouldn't need a stop-the-world at any point.
>>
>> I've implemented and tested the GC-relevant part of this in JikesRVM,
>> and the GC time overhead appears to be just under 1% in the MMTk
>> MarkSweep collector.
>>
>> cheers,
>> Robin
>>
>




--
Weldon Washburn
Intel Enterprise Solutions Software Division

Reply via email to