Xiao-Feng Li wrote:
Robin, good idea.

I understand the main difference between your design and Aleksey's
proposal 1 is, the tracing in your design stops as vtable, but
Aleksey's continues to classloader. On the other hand, your approach
requires an extra step to sweep the vtables in order to determine the
classloaders' reachability.

Actually there are quite a few more differences:
- This mark phase is built into the standard GC trace, like Aleksey's automatic class unloading proposal. - This approach requires no additional fields in headers or objects (except maybe something to allow enumeration of vtables if this doesn't already exist) - The additional mark comes at an extremely low cost as discussed previously.

The operation to sweep vtables is very cheap, and only needs to be done when you believe there are classloaders that can be unloaded, rather than at every GC. You might for example trigger class unloading every time a new classloader is loaded.

If this is true, why not just let the tracing to continue as a
complete step to determine the classloaders' reachability?

Because that adds a large overhead to every GC, and requires vtables and classloader structures to be traced at every GC. While the numbers of vtables is not large, the number of pointers to them is. The particular flavour of mark in my proposal is much cheaper than the standard test and mark operation.

Another difference is to mark the reachability with an unconditional
write instead of a bit mask write. I think this can be applied to
either approach.

Not really.

If you use an unconditional mark, you lose the ability to test whether any particular mark is the first, and therefore enqueue an object for scanning only once, and therefore the heap trace can never complete. You can only use unconditional marks to process 'leaf' objects in the heap.

You can always turn a bit map into a byte map and avoid synchronized update, but you can't eliminate the dependent load in a standard trace algorithm. The difference in performance between a load-test-write and a load-test-mask-write is insignificant.


Of course a separate trace of the heap is an attractive operation - in MMTk, this is simple to build because the transitive closure code can simply be subclassed (eg the sanity checker is ~250 lines of code). Depending on how reusable the DRLVM heap trace code is, this may or may not be a good option.

cheers,
Robin


Thanks,
xiaofeng

On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:
Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables. This is the beginning of a class-unloading epoch.
- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin


Reply via email to