subject:"\[drlvm\] Class unloading support \- tested one approach"

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Alexey Varlamov


[snip]

Alexey,

it looks like what you are thinking about is *concurrent* collector,
and concurrent garbage collections brings substantial complexity
even without class unloading.


Salikh,

You are correct. Maybe I'm running ahead of the train, but my concern
is that "scalability" of unloading design is not the last criteria.
The decision we'll do now should not strike back at us in some months.


However, the design we were discussing was for *stop-the-world* garbage
collectors, because this is the only thing currently supported by DRLVM,
and all existing GCs are stop-the-world.


I'm kinda optimistic on gcv5 progress, feeling that concurrent
collection is not improbable to be workable before H2/2007 :)



So, the correctness of unloading algorithm can easily be proved if we consider
that the "final unloading" collection is a full heap collection,
i.e. both nursery and mature space is collected.

Yes, things are more or less clear for the case of STW GC so we can
concentrate on scripting more detailed technical proposal...

[skip]

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Robin Garner

Ivan Volosyuk wrote:

On 11/9/06, Etienne Gagnon <[EMAIL PROTECTED]> wrote:

Ivan Volosyuk wrote:
> We will get rid of false sharing. That's true. But it still be quite
> expensive to write those '1' values, because of ping-ponging of the
> cache line between processors. I see only one solution to this: use
> separate mark bits in vtable per GC thread which should reside in
> different cache lines and different from that word containing gcmap
> pointer.

The only thing that a GC thread does is write "1" in this slot; it never
writes "0".  So, it is not very important in what order (or even "when")
this word is finally commited to main memory.  As long as there is some
barrier before the "end of epoch collection" insuring that all
processors cache write buffers are commited to memory before tracing
vtables (or gc maps).

You don't need memory coherency on write-without-read. :-)

I don't speak about memory coherency, I speak about bus load with
useless memory traffic between processors and poor CPU cache usage.

Surely this wouldn't happen in a sufficiently weak memory model ?  Lets 
just not support x64 :-)

But I think this false sharing may be what kills this particular idea.
The next cheapest option should be to use a side array of bytes - as 
long as calculating the address of the mark byte can be done without any 
loads or register spills, it should still be cheaper than a full 
test-and-mark operation on the vtable.  I guess there are cache policies 
where this may still be slow on an SMP machine.

Side metadata is easiest to do when objects are in a specific space, and 
has coarse alignment.  Any ideas what the typical size of a DRLVM vtable 
is ?  Would 256 bytes be an excessive alignment boundary ?

I'll try it out in the next day or so.  Unfortunately I don't have 
access to anything with more parallelism than a Pentium D, so it's not 
likely to be conclusive.

--
Robin Garner
Dept. of Computer Science
Australian National University
http://cs.anu.edu.au/people/Robin.Garner/

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Ivan Volosyuk

On 11/9/06, Etienne Gagnon <[EMAIL PROTECTED]> wrote:

Ivan Volosyuk wrote:
> We will get rid of false sharing. That's true. But it still be quite
> expensive to write those '1' values, because of ping-ponging of the
> cache line between processors. I see only one solution to this: use
> separate mark bits in vtable per GC thread which should reside in
> different cache lines and different from that word containing gcmap
> pointer.

The only thing that a GC thread does is write "1" in this slot; it never
writes "0".  So, it is not very important in what order (or even "when")
this word is finally commited to main memory.  As long as there is some
barrier before the "end of epoch collection" insuring that all
processors cache write buffers are commited to memory before tracing
vtables (or gc maps).

You don't need memory coherency on write-without-read. :-)

I don't speak about memory coherency, I speak about bus load with
useless memory traffic between processors and poor CPU cache usage.

--
Ivan
Intel Enterprise Solutions Software Division

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Salikh Zakirov

Etienne Gagnon wrote:
> Salikh Zakirov wrote:
>>   7) let the GC finish collection and reclaim unreachable objects -- this 
>> reclaims java objects
> 
> Just a bit of a warning...  This should be integrated within the
> weak/soft/phantom + finalization framework.  We definitely don't want
> the native resources of a class loader to be freed and later have
> finalization revive the class loader...  :-)

Agreed. "Revival" of classloaders should be done after "revival"
of objects in finalization queue.

I think this scheme can be implemented by introducing one additional GC->VM 
callback (vm_trace_complete),
which would be called right after GC completed the trace. The call sequence 
will be as follows:

   GCVM
|---> vm_enumerate_root_set()
| gc_add_root_set_entry()<---|
| gc_add_root_set_entry()<---|
| gc_add_root_set_entry()<---|
|<- - - - - - - - - - -return from vm_enumerate_root_set()
||
   [trace heap]  |
||
|---> vm_trace_complete()
| gc_add_root_set_entry()<---|
| gc_add_root_set_entry()<---|
|< - - - - - - - - - - - return from vm_trace_complete()-|
||
   [trace heap from new roots,   |
 if there are any]   |
|---> vm_trace_complete()
|< - - - - - - - - - - - return from vm_trace_complete()-|
|
   [no retrace, as no new roots were received]
|
   [reclaim space]
|

Additionally, even finalization itself can be moved out of GC responsibility,
using this interface and one additional function to query if the object
was already reached or not.

> [Luckily, nothing special needs to be done for JNI code;
> CallStaticMethod does require a native reference to a class
> object.  Yay! ]

Unluckily, something needs to be done for JVMTI. It has a function 
IterateOverHeap
which is supposed to iterate over both reachable and unreachable objects by 
scanning
heap linearly.

Thus, if the respective capability (can_tag_objects) has been requested on the 
VM startup,
the GC must run in a special mode and zero out all unreachable objects, because 
the unreachable
object may loose its descriptor (VTable) at any time, and GC will not be able 
even to know its size.

This will prevent some optimizations, like not reclaiming short free areas in 
unmovable space,
and require some special attention from the GC developers. OTOH, gc_cc already 
has a special mode
(-Dgc.heap_iteration=1) to support iteration even before class unloading is 
implemented.

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Salikh Zakirov

Etienne Gagnon wrote:
> Ivan Volosyuk wrote:
>> We will get rid of false sharing. That's true. But it still be quite
>> expensive to write those '1' values, because of ping-ponging of the
>> cache line between processors. I see only one solution to this: use
>> separate mark bits in vtable per GC thread which should reside in
>> different cache lines and different from that word containing gcmap
>> pointer.
> 
> The only thing that a GC thread does is write "1" in this slot; it never
> writes "0".  So, it is not very important in what order (or even "when")
> this word is finally commited to main memory.  As long as there is some
> barrier before the "end of epoch collection" insuring that all
> processors cache write buffers are commited to memory before tracing
> vtables (or gc maps).

The "false sharing" problem occurs whenever one processor writes into
the cache line other processors read from, because it invalidates loaded
copies and makes other processors read it again from main memory.
It doesn't matter if they write 1 or 0

In our case, both gcmap pointer and gcmap itself are likely to be read
by multiple processors, so writing to a location nearby may lead
to false sharing.

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Ivan Volosyuk wrote:
> We will get rid of false sharing. That's true. But it still be quite
> expensive to write those '1' values, because of ping-ponging of the
> cache line between processors. I see only one solution to this: use
> separate mark bits in vtable per GC thread which should reside in
> different cache lines and different from that word containing gcmap
> pointer.

Thinking about it...  Doesn't the "object vtable" suffer from the same
problem, anyway?  It's probably worse, as it will be quite unfeasible to
try to locate them in the "right" cache lines!  Yep, another point
against object-vtables...

Etienne
-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Salikh Zakirov wrote:
>   7) let the GC finish collection and reclaim unreachable objects -- this 
> reclaims java objects

Just a bit of a warning...  This should be integrated within the
weak/soft/phantom + finalization framework.  We definitely don't want
the native resources of a class loader to be freed and later have
finalization revive the class loader...  :-)

[Luckily, nothing special needs to be done for JNI code;
CallStaticMethod does require a native reference to a class
object.  Yay! ]

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/

signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Salikh Zakirov

Etienne Gagnon wrote:
>>   3) trace the heap 
>>   4) scan vtable marks and "revive" marked class unloaders, by adding the 
>> strong root
>>  from the previously collected "unload list". Remove the revived 
>> classloaders from unload list.
>>   5) repeat steps (3) and (4) until there is no classloaders to revive
> 
> As long as it is understood that the repeated (3) is not a full trace.
> It's only a trace starting from the revived roots.  [This is important
> in evaluating the total work done].

Exactly.

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Ivan Volosyuk wrote:
> We will get rid of false sharing. That's true. But it still be quite
> expensive to write those '1' values, because of ping-ponging of the
> cache line between processors. I see only one solution to this: use
> separate mark bits in vtable per GC thread which should reside in
> different cache lines and different from that word containing gcmap
> pointer.

The only thing that a GC thread does is write "1" in this slot; it never
writes "0".  So, it is not very important in what order (or even "when")
this word is finally commited to main memory.  As long as there is some
barrier before the "end of epoch collection" insuring that all
processors cache write buffers are commited to memory before tracing
vtables (or gc maps).

You don't need memory coherency on write-without-read. :-)

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/

signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Ivan Volosyuk

On 11/9/06, Etienne Gagnon <[EMAIL PROTECTED]> wrote:

Salikh Zakirov wrote:
> Technically, it should not be too difficult to add an additional field to the 
VTable
> structure, and require GC to write 1 there during object scanning.
> However, if the VTable mark is located in the same cache line as gcmap,
> it may severely hit parallel GC performance on a multiprocessor due to false 
sharing,
> as writing VTable mark will invalidate the gcmap pointers loaded to caches of 
other
> processors.
>
>objectVTable   gcmap
>  +++---++--+
>  | VT ptr |--->| gcmap ptr |--->| offset of ref #1 |
>  |  ...   ||...|| offset of ref #2 |
>  +++---+|   ...|
> |0 |
> +--+

If you go that far for every scanned object (!), then you could simply
place the class unloading bit in the gc map, at index -1) to minimize
disruption of current code...

   objectVTable   gcmap
+--+
 +++---+| cl.un. mark bit  |
 | VT ptr |--->| gcmap ptr |--->| offset of ref #1 |
 |  ...   ||...|| offset of ref #2 |
 +++---+|   ...|
|0 |
+--+

This gets rid of the cache line hazard...

We will get rid of false sharing. That's true. But it still be quite
expensive to write those '1' values, because of ping-ponging of the
cache line between processors. I see only one solution to this: use
separate mark bits in vtable per GC thread which should reside in
different cache lines and different from that word containing gcmap
pointer.

--
Ivan
Intel Enterprise Solutions Software Division

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Salikh Zakirov wrote:
> Ah, I think I got it.

Yep.

>   3) trace the heap 
>   4) scan vtable marks and "revive" marked class unloaders, by adding the 
> strong root
>  from the previously collected "unload list". Remove the revived 
> classloaders from unload list.
>   5) repeat steps (3) and (4) until there is no classloaders to revive

As long as it is understood that the repeated (3) is not a full trace.
It's only a trace starting from the revived roots.  [This is important
in evaluating the total work done].

> The voting definitely was premature, as it turns out that even the design 
> under discussion
> can be elaborated to multiple substantially different designs.

Yes, you're right.

Etienne
-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Salikh Zakirov

Etienne Gagnon wrote:

> "Revival" is only needed if you use the finalization-like approach.  If
> you only do class-unloading GC when the nursery is empty, then no
> revival is needed.  

Ah, I think I got it.

You mean running a minor collection, and then "class unloading" full heap 
collection
sequentially, without any mutator work in between?
Then, the correctness is observed easily:

  1) all mature objects has their vtable marks set to 1
  2) after minor collection, the nursery is empty
  => all live objects already have vtable marks == 1

  Thus, if we find a classloader with vtable marks == 0, then it has no object 
instances,
  and its reachability is defined solely by reachability of 
java.lang.ClassLoader instance
  and existence of the method frames, which can be checked, respectively, by
  enumerating class loader roots as weak roots, and scanning stacks.

  Note, that the class loader, which became eligible for unloading during epoch 
N,
  will not be unloaded until the end of the epoch N+1.

However, in the case of non-generational collector, the "minor collection 
followed
by unloading collection" becomes effectively two successive garbage collections.

On the other side, "finalization-like" design goes as follows:

  1) clean vtable marks before "class unloading" collection
  2) enumerate classloader roots as weak and collect array of user classloader 
pointers for later use
 -- let's call it "unload list"
  3) trace the heap 
  4) scan vtable marks and "revive" marked class unloaders, by adding the 
strong root
 from the previously collected "unload list". Remove the revived 
classloaders from unload list.
  5) repeat steps (3) and (4) until there is no classloaders to revive
  6) unload the classloaders, pointed by the "unload list" -- this reclaims 
native resources
  7) let the GC finish collection and reclaim unreachable objects -- this 
reclaims java objects

This design unloads classloaders at the end of the very same epoch when they 
became unloadable.

The voting definitely was premature, as it turns out that even the design under 
discussion
can be elaborated to multiple substantially different designs.

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Etienne Gagnon wrote:
> Salikh Zakirov wrote:
> 
>>I have another concern though: 
>>just before starting "final unloading" collection, we scan vtable marks and 
>>identify
>>the candidates for unloading. During the "final unloading" collection, the
>>candidate classloader roots are reported as week. At the end of the trace,
>>we need to rescan vtable marks and "revive" the classloader which were found
>>in possession of live objects. This techniques is exactly the same as the one
>>used for object finalization. 
>>
>>However, in contrast with finalization, we will need to repeat reviving
>>classloaders which have non-0 vtable marks until the process converges, and no
>>new classloaders are revived. (* in finalization, both dead and live objects 
>>in finalization
>>queue are revived, and thus the revival converges in just 1 step *).

In case you chose the finalization-like + revival way, then I don't see
any significant performance hit of multiple-step convergence!  For one
thing, you'll probably agree with me that it is quite unlikely to take
more than 1 step to converge in most cases, and the additional work in
the other cases is still quite insignificant relative to the remaining
collection work!

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Salikh Zakirov wrote:
> I have another concern though: 
> just before starting "final unloading" collection, we scan vtable marks and 
> identify
> the candidates for unloading. During the "final unloading" collection, the
> candidate classloader roots are reported as week. At the end of the trace,
> we need to rescan vtable marks and "revive" the classloader which were found
> in possession of live objects. This techniques is exactly the same as the one
> used for object finalization. 
> 
> However, in contrast with finalization, we will need to repeat reviving
> classloaders which have non-0 vtable marks until the process converges, and no
> new classloaders are revived. (* in finalization, both dead and live objects 
> in finalization
> queue are revived, and thus the revival converges in just 1 step *).

"Revival" is only needed if you use the finalization-like approach.  If
you only do class-unloading GC when the nursery is empty, then no
revival is needed.  In this case, after GC you only need to revert weak
references to hard ones.  Nulled weak references relate to dead class
loaders for which you can definitely free the native resources.

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Robin Garner


Etienne Gagnon wrote:

OK.  My latest proposal (a few messages ago) was assuming that the
nursery was empty when the "end of epoch collection" is launched.

If it is not, you can do 2 things:

a) do a minor collection to empty it, or

b) i  - use a finalization-like list of references to class loader
objects
   ii - launch gc, which might mark a previously unmarked vtable
   iii- do a finalization-like rescuing for resuscitated class loaders

"b)" should really have a minimal performance impact.  As for its
"apparent complexity", I would say that this is a non-issue; similar
code must already exist in drlvm for implementing finalization.


Just for clarification: "b)" implies a combined "nursery + mature space"
collection.

Actually, for the mature space part, you could get away with a smaller
collection if you premature all class loaders and classes to a specific
mature space area; the you only need to collect that space (in addition
to the nursery).

Etienne



This sounds rather 'stop the world' - while the barrier is more 
complicated I think it scales to concurrent collectors.


Also, don't forget an instance of a class in the nursery can pass a 
reference to its classloader to a mature-space object under suitably 
bizarre circumstances.  I guess you could have a write barrier on the 
class metadata space ...


... an XOR barrier could actually be an interesting solution ... but I'm 
sure it won't be necessary.


--
Robin Garner
Dept. of Computer Science
Australian National University
http://cs.anu.edu.au/people/Robin.Garner/

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Salikh Zakirov

>> Etienne Gagnon wrote:
>> > My proposal already argued that vtable bit/byte/word marking is
>> > unnecessary for "nursery allocations".  You only need to mark the
>> vtable
>> > of objects that survive collection and pretenured objects.

Alexey Varlamov wrote:
> I may have missed it, but I only recall you argued that we just need
> to collect mature space for the *final unloading* as CL and classes
> are unlikely to die young, which I agree. But chances that a live
> object of a candidate class appeared in the nursery are higher.
> Otherwise I just do not grok how this algorithm can be proven for
> correctness.

Alexey, 

it looks like what you are thinking about is *concurrent* collector,
and concurrent garbage collections brings substantial complexity
even without class unloading.

However, the design we were discussing was for *stop-the-world* garbage
collectors, because this is the only thing currently supported by DRLVM,
and all existing GCs are stop-the-world.

So, the correctness of unloading algorithm can easily be proved if we consider
that the "final unloading" collection is a full heap collection,
i.e. both nursery and mature space is collected.

I have another concern though: 
just before starting "final unloading" collection, we scan vtable marks and 
identify
the candidates for unloading. During the "final unloading" collection, the
candidate classloader roots are reported as week. At the end of the trace,
we need to rescan vtable marks and "revive" the classloader which were found
in possession of live objects. This techniques is exactly the same as the one
used for object finalization. 

However, in contrast with finalization, we will need to repeat reviving
classloaders which have non-0 vtable marks until the process converges, and no
new classloaders are revived. (* in finalization, both dead and live objects in 
finalization
queue are revived, and thus the revival converges in just 1 step *).

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

> OK.  My latest proposal (a few messages ago) was assuming that the
> nursery was empty when the "end of epoch collection" is launched.
> 
> If it is not, you can do 2 things:
> 
> a) do a minor collection to empty it, or
> 
> b) i  - use a finalization-like list of references to class loader
> objects
>ii - launch gc, which might mark a previously unmarked vtable
>iii- do a finalization-like rescuing for resuscitated class loaders
> 
> "b)" should really have a minimal performance impact.  As for its
> "apparent complexity", I would say that this is a non-issue; similar
> code must already exist in drlvm for implementing finalization.

Just for clarification: "b)" implies a combined "nursery + mature space"
collection.

Actually, for the mature space part, you could get away with a smaller
collection if you premature all class loaders and classes to a specific
mature space area; the you only need to collect that space (in addition
to the nursery).

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Alexey Varlamov wrote:
>> > My proposal already argued that vtable bit/byte/word marking is
>> > unnecessary for "nursery allocations".  You only need to mark the
>> vtable
>> > of objects that survive collection and pretenured objects.
> 
> 
> I may have missed it, but I only recall you argued that we just need
> to collect mature space for the *final unloading* as CL and classes
> are unlikely to die young, which I agree. But chances that a live
> object of a candidate class appeared in the nursery are higher.
> Otherwise I just do not grok how this algorithm can be proven for
> correctness.
> 

OK.  My latest proposal (a few messages ago) was assuming that the
nursery was empty when the "end of epoch collection" is launched.

If it is not, you can do 2 things:

a) do a minor collection to empty it, or

b) i  - use a finalization-like list of references to class loader
objects
   ii - launch gc, which might mark a previously unmarked vtable
   iii- do a finalization-like rescuing for resuscitated class loaders

"b)" should really have a minimal performance impact.  As for its
"apparent complexity", I would say that this is a non-issue; similar
code must already exist in drlvm for implementing finalization.

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Robin Garner

Alexey Varlamov wrote:

2006/11/9, Robin Garner <[EMAIL PROTECTED]>:

Etienne Gagnon wrote:
> Alexey Varlamov wrote:
>> Sorry if it was already discussed, but I believe this approach also
>> requires marking vtable bit/byte on each object allocation, unitl the
>> "unloading" GC pass is strictly stop-the-world full-heap collection.
>> Robin, did you include this particular overhead too in your
>> measurements?

I didn't include it - having established that it's cheap during GC where
memory bandwidth is at a premium, I kind of took this for granted.

> My proposal already argued that vtable bit/byte/word marking is
> unnecessary for "nursery allocations".  You only need to mark the 
vtable

> of objects that survive collection and pretenured objects.

I may have missed it, but I only recall you argued that we just need
to collect mature space for the *final unloading* as CL and classes
are unlikely to die young, which I agree. But chances that a live
object of a candidate class appeared in the nursery are higher.
Otherwise I just do not grok how this algorithm can be proven for 
correctness.

There is definitely some kind of barrier required here.  If no 
references to classes belonging to a c/l exist, but references to one of 
the j.l.classloaders exist, classloader may get marked for collection. 
Objects get created (via reflection, in nursery), references to c/l are 
dropped, classloader unloads.

I believe a barrier in one or more of the reflective methods used to 
create objects from j.l.class/j.l.c/loader references is probably necessary.

Weak references can only be collected at the end of a reachability epoch 
in any case, so I think there may be some stronger guarantees that we 
can use, but I'm too sleepy to thing of them right now :)

And this is a persuasive argument.  But I can probably find time to
measure it tomorrow if you aren't convinced.

That would be very kindly, thank you.

--
Alexey

--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Salikh Zakirov wrote:
> Technically, it should not be too difficult to add an additional field to the 
> VTable
> structure, and require GC to write 1 there during object scanning.
> However, if the VTable mark is located in the same cache line as gcmap,
> it may severely hit parallel GC performance on a multiprocessor due to false 
> sharing,
> as writing VTable mark will invalidate the gcmap pointers loaded to caches of 
> other
> processors. 
> 
>objectVTable   gcmap
>  +++---++--+
>  | VT ptr |--->| gcmap ptr |--->| offset of ref #1 |
>  |  ...   ||...|| offset of ref #2 |
>  +++---+|   ...|
> |0 |
> +--+

If you go that far for every scanned object (!), then you could simply
place the class unloading bit in the gc map, at index -1) to minimize
disruption of current code...

   objectVTable   gcmap
+--+
 +++---+| cl.un. mark bit  |
 | VT ptr |--->| gcmap ptr |--->| offset of ref #1 |
 |  ...   ||...|| offset of ref #2 |
 +++---+|   ...|
|0 |
+--+

This gets rid of the cache line hazard...

Why don't you also investigate using SableVM's bidirectional object
layout?  Dayong Gu (a Ph.D. student tha I co-supervise) found that it
was quite simple to implement in JikesRVM.  I don't see why it should be
harder to implement in drlvm...  It would save you this nasty
indirection for scanning objects!  [See my Ph.D. thesis for an
explanation of the layout.  You can get in touch with Dayong through the
coordinates at http://www.sable.mcgill.ca/people/ ].

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/

signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Alexey Varlamov

2006/11/9, Robin Garner <[EMAIL PROTECTED]>:

Etienne Gagnon wrote:
> Alexey Varlamov wrote:
>> Sorry if it was already discussed, but I believe this approach also
>> requires marking vtable bit/byte on each object allocation, unitl the
>> "unloading" GC pass is strictly stop-the-world full-heap collection.
>> Robin, did you include this particular overhead too in your
>> measurements?

I didn't include it - having established that it's cheap during GC where
memory bandwidth is at a premium, I kind of took this for granted.

> My proposal already argued that vtable bit/byte/word marking is
> unnecessary for "nursery allocations".  You only need to mark the vtable
> of objects that survive collection and pretenured objects.

I may have missed it, but I only recall you argued that we just need
to collect mature space for the *final unloading* as CL and classes
are unlikely to die young, which I agree. But chances that a live
object of a candidate class appeared in the nursery are higher.
Otherwise I just do not grok how this algorithm can be proven for correctness.

And this is a persuasive argument.  But I can probably find time to
measure it tomorrow if you aren't convinced.

That would be very kindly, thank you.

--
Alexey

--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Salikh Zakirov

Robin Garner wrote:
> Etienne Gagnon wrote:
>> 3- Why would it be so hard to add an unconditional write operation
>> during collection (e.g. during copying or marking of an object) in
>> drlvm?  A detailed technical explanation is welcome. :-)
> 
> I actually believe that this should be implementable in a GC-neutral
> way, whether vtables are objects or not.  The GC will at some point ask
> the VM for the GC map of the object it is about to scan.  At this point
> the VM can write the mark of the vtable.
> 
> I guess I'm making an assumption about the GC -> VM interface here, but
> if it doesn't exist, it should :)

In the current GC-VM interface, which is used in DRLVM
(see vm/include/open/gc.h and vm/include/open/vm_gc.h),
the GC never asks VM about gcmap; instead, it is building a gcmap
itself as one of the class loading steps. VM calls gc_class_prepared()
for each loaded class, and GC uses various query functions to learn
about types and offsets of object fields.

The gcmap pointer is stored in the VTable, in the several bytes reserved 
specifically
for the GC use.

Technically, it should not be too difficult to add an additional field to the 
VTable
structure, and require GC to write 1 there during object scanning.
However, if the VTable mark is located in the same cache line as gcmap,
it may severely hit parallel GC performance on a multiprocessor due to false 
sharing,
as writing VTable mark will invalidate the gcmap pointers loaded to caches of 
other
processors. 

   objectVTable   gcmap
 +++---++--+
 | VT ptr |--->| gcmap ptr |--->| offset of ref #1 |
 |  ...   ||...|| offset of ref #2 |
 +++---+|   ...|
|0 |
+--+

(* actually, in the current default collector "gc_cc",
 gcmap ptr also has some flags in lower 3 bits, and gcmap has some fields
before offsets array as well *)

That's why we probably would want to have the VTable mark be separated enough
from both gcmap pointer and the gcmap itself. 

>> By the way, what are the currently competing proposals?
>> 1- object vtables
>> 2- Robin/Gagnon proposal  (still finishing up some details ;-)
>> 3- Is there a 3rd?

yes, as far as I heard from Aleksey Ignatenko, there was 3rd prototype in works,
which worked as a completely independent from the GC stop-the-world phase,
tracing the heap and marking classes and classloaders specially.
The tracing functionality was reimplemented within VM without any GC changes.
The stop-the-world phase was piggy-backed into some collections.

And yet before the 3rd prototype, there was one more, which was different
in the tracing implementation. It used GC->VM callback on each object scan.

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Robin Garner


Etienne Gagnon wrote:

Alexey Varlamov wrote:

Sorry if it was already discussed, but I believe this approach also
requires marking vtable bit/byte on each object allocation, unitl the
"unloading" GC pass is strictly stop-the-world full-heap collection.
Robin, did you include this particular overhead too in your
measurements?


I didn't include it - having established that it's cheap during GC where 
memory bandwidth is at a premium, I kind of took this for granted.



My proposal already argued that vtable bit/byte/word marking is
unnecessary for "nursery allocations".  You only need to mark the vtable
of objects that survive collection and pretenured objects.


And this is a persuasive argument.  But I can probably find time to 
measure it tomorrow if you aren't convinced.


--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Etienne Gagnon

Alexey Varlamov wrote:
> Sorry if it was already discussed, but I believe this approach also
> requires marking vtable bit/byte on each object allocation, unitl the
> "unloading" GC pass is strictly stop-the-world full-heap collection.
> Robin, did you include this particular overhead too in your
> measurements?

My proposal already argued that vtable bit/byte/word marking is
unnecessary for "nursery allocations".  You only need to mark the vtable
of objects that survive collection and pretenured objects.

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/


signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-09 Thread Alexey Varlamov


[snip]

> > My proposal has been measured at ~1% overhead in GC time, or 0.1% in
> > execution time (caveats as above).  If there is some complexity in
> > establishing classloader reachability from this basis, I would assume it
> > can easliy be absorbed.


Sorry if it was already discussed, but I believe this approach also
requires marking vtable bit/byte on each object allocation, unitl the
"unloading" GC pass is strictly stop-the-world full-heap collection.
Robin, did you include this particular overhead too in your
measurements?

--
Regards,
Alexey

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Alexey Varlamov


Uhm, Etienne overtook me with earlier posts.
Seems we are beginning to converge with design.

2006/11/9, Alexey Varlamov <[EMAIL PROTECTED]>:

2006/11/8, Robin Garner <[EMAIL PROTECTED]>:
> Robin Garner wrote:
> > Aleksey Ignatenko wrote:
> >> Robin.
> >>
> >>> OK, well how about keeping a weak reference to the >j.l.ClassLoader
> >>> object instead of a strong one.  When the reference >becomes (strong)ly
> >>> unreachable, invoke the class-unloading phase.
> >>
> >>
> >> If you have weak reference to j.l.Classloader - GC will collect it
> >> (with all
> >> appropriate jlClasses) as soon as there are no references to
> >> j.l.Classloaderand appropriate classes. But there is possible
> >> situation when there are some
> >> live objects of that classes and no references to jlClassloader and
> >> jlClasses. This will lead to unpredictable consequences (crash, etc).
> >>
> >>
> >>
> >> I want to remind that there 3 mandatory conditions of class unloading:
> >>
> >> 1. j.l.Classloader instance is unreachable.
> >>
> >> 2. Appropriate j.l.Class instances are unreachable.
> >>
> >> 3. No object of any class loaded by appropriate class loader exists.
> >
> > Let me repeat.  I offer an efficient solution to (3).  I don't purport
> > to have a solution to (1) and (2).
>
> Let me just add:  This is because I don't think (1) or (2) are
> particularly difficult from a performance point of view, although I'm
> happy to accept that there may still be some subtle engineering challenges.

Robin,

While your idea to (3) looks brilliant and quite convincing, it only
covers part of the whole mission. We really need to derive complete
design solution (like Etienne did), and I feel the voting started in
the neighbor thread is a bit premature.
Some of considerations below are beyond of my understanding, could you
please clarify them (inlined)?

And yet, it would be nice to have a confirmation that the notion of
"epoch of full-heap-collection" does not imply strict limitations on
GC algorithms. Maybe this is something obvious for people with more
decent GC background than me?

>
> Now this is just off the top of my head, but what about this for a design:
> - A j.l.ClassLoader maintains a collection of each of the classes it has
> loaded
> - A j.l.Class contains a pointer to its j.l.ClassLoader
> - A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
> The point of this is that a class loader and its classes are a 'self
> sustaining' data structure - if one element in it is reachable the whole
> thing is reachable.
Right. The special case is for system classes which are always in VM
root set so never reclaimed.

> The VM maintains a weak reference to all its j.l.ClassLoader instances,
> and maintains a ReferenceQueue for weakly-reachable classloaders.
> ClassLoaders are placed on the ReferenceQueue if and only if they are
> unreachable from the heap (including via their j.l.Class objects).
Here: should it actually read as "WeakReference instances for
weakly-reachable classloaders are placed on the ReferenceQueue"?
Otherwise this sentence completely escapes my mind, sorry.
If the former, when how VM could obtain&rescue referent CL objects (+
it's j.l.Class instances) after GC pass - AFAIU references are cleared
automatically before enqueuing? I suppose we are not going to
introduce inter-phase communication between VM and GC...

> Note this is an irreversible condition: objects that are unreachable can
> never become reachable again, except through very specific methods.
>
> When it sweeps the ReferenceQueue for unreachable classloaders, the VM
> places the unreachable classloaders in a queue of classloaders that are
> candidates for unloading.  This queue is part of the root set of the VM.
Strongly referenced now I suppose.

>  A classloader in this queue is unreachable from the heap, and can be
> unloaded when there are no objects of any class it has loaded.
So if the VM decides it is time to try unloading, it should:
1) Check if the full epoch has passed;
2) for each unloadable CL, scan corresponding vtables;
3) if none of the vtables were marked reachable, drop the CL from root
set completely and clean corresponding native structures; Java
instances will be reclaimed at nearest GC iteration;
4) Reset "epoch marker" and vtable words.

Do I get it right?


>
> This is where my mechanism comes into play.
>
> If an object executes getClass() then its classloader is removed from
> the unloadable classloader queue, its weak reference gets recreated  and
> we're back at the initial state.  My guess is that this is a pretty
> infrequent method call.
>
> I think this stage of the algorithm is easy in performance terms -
> difficult in terms of proving correctness, but if you have an efficient
> reachability mechanism for classes I think the building blocks are
> there, and the subtleties are nothing that a talented engineer can't solve.

Yes, a bit complicated. Taking into account the issues with
ReferenceQueue abo

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Alexey Varlamov

2006/11/8, Robin Garner <[EMAIL PROTECTED]>:

Robin Garner wrote:
> Aleksey Ignatenko wrote:
>> Robin.
>>
>>> OK, well how about keeping a weak reference to the >j.l.ClassLoader
>>> object instead of a strong one.  When the reference >becomes (strong)ly
>>> unreachable, invoke the class-unloading phase.
>>
>>
>> If you have weak reference to j.l.Classloader - GC will collect it
>> (with all
>> appropriate jlClasses) as soon as there are no references to
>> j.l.Classloaderand appropriate classes. But there is possible
>> situation when there are some
>> live objects of that classes and no references to jlClassloader and
>> jlClasses. This will lead to unpredictable consequences (crash, etc).
>>
>>
>>
>> I want to remind that there 3 mandatory conditions of class unloading:
>>
>> 1. j.l.Classloader instance is unreachable.
>>
>> 2. Appropriate j.l.Class instances are unreachable.
>>
>> 3. No object of any class loaded by appropriate class loader exists.
>
> Let me repeat.  I offer an efficient solution to (3).  I don't purport
> to have a solution to (1) and (2).

Let me just add:  This is because I don't think (1) or (2) are
particularly difficult from a performance point of view, although I'm
happy to accept that there may still be some subtle engineering challenges.

Robin,

While your idea to (3) looks brilliant and quite convincing, it only
covers part of the whole mission. We really need to derive complete
design solution (like Etienne did), and I feel the voting started in
the neighbor thread is a bit premature.
Some of considerations below are beyond of my understanding, could you
please clarify them (inlined)?

And yet, it would be nice to have a confirmation that the notion of
"epoch of full-heap-collection" does not imply strict limitations on
GC algorithms. Maybe this is something obvious for people with more
decent GC background than me?

Now this is just off the top of my head, but what about this for a design:
- A j.l.ClassLoader maintains a collection of each of the classes it has
loaded
- A j.l.Class contains a pointer to its j.l.ClassLoader
- A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
The point of this is that a class loader and its classes are a 'self
sustaining' data structure - if one element in it is reachable the whole
thing is reachable.

Right. The special case is for system classes which are always in VM
root set so never reclaimed.

The VM maintains a weak reference to all its j.l.ClassLoader instances,
and maintains a ReferenceQueue for weakly-reachable classloaders.
ClassLoaders are placed on the ReferenceQueue if and only if they are
unreachable from the heap (including via their j.l.Class objects).

Here: should it actually read as "WeakReference instances for
weakly-reachable classloaders are placed on the ReferenceQueue"?
Otherwise this sentence completely escapes my mind, sorry.
If the former, when how VM could obtain&rescue referent CL objects (+
it's j.l.Class instances) after GC pass - AFAIU references are cleared
automatically before enqueuing? I suppose we are not going to
introduce inter-phase communication between VM and GC...

Note this is an irreversible condition: objects that are unreachable can
never become reachable again, except through very specific methods.

When it sweeps the ReferenceQueue for unreachable classloaders, the VM
places the unreachable classloaders in a queue of classloaders that are
candidates for unloading.  This queue is part of the root set of the VM.

Strongly referenced now I suppose.

 A classloader in this queue is unreachable from the heap, and can be
unloaded when there are no objects of any class it has loaded.

So if the VM decides it is time to try unloading, it should:
1) Check if the full epoch has passed;
2) for each unloadable CL, scan corresponding vtables;
3) if none of the vtables were marked reachable, drop the CL from root
set completely and clean corresponding native structures; Java
instances will be reclaimed at nearest GC iteration;
4) Reset "epoch marker" and vtable words.

Do I get it right?

This is where my mechanism comes into play.

If an object executes getClass() then its classloader is removed from
the unloadable classloader queue, its weak reference gets recreated  and
we're back at the initial state.  My guess is that this is a pretty
infrequent method call.

I think this stage of the algorithm is easy in performance terms -
difficult in terms of proving correctness, but if you have an efficient
reachability mechanism for classes I think the building blocks are
there, and the subtleties are nothing that a talented engineer can't solve.

Yes, a bit complicated. Taking into account the issues with
ReferenceQueue above, I'd rather suggest the following:

1) The j.l.Class and defining CL have mutual strong references, as said above.
2) Normally, the VM reports all CLs as strong roots thus preserving
them from premature reclamation;
3) When the VM decides (by whatever he

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin Garner


Etienne Gagnon wrote:

3- Why would it be so hard to add an unconditional write operation
during collection (e.g. during copying or marking of an object) in
drlvm?  A detailed technical explanation is welcome. :-)


I actually believe that this should be implementable in a GC-neutral 
way, whether vtables are objects or not.  The GC will at some point ask 
the VM for the GC map of the object it is about to scan.  At this point 
the VM can write the mark of the vtable.


I guess I'm making an assumption about the GC -> VM interface here, but 
if it doesn't exist, it should :)



  So far, this latest point (3-) seems the sole argument in favor of
  using the "object-vtables" approach.  Wouldn't the right fix, if
  it's currently really impossible to implement an unconditional
  write, be to extend the drlvm GC interface?  Isn't this a design
  deficiency in the GC interface?  No other argument, so far, seems
  to be in favor of the object vtable approach, unless I missed some.


As for Robin's attempt to deal with weak/hard reference to the class
loader object using a Reference Queue, I am not yet convinced of the
correctness of the approach...  [off the top of my head: potential
problem with static synchronized methods].  And, this would be probably
more work intensive (so, less efficient) than my proposal in 1-[*]
above.  It might also be tricky to identify all possible situations such
as Object.getClass() where some special code is needed to deal with a
problem situation.  I prefer clean, scope-limited code for dealing with
class unloading.  It's easier, that way, to activate or deactivate class
loading [dynamically!].


Glad to see I'm not the only one :)  It really was just off the top of 
my head.



In summary, I would like to be convinced of the "completeness" and of
the "correctness" of all competing approaches.  I personally am, so far,
in favor of Robin's unconditional vtable bit/byte/word write idea, along
with an "adapted" version of my proposal for dealing with class loader
death (such as proposed in 1-[*] above).

Also, if somebody was able to find a "correctness" deficiency in my
proposal, then please let us know, so that we make sure this deficiency
is eliminated from all competing proposals.

By the way, what are the currently competing proposals?
1- object vtables
2- Robin/Gagnon proposal  (still finishing up some details ;-)
3- Is there a 3rd?

Which ones have existing implementations?  How "correct/complete" are
they?  Do we have access to some "human readable" (i.e. non-code) full
description of the algorithm?


And what is their performance hit ?


Etienne

Robin wrote:

On Thu, 2006-11-09 at 02:01 +0300, Ivan Volosyuk wrote:


Robin,

thank you for detailed description of the algorithm. IMHO, this was
the most complicated place of the whole story: how to have a weak
reference to classloader and still be able to get it alive again. This
shouldn't be performance critical part and is quite doable. I
absolutely agree with your estimations about tracing extra reference
per object. The approach you propose is more efficient and quite
elegant.
--
Ivan


Thanks :)



On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:


Robin Garner wrote:


Aleksey Ignatenko wrote:


Robin.



OK, well how about keeping a weak reference to the >j.l.ClassLoader
object instead of a strong one.  When the reference >becomes (strong)ly
unreachable, invoke the class-unloading phase.


If you have weak reference to j.l.Classloader - GC will collect it
(with all
appropriate jlClasses) as soon as there are no references to
j.l.Classloaderand appropriate classes. But there is possible
situation when there are some
live objects of that classes and no references to jlClassloader and
jlClasses. This will lead to unpredictable consequences (crash, etc).



I want to remind that there 3 mandatory conditions of class unloading:

1. j.l.Classloader instance is unreachable.

2. Appropriate j.l.Class instances are unreachable.

3. No object of any class loaded by appropriate class loader exists.

Let me repeat.  I offer an efficient solution to (3).  I don't purport
to have a solution to (1) and (2).

Let me just add:  This is because I don't think (1) or (2) are
particularly difficult from a performance point of view, although I'm
happy to accept that there may still be some subtle engineering challenges.

Now this is just off the top of my head, but what about this for a design:
- A j.l.ClassLoader maintains a collection of each of the classes it has
loaded
- A j.l.Class contains a pointer to its j.l.ClassLoader
- A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
The point of this is that a class loader and its classes are a 'self
sustaining' data structure - if one element in it is reachable the whole
thing is reachable.

The VM maintains a weak reference to all its j.l.ClassLoader instances,
and maintains a ReferenceQueue for weakly-reachable classloaders.
ClassLoaders are placed on the Reference

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin Garner


Etienne Gagnon wrote:

I was making it more complex than it needs...

Here's an improvement...

1- During normal operation, the VM keeps hard references to all class
loader instances.  [This prevents any premature class loader death].

2- At the start of an epoch (or just before), all vtable bits (or byte
or word) are cleared. [From now on, I will use the "bit" terminology for
simplicity.  The bit may reside in an otherwise unused byte or even
word, for efficiency purpose].

3- The end of an epoch happens "no sooner" than when all generations /
heap parts have been collected at least once since the epoch start.
[One can cheat and visit objects of uncollected parts/generations to
mark their vtables].

4- An "old generation" collection is chosen as the end of an epoch.
This is "end of epoch collection".  [As class loaders/classes are likely
to have moved to older generations, there's no point trying to kill them
in young collections].


In fact classes and clasloaders would be perfect targets for pretenuring.


5- Just before starting the "end of epoch collection", all the
class-loader vtable lists are visited (and bits are cleared in prevision
of the next epoch).  All vm references to [candidate] class loaders with
no surviving objects (nor active methods) (e.g. no vtable bit set) are
made "weak".

6- The "end of epoch collection" is launched.

7- There's actually no need for "rescuing" class loaders.  The vm
reference to any surviving [candidate] class loader is made hard again.
   Interesting fact: other candidate class loaders cannot have any
instance (nor any active method) as GC doesn't create instances nor
method calls.  So, there's no need for a rescuing dance!  The list of
dying class loaders can be used for freeing related native resources.

IMO: simple, clean, efficient...


It has the downside of being inherently 'stop the world', though.  I 
don't see this as being a big disadvantage, because it shouldn't be hard 
(compared to the work of building a concurrent collector in the first 
place) to extend to a concurrent class-unloader.



Etienne

Etienne Gagnon wrote:

 If it does, then can somebody explain to me what's wrong with
 my proposal of setting, in normal operation, a "hard" reference to
 class loader objects, and then temporarily using weak, rescuable
 reference to enable class loader collection?  I don't see a performance
 hog there.  Rescuing a few class loaders (if any) and their related
 classes once per epoch shouldn't cost much!  I have yet to see a
 convincing explanation of how continuous collection of "object-vtables"
 would be more efficient...

 Really, even with Robin's proposal, this would work.  If a class loader
 gets into an unloadable state, then most likely, the class loader and
 all classes it has loaded will have migrated to an old generation.  So,
 as long as we set then end of a class unloading epoch at an old
 generation collection, then we can simply "weaken" the class loader
 reference during collection (only when the bit of all related vtables
 are unset), then apply the finalization-like rescue dance to class
 loaders.

 [*]This wouldn't affect any other operation, during other GC cycles, as
 Robin's unconditional bit/byte/word vtable write only serves to tell us
 whether a class loader has had living instances of its classes during
 the epoch.





--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Etienne Gagnon

Note:  For preventing collection of class loaders related to active
method frames, there are various solutions.  One could simply walk all
method frame stacks just before the end of epoch collection (my
preferred approach) and mark the bit of related vtables.  Another
approach would be to add an unconditional write on every method call
(that would be a big tax to pay!).  I'll let you imagine all the
variations on that theme. :-)

Etienne

Etienne Gagnon wrote:
> I was making it more complex than it needs...
> 
> Here's an improvement...
> 

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/

signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Etienne Gagnon

I was making it more complex than it needs...

Here's an improvement...

1- During normal operation, the VM keeps hard references to all class
loader instances.  [This prevents any premature class loader death].

2- At the start of an epoch (or just before), all vtable bits (or byte
or word) are cleared. [From now on, I will use the "bit" terminology for
simplicity.  The bit may reside in an otherwise unused byte or even
word, for efficiency purpose].

3- The end of an epoch happens "no sooner" than when all generations /
heap parts have been collected at least once since the epoch start.
[One can cheat and visit objects of uncollected parts/generations to
mark their vtables].

4- An "old generation" collection is chosen as the end of an epoch.
This is "end of epoch collection".  [As class loaders/classes are likely
to have moved to older generations, there's no point trying to kill them
in young collections].

5- Just before starting the "end of epoch collection", all the
class-loader vtable lists are visited (and bits are cleared in prevision
of the next epoch).  All vm references to [candidate] class loaders with
no surviving objects (nor active methods) (e.g. no vtable bit set) are
made "weak".

6- The "end of epoch collection" is launched.

7- There's actually no need for "rescuing" class loaders.  The vm
reference to any surviving [candidate] class loader is made hard again.
   Interesting fact: other candidate class loaders cannot have any
instance (nor any active method) as GC doesn't create instances nor
method calls.  So, there's no need for a rescuing dance!  The list of
dying class loaders can be used for freeing related native resources.

IMO: simple, clean, efficient...

Etienne

Etienne Gagnon wrote:
>  If it does, then can somebody explain to me what's wrong with
>  my proposal of setting, in normal operation, a "hard" reference to
>  class loader objects, and then temporarily using weak, rescuable
>  reference to enable class loader collection?  I don't see a performance
>  hog there.  Rescuing a few class loaders (if any) and their related
>  classes once per epoch shouldn't cost much!  I have yet to see a
>  convincing explanation of how continuous collection of "object-vtables"
>  would be more efficient...
> 
>  Really, even with Robin's proposal, this would work.  If a class loader
>  gets into an unloadable state, then most likely, the class loader and
>  all classes it has loaded will have migrated to an old generation.  So,
>  as long as we set then end of a class unloading epoch at an old
>  generation collection, then we can simply "weaken" the class loader
>  reference during collection (only when the bit of all related vtables
>  are unset), then apply the finalization-like rescue dance to class
>  loaders.
> 
>  [*]This wouldn't affect any other operation, during other GC cycles, as
>  Robin's unconditional bit/byte/word vtable write only serves to tell us
>  whether a class loader has had living instances of its classes during
>  the epoch.

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/

signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Etienne Gagnon

[First, let me say that, as I am not contributing a class unloading
*implementation* to drlvm, I will understand if the project was more
inclined to chose an actually contributed piece of code over a design
without contributed implementation. :-)]

There was a "-1" vote...  Hmmm...  As I voted "+1", I will enter the
arena... :-)

Before getting farther in this class unloading discussion, I would like
to get some clarifications about 3 things:

1- How does drlvm implement object finalization?  Doesn't this require
some "object" rescuing, much similarly to my proposal for correctly
implementing class unloading (the class loader rescuing thing)?

 If it does, then can somebody explain to me what's wrong with
 my proposal of setting, in normal operation, a "hard" reference to
 class loader objects, and then temporarily using weak, rescuable
 reference to enable class loader collection?  I don't see a performance
 hog there.  Rescuing a few class loaders (if any) and their related
 classes once per epoch shouldn't cost much!  I have yet to see a
 convincing explanation of how continuous collection of "object-vtables"
 would be more efficient...

 Really, even with Robin's proposal, this would work.  If a class loader
 gets into an unloadable state, then most likely, the class loader and
 all classes it has loaded will have migrated to an old generation.  So,
 as long as we set then end of a class unloading epoch at an old
 generation collection, then we can simply "weaken" the class loader
 reference during collection (only when the bit of all related vtables
 are unset), then apply the finalization-like rescue dance to class
 loaders.

 [*]This wouldn't affect any other operation, during other GC cycles, as
 Robin's unconditional bit/byte/word vtable write only serves to tell us
 whether a class loader has had living instances of its classes during
 the epoch.

2- I would like to read some "full" description of the object-vtable
proposal.  In particular, how does this proposal deal with the following:
 a) Preventing unloading of a class which has an active static method.
 b) Preventing unloading of a class which has an unsynchronized instance
method active, which has overwritten local variable 0 (or for which the
liveness analysis has detected the death of the reference to "this").

3- Why would it be so hard to add an unconditional write operation
during collection (e.g. during copying or marking of an object) in
drlvm?  A detailed technical explanation is welcome. :-)

  So far, this latest point (3-) seems the sole argument in favor of
  using the "object-vtables" approach.  Wouldn't the right fix, if
  it's currently really impossible to implement an unconditional
  write, be to extend the drlvm GC interface?  Isn't this a design
  deficiency in the GC interface?  No other argument, so far, seems
  to be in favor of the object vtable approach, unless I missed some.

As for Robin's attempt to deal with weak/hard reference to the class
loader object using a Reference Queue, I am not yet convinced of the
correctness of the approach...  [off the top of my head: potential
problem with static synchronized methods].  And, this would be probably
more work intensive (so, less efficient) than my proposal in 1-[*]
above.  It might also be tricky to identify all possible situations such
as Object.getClass() where some special code is needed to deal with a
problem situation.  I prefer clean, scope-limited code for dealing with
class unloading.  It's easier, that way, to activate or deactivate class
loading [dynamically!].

In summary, I would like to be convinced of the "completeness" and of
the "correctness" of all competing approaches.  I personally am, so far,
in favor of Robin's unconditional vtable bit/byte/word write idea, along
with an "adapted" version of my proposal for dealing with class loader
death (such as proposed in 1-[*] above).

Also, if somebody was able to find a "correctness" deficiency in my
proposal, then please let us know, so that we make sure this deficiency
is eliminated from all competing proposals.

By the way, what are the currently competing proposals?
1- object vtables
2- Robin/Gagnon proposal  (still finishing up some details ;-)
3- Is there a 3rd?

Which ones have existing implementations?  How "correct/complete" are
they?  Do we have access to some "human readable" (i.e. non-code) full
description of the algorithm?

Etienne

Robin wrote:
> On Thu, 2006-11-09 at 02:01 +0300, Ivan Volosyuk wrote:
> 
>>Robin,
>>
>>thank you for detailed description of the algorithm. IMHO, this was
>>the most complicated place of the whole story: how to have a weak
>>reference to classloader and still be able to get it alive again. This
>>shouldn't be performance critical part and is quite doable. I
>>absolutely agree with your estimations about tracing extra reference
>>per object. The approach you propose is more efficient and quite
>>elegant.
>>--
>>Ivan
> 
> 
> Thanks :)
> 
> 
>>On 11/8/06, Robin Ga

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin

On Thu, 2006-11-09 at 02:01 +0300, Ivan Volosyuk wrote:
> Robin,
> 
> thank you for detailed description of the algorithm. IMHO, this was
> the most complicated place of the whole story: how to have a weak
> reference to classloader and still be able to get it alive again. This
> shouldn't be performance critical part and is quite doable. I
> absolutely agree with your estimations about tracing extra reference
> per object. The approach you propose is more efficient and quite
> elegant.
> --
> Ivan

Thanks :)

> On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:
> > Robin Garner wrote:
> > > Aleksey Ignatenko wrote:
> > >> Robin.
> > >>
> > >>> OK, well how about keeping a weak reference to the >j.l.ClassLoader
> > >>> object instead of a strong one.  When the reference >becomes (strong)ly
> > >>> unreachable, invoke the class-unloading phase.
> > >>
> > >>
> > >> If you have weak reference to j.l.Classloader - GC will collect it
> > >> (with all
> > >> appropriate jlClasses) as soon as there are no references to
> > >> j.l.Classloaderand appropriate classes. But there is possible
> > >> situation when there are some
> > >> live objects of that classes and no references to jlClassloader and
> > >> jlClasses. This will lead to unpredictable consequences (crash, etc).
> > >>
> > >>
> > >>
> > >> I want to remind that there 3 mandatory conditions of class unloading:
> > >>
> > >> 1. j.l.Classloader instance is unreachable.
> > >>
> > >> 2. Appropriate j.l.Class instances are unreachable.
> > >>
> > >> 3. No object of any class loaded by appropriate class loader exists.
> > >
> > > Let me repeat.  I offer an efficient solution to (3).  I don't purport
> > > to have a solution to (1) and (2).
> >
> > Let me just add:  This is because I don't think (1) or (2) are
> > particularly difficult from a performance point of view, although I'm
> > happy to accept that there may still be some subtle engineering challenges.
> >
> > Now this is just off the top of my head, but what about this for a design:
> > - A j.l.ClassLoader maintains a collection of each of the classes it has
> > loaded
> > - A j.l.Class contains a pointer to its j.l.ClassLoader
> > - A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
> > The point of this is that a class loader and its classes are a 'self
> > sustaining' data structure - if one element in it is reachable the whole
> > thing is reachable.
> >
> > The VM maintains a weak reference to all its j.l.ClassLoader instances,
> > and maintains a ReferenceQueue for weakly-reachable classloaders.
> > ClassLoaders are placed on the ReferenceQueue if and only if they are
> > unreachable from the heap (including via their j.l.Class objects).  Note
> > this is an irreversible condition: objects that are unreachable can
> > never become reachable again, except through very specific methods.
> >
> > When it sweeps the ReferenceQueue for unreachable classloaders, the VM
> > places the unreachable classloaders in a queue of classloaders that are
> > candidates for unloading.  This queue is part of the root set of the VM.
> >   A classloader in this queue is unreachable from the heap, and can be
> > unloaded when there are no objects of any class it has loaded.
> >
> > This is where my mechanism comes into play.
> >
> > If an object executes getClass() then its classloader is removed from
> > the unloadable classloader queue, its weak reference gets recreated  and
> > we're back at the initial state.  My guess is that this is a pretty
> > infrequent method call.
> >
> > I think this stage of the algorithm is easy in performance terms -
> > difficult in terms of proving correctness, but if you have an efficient
> > reachability mechanism for classes I think the building blocks are
> > there, and the subtleties are nothing that a talented engineer can't solve.
> >
> >
> > I'm not 100% sure what your counter-proposal is: I recall 2 approaches
> > from the mailing list:
> > 1) Each object has an additional word in its header that points back to
> > its j.l.Class object, and we proceed from here.
> >
> > Given that the mean object size is ~28 bytes, this proposal adds 14% to
> > each object size.  This increases the frequency of GC by 14% and incurs
> > a 14% slowdown.  Of course this is an oversimplification but a 14%
> > slowdown is a pretty lousy starting point to argue from.
> >
> > 2) The existing pointer in the GC header is traced during GC time.
> >
> > The average number of pointers per object (excluding the vtable) is
> > between 1.5 and 2 for the majority of benchmarks I have looked at
> > (footnote: if you know something different, drop me a line) (geometric
> > mean 1.78 for {specJVM, pseudoJBB and DaCapo 20051009}).  Tracing one
> > additional reference per object will therefore increase the cost of GC
> > by ~60% on average.  Again oversimplification but indicative.  If we
> > assume that GC accounts for 10% of runtime (more or less depending on
> > heap size), this

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Ivan Volosyuk

Robin,

thank you for detailed description of the algorithm. IMHO, this was
the most complicated place of the whole story: how to have a weak
reference to classloader and still be able to get it alive again. This
shouldn't be performance critical part and is quite doable. I
absolutely agree with your estimations about tracing extra reference
per object. The approach you propose is more efficient and quite
elegant.
--
Ivan

On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Robin Garner wrote:
> Aleksey Ignatenko wrote:
>> Robin.
>>
>>> OK, well how about keeping a weak reference to the >j.l.ClassLoader
>>> object instead of a strong one.  When the reference >becomes (strong)ly
>>> unreachable, invoke the class-unloading phase.
>>
>>
>> If you have weak reference to j.l.Classloader - GC will collect it
>> (with all
>> appropriate jlClasses) as soon as there are no references to
>> j.l.Classloaderand appropriate classes. But there is possible
>> situation when there are some
>> live objects of that classes and no references to jlClassloader and
>> jlClasses. This will lead to unpredictable consequences (crash, etc).
>>
>>
>>
>> I want to remind that there 3 mandatory conditions of class unloading:
>>
>> 1. j.l.Classloader instance is unreachable.
>>
>> 2. Appropriate j.l.Class instances are unreachable.
>>
>> 3. No object of any class loaded by appropriate class loader exists.
>
> Let me repeat.  I offer an efficient solution to (3).  I don't purport
> to have a solution to (1) and (2).

Let me just add:  This is because I don't think (1) or (2) are
particularly difficult from a performance point of view, although I'm
happy to accept that there may still be some subtle engineering challenges.

Now this is just off the top of my head, but what about this for a design:
- A j.l.ClassLoader maintains a collection of each of the classes it has
loaded
- A j.l.Class contains a pointer to its j.l.ClassLoader
- A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
The point of this is that a class loader and its classes are a 'self
sustaining' data structure - if one element in it is reachable the whole
thing is reachable.

The VM maintains a weak reference to all its j.l.ClassLoader instances,
and maintains a ReferenceQueue for weakly-reachable classloaders.
ClassLoaders are placed on the ReferenceQueue if and only if they are
unreachable from the heap (including via their j.l.Class objects).  Note
this is an irreversible condition: objects that are unreachable can
never become reachable again, except through very specific methods.

When it sweeps the ReferenceQueue for unreachable classloaders, the VM
places the unreachable classloaders in a queue of classloaders that are
candidates for unloading.  This queue is part of the root set of the VM.
  A classloader in this queue is unreachable from the heap, and can be
unloaded when there are no objects of any class it has loaded.

This is where my mechanism comes into play.

If an object executes getClass() then its classloader is removed from
the unloadable classloader queue, its weak reference gets recreated  and
we're back at the initial state.  My guess is that this is a pretty
infrequent method call.

I think this stage of the algorithm is easy in performance terms -
difficult in terms of proving correctness, but if you have an efficient
reachability mechanism for classes I think the building blocks are
there, and the subtleties are nothing that a talented engineer can't solve.

I'm not 100% sure what your counter-proposal is: I recall 2 approaches
from the mailing list:
1) Each object has an additional word in its header that points back to
its j.l.Class object, and we proceed from here.

Given that the mean object size is ~28 bytes, this proposal adds 14% to
each object size.  This increases the frequency of GC by 14% and incurs
a 14% slowdown.  Of course this is an oversimplification but a 14%
slowdown is a pretty lousy starting point to argue from.

2) The existing pointer in the GC header is traced during GC time.

The average number of pointers per object (excluding the vtable) is
between 1.5 and 2 for the majority of benchmarks I have looked at
(footnote: if you know something different, drop me a line) (geometric
mean 1.78 for {specJVM, pseudoJBB and DaCapo 20051009}).  Tracing one
additional reference per object will therefore increase the cost of GC
by ~60% on average.  Again oversimplification but indicative.  If we
assume that GC accounts for 10% of runtime (more or less depending on
heap size), this is a runtime overhead of 6%.

My proposal has been measured at ~1% overhead in GC time, or 0.1% in
execution time (caveats as above).  If there is some complexity in
establishing classloader reachability from this basis, I would assume it
can easliy be absorbed.

Therefore I think my proposal, while not complete, can form the basis of
an efficient complete system for class unloading.

(PS: I'd *love* to be proven wrong)

cheers,

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin Garner

Robin Garner wrote:

Aleksey Ignatenko wrote:

Robin.

OK, well how about keeping a weak reference to the >j.l.ClassLoader
object instead of a strong one.  When the reference >becomes (strong)ly
unreachable, invoke the class-unloading phase.

If you have weak reference to j.l.Classloader - GC will collect it 
(with all

appropriate jlClasses) as soon as there are no references to
j.l.Classloaderand appropriate classes. But there is possible
situation when there are some
live objects of that classes and no references to jlClassloader and
jlClasses. This will lead to unpredictable consequences (crash, etc).

I want to remind that there 3 mandatory conditions of class unloading:

1. j.l.Classloader instance is unreachable.

2. Appropriate j.l.Class instances are unreachable.

3. No object of any class loaded by appropriate class loader exists.

Let me repeat.  I offer an efficient solution to (3).  I don't purport 
to have a solution to (1) and (2).

Let me just add:  This is because I don't think (1) or (2) are 
particularly difficult from a performance point of view, although I'm 
happy to accept that there may still be some subtle engineering challenges.

Now this is just off the top of my head, but what about this for a design:
- A j.l.ClassLoader maintains a collection of each of the classes it has 
loaded

- A j.l.Class contains a pointer to its j.l.ClassLoader
- A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
The point of this is that a class loader and its classes are a 'self 
sustaining' data structure - if one element in it is reachable the whole 
thing is reachable.

The VM maintains a weak reference to all its j.l.ClassLoader instances, 
and maintains a ReferenceQueue for weakly-reachable classloaders. 
ClassLoaders are placed on the ReferenceQueue if and only if they are 
unreachable from the heap (including via their j.l.Class objects).  Note 
this is an irreversible condition: objects that are unreachable can 
never become reachable again, except through very specific methods.

When it sweeps the ReferenceQueue for unreachable classloaders, the VM 
places the unreachable classloaders in a queue of classloaders that are 
candidates for unloading.  This queue is part of the root set of the VM. 
 A classloader in this queue is unreachable from the heap, and can be 
unloaded when there are no objects of any class it has loaded.

This is where my mechanism comes into play.

If an object executes getClass() then its classloader is removed from 
the unloadable classloader queue, its weak reference gets recreated  and 
we're back at the initial state.  My guess is that this is a pretty 
infrequent method call.

I think this stage of the algorithm is easy in performance terms - 
difficult in terms of proving correctness, but if you have an efficient 
reachability mechanism for classes I think the building blocks are 
there, and the subtleties are nothing that a talented engineer can't solve.

I'm not 100% sure what your counter-proposal is: I recall 2 approaches 
from the mailing list:

1) Each object has an additional word in its header that points back to
   its j.l.Class object, and we proceed from here.

Given that the mean object size is ~28 bytes, this proposal adds 14% to 
each object size.  This increases the frequency of GC by 14% and incurs 
a 14% slowdown.  Of course this is an oversimplification but a 14% 
slowdown is a pretty lousy starting point to argue from.

2) The existing pointer in the GC header is traced during GC time.

The average number of pointers per object (excluding the vtable) is 
between 1.5 and 2 for the majority of benchmarks I have looked at 
(footnote: if you know something different, drop me a line) (geometric 
mean 1.78 for {specJVM, pseudoJBB and DaCapo 20051009}).  Tracing one 
additional reference per object will therefore increase the cost of GC 
by ~60% on average.  Again oversimplification but indicative.  If we 
assume that GC accounts for 10% of runtime (more or less depending on 
heap size), this is a runtime overhead of 6%.

My proposal has been measured at ~1% overhead in GC time, or 0.1% in 
execution time (caveats as above).  If there is some complexity in 
establishing classloader reachability from this basis, I would assume it 
can easliy be absorbed.

Therefore I think my proposal, while not complete, can form the basis of 
an efficient complete system for class unloading.

(PS: I'd *love* to be proven wrong)

cheers,
Robin

Regards,
Robin

Aleksey.

On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Pavel Pervov wrote:
> Robin,
>
> The kind of model I had in mind was along the lines of:
>> - VM maintains a linked list (or other collection type) of the
currently
>> loaded classloaders, each of which in turn maintains the 
collection of
>> classes loaded by that type.  The sweep of classloaders goes 
something

>> like:
>>
>> for (ClassLoader cl : classLoaders)
>>   for (Class c : cl.classes)
>>

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin Garner

Aleksey Ignatenko wrote:

Robin.

OK, well how about keeping a weak reference to the >j.l.ClassLoader
object instead of a strong one.  When the reference >becomes (strong)ly
unreachable, invoke the class-unloading phase.

If you have weak reference to j.l.Classloader - GC will collect it (with 
all

appropriate jlClasses) as soon as there are no references to
j.l.Classloaderand appropriate classes. But there is possible
situation when there are some
live objects of that classes and no references to jlClassloader and
jlClasses. This will lead to unpredictable consequences (crash, etc).

I want to remind that there 3 mandatory conditions of class unloading:

1. j.l.Classloader instance is unreachable.

2. Appropriate j.l.Class instances are unreachable.

3. No object of any class loaded by appropriate class loader exists.

Let me repeat.  I offer an efficient solution to (3).  I don't purport 
to have a solution to (1) and (2).

Regards,
Robin

Aleksey.

On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Pavel Pervov wrote:
> Robin,
>
> The kind of model I had in mind was along the lines of:
>> - VM maintains a linked list (or other collection type) of the
currently
>> loaded classloaders, each of which in turn maintains the collection of
>> classes loaded by that type.  The sweep of classloaders goes something
>> like:
>>
>> for (ClassLoader cl : classLoaders)
>>   for (Class c : cl.classes)
>> cl.reachable |= c.vtable.reachable
>
>
> This is not enough. There are may be live j/l/Class'es and
> j/l/Classloader's
> in the heap. Even though no objects of any classes loaded by a 
particual

> class loader are available in the heap, if we have live reference to
> j/l/ClassLoader itself, it just can't be unloaded.

OK, well how about keeping a weak reference to the j.l.ClassLoader
object instead of a strong one.  When the reference becomes (strong)ly
unreachable, invoke the class-unloading phase.

To me the key issue from a performance POV is the reachability of
classes from objects in the heap.  I don't pretend to have an answer to
the other questions---the performance critical one is the one I have
addressed, and I accept there may be many solutions to this part of the
question.

> I believe that a separate heap trace pass, different from the standard
>> GC, that visited vtables and reachable resources from there would also
>> be a viable solution.  As mentioned in an earlier post, writing 
this in

>> MMTk (where a heap trace operation is a class that you can easily
>> subtype to do this) would be easy.
>>
>> One of the advantages of my other proposal is that it can be
implemented
>> in the VM independent of the GC to some extent.  This additional
>> mark/scan phase may or may not be easy to implement, depending on the
>> structure of DRLVM GCs, which is something I haven't explored.
>
>
> DRLVM may work with (potentially) any number of GCs. Designing class
> unloading the way, which would require mark&scan cooperation from 
GC, is

> not
> generally a good idea (from my HPOV).

That's what I gathered.  hence my proposal.

cheers

--
Robin Garner
Dept. of Computer Science
Australian National University

--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Aleksey Ignatenko

Robin.

OK, well how about keeping a weak reference to the >j.l.ClassLoader
object instead of a strong one.  When the reference >becomes (strong)ly
unreachable, invoke the class-unloading phase.

If you have weak reference to j.l.Classloader - GC will collect it (with all
appropriate jlClasses) as soon as there are no references to
j.l.Classloaderand appropriate classes. But there is possible
situation when there are some
live objects of that classes and no references to jlClassloader and
jlClasses. This will lead to unpredictable consequences (crash, etc).

I want to remind that there 3 mandatory conditions of class unloading:

1. j.l.Classloader instance is unreachable.

2. Appropriate j.l.Class instances are unreachable.

3. No object of any class loaded by appropriate class loader exists.

Aleksey.

On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Pavel Pervov wrote:
> Robin,
>
> The kind of model I had in mind was along the lines of:
>> - VM maintains a linked list (or other collection type) of the
currently
>> loaded classloaders, each of which in turn maintains the collection of
>> classes loaded by that type.  The sweep of classloaders goes something
>> like:
>>
>> for (ClassLoader cl : classLoaders)
>>   for (Class c : cl.classes)
>> cl.reachable |= c.vtable.reachable
>
>
> This is not enough. There are may be live j/l/Class'es and
> j/l/Classloader's
> in the heap. Even though no objects of any classes loaded by a particual
> class loader are available in the heap, if we have live reference to
> j/l/ClassLoader itself, it just can't be unloaded.

OK, well how about keeping a weak reference to the j.l.ClassLoader
object instead of a strong one.  When the reference becomes (strong)ly
unreachable, invoke the class-unloading phase.

To me the key issue from a performance POV is the reachability of
classes from objects in the heap.  I don't pretend to have an answer to
the other questions---the performance critical one is the one I have
addressed, and I accept there may be many solutions to this part of the
question.

> I believe that a separate heap trace pass, different from the standard
>> GC, that visited vtables and reachable resources from there would also
>> be a viable solution.  As mentioned in an earlier post, writing this in

>> MMTk (where a heap trace operation is a class that you can easily
>> subtype to do this) would be easy.
>>
>> One of the advantages of my other proposal is that it can be
implemented
>> in the VM independent of the GC to some extent.  This additional
>> mark/scan phase may or may not be easy to implement, depending on the
>> structure of DRLVM GCs, which is something I haven't explored.
>
>
> DRLVM may work with (potentially) any number of GCs. Designing class
> unloading the way, which would require mark&scan cooperation from GC, is
> not
> generally a good idea (from my HPOV).

That's what I gathered.  hence my proposal.

cheers

--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin Garner


Pavel Pervov wrote:

Robin,

The kind of model I had in mind was along the lines of:

- VM maintains a linked list (or other collection type) of the currently
loaded classloaders, each of which in turn maintains the collection of
classes loaded by that type.  The sweep of classloaders goes something
like:

for (ClassLoader cl : classLoaders)
  for (Class c : cl.classes)
cl.reachable |= c.vtable.reachable



This is not enough. There are may be live j/l/Class'es and 
j/l/Classloader's

in the heap. Even though no objects of any classes loaded by a particual
class loader are available in the heap, if we have live reference to
j/l/ClassLoader itself, it just can't be unloaded.


OK, well how about keeping a weak reference to the j.l.ClassLoader 
object instead of a strong one.  When the reference becomes (strong)ly 
unreachable, invoke the class-unloading phase.


To me the key issue from a performance POV is the reachability of 
classes from objects in the heap.  I don't pretend to have an answer to 
the other questions---the performance critical one is the one I have 
addressed, and I accept there may be many solutions to this part of the 
question.



I believe that a separate heap trace pass, different from the standard

GC, that visited vtables and reachable resources from there would also
be a viable solution.  As mentioned in an earlier post, writing this in
MMTk (where a heap trace operation is a class that you can easily
subtype to do this) would be easy.

One of the advantages of my other proposal is that it can be implemented
in the VM independent of the GC to some extent.  This additional
mark/scan phase may or may not be easy to implement, depending on the
structure of DRLVM GCs, which is something I haven't explored.



DRLVM may work with (potentially) any number of GCs. Designing class
unloading the way, which would require mark&scan cooperation from GC, is 
not

generally a good idea (from my HPOV).


That's what I gathered.  hence my proposal.

cheers

--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Pavel Pervov


Robin,

The kind of model I had in mind was along the lines of:

- VM maintains a linked list (or other collection type) of the currently
loaded classloaders, each of which in turn maintains the collection of
classes loaded by that type.  The sweep of classloaders goes something
like:

for (ClassLoader cl : classLoaders)
  for (Class c : cl.classes)
cl.reachable |= c.vtable.reachable



This is not enough. There are may be live j/l/Class'es and j/l/Classloader's
in the heap. Even though no objects of any classes loaded by a particual
class loader are available in the heap, if we have live reference to
j/l/ClassLoader itself, it just can't be unloaded.

I believe that a separate heap trace pass, different from the standard

GC, that visited vtables and reachable resources from there would also
be a viable solution.  As mentioned in an earlier post, writing this in
MMTk (where a heap trace operation is a class that you can easily
subtype to do this) would be easy.

One of the advantages of my other proposal is that it can be implemented
in the VM independent of the GC to some extent.  This additional
mark/scan phase may or may not be easy to implement, depending on the
structure of DRLVM GCs, which is something I haven't explored.



DRLVM may work with (potentially) any number of GCs. Designing class
unloading the way, which would require mark&scan cooperation from GC, is not
generally a good idea (from my HPOV).

--
Pavel Pervov,
Intel Enterprise Solutions Software Division

Re: [drlvm] Class unloading support - tested one approach

2006-11-08 Thread Robin Garner

Aleksey Ignatenko wrote:

Hi, Robin.
I do really like this proposed idea of marking VTables from objects via
additional word field in VTable.

But I have one question about detecting reachability of the classloaders
("sweep the vtables and check the reachability of the classloaders").
Possibly I missed something, but here is my view of the current model of
drlvm: all j.l.Classes and j.l.Classloaders are enumerated as strong roots
(strong references). Therefore we meet situation when all j.l.Classes and
j.l.Classloaders are always reachable (marked). And no sweep will help to
detect classloaders reachability.
I see the single way to distinguish if j.l.Classloader or j.l.Class was
marked not by strong root from VM but by some reference from heap - is
to write unique object value into VTable. Then we can detect if some
jlClasloader was marked from rootset (strong root from VM) or from some 
live

object.

The kind of model I had in mind was along the lines of:
- VM maintains a linked list (or other collection type) of the currently 
loaded classloaders, each of which in turn maintains the collection of 
classes loaded by that type.  The sweep of classloaders goes something like:

for (ClassLoader cl : classLoaders)
  for (Class c : cl.classes)
cl.reachable |= c.vtable.reachable

Then for any classloader where (!reachable), free its native resources 
and remove its strong root.  The java resources will be freed at next GC.

I also want to say that 1-st proposed design from me assumed addtional
mark&scan phase without enumeration of jlClasses and jlClassloaders to be
able to detect their reachability.

I believe that a separate heap trace pass, different from the standard 
GC, that visited vtables and reachable resources from there would also 
be a viable solution.  As mentioned in an earlier post, writing this in 
MMTk (where a heap trace operation is a class that you can easily 
subtype to do this) would be easy.

One of the advantages of my other proposal is that it can be implemented 
in the VM independent of the GC to some extent.  This additional 
mark/scan phase may or may not be easy to implement, depending on the 
structure of DRLVM GCs, which is something I haven't explored.

In terms of runtime cost, I would expect an auxiliary scan of this type 
to be equivalent in cost to a full-heap GC.  The other solution costs 
~1% of all GCs.  As a "back of a matchbox" calculation, if this is run 
less than every 100 (full heap) GCs, then the auxiliary trace is a win, 
if not, my other solution is a win.

Could you, please, clarify this moment.
Thanks, Aleksey.

Hope this answers your questions
cheers,
Robin

On 11/3/06, Rana Dasgupta <[EMAIL PROTECTED]> wrote:

On 11/2/06, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:
>
> >Robin, thanks for all the clarifications. Now it seems clear to me 
>and

> >I am convinced by this proposal. :-)

Yes, this proposal is the simplest and has the least perf impact. Thanks
Robin.

--
Robin Garner
Dept. of Computer Science
Australian National University

Re: [drlvm] Class unloading support - tested one approach

2006-11-07 Thread Aleksey Ignatenko

Hi, Robin.
I do really like this proposed idea of marking VTables from objects via
additional word field in VTable.

But I have one question about detecting reachability of the classloaders
("sweep the vtables and check the reachability of the classloaders").
Possibly I missed something, but here is my view of the current model of
drlvm: all j.l.Classes and j.l.Classloaders are enumerated as strong roots
(strong references). Therefore we meet situation when all j.l.Classes and
j.l.Classloaders are always reachable (marked). And no sweep will help to
detect classloaders reachability.
I see the single way to distinguish if j.l.Classloader or j.l.Class was
marked not by strong root from VM but by some reference from heap - is
to write unique object value into VTable. Then we can detect if some
jlClasloader was marked from rootset (strong root from VM) or from some live
object.

I also want to say that 1-st proposed design from me assumed addtional
mark&scan phase without enumeration of jlClasses and jlClassloaders to be
able to detect their reachability.

Could you, please, clarify this moment.
Thanks, Aleksey.

On 11/3/06, Rana Dasgupta <[EMAIL PROTECTED]> wrote:

On 11/2/06, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:
>
> >Robin, thanks for all the clarifications. Now it seems clear to me >and
> >I am convinced by this proposal. :-)

Yes, this proposal is the simplest and has the least perf impact. Thanks
Robin.

Re: [drlvm] Class unloading support - tested one approach

2006-11-02 Thread Rana Dasgupta


On 11/2/06, Xiao-Feng Li <[EMAIL PROTECTED]> wrote:


>Robin, thanks for all the clarifications. Now it seems clear to me >and
>I am convinced by this proposal. :-)



Yes, this proposal is the simplest and has the least perf impact. Thanks
Robin.

Re: [drlvm] Class unloading support - tested one approach

2006-11-02 Thread Xiao-Feng Li

On 11/3/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Xiao-Feng Li wrote:
> Robin, good idea.
>
> I understand the main difference between your design and Aleksey's
> proposal 1 is, the tracing in your design stops as vtable, but
> Aleksey's continues to classloader. On the other hand, your approach
> requires an extra step to sweep the vtables in order to determine the
> classloaders' reachability.

Actually there are quite a few more differences:
- This mark phase is built into the standard GC trace, like Aleksey's
automatic class unloading proposal.
- This approach requires no additional fields in headers or objects
(except maybe something to allow enumeration of vtables if this doesn't
already exist)
- The additional mark comes at an extremely low cost as discussed
previously.

The operation to sweep vtables is very cheap, and only needs to be done
when you believe there are classloaders that can be unloaded, rather
than at every GC.  You might for example trigger class unloading every
time a new classloader is loaded.

> If this is true, why not just let the tracing to continue as a
> complete step to determine the classloaders' reachability?

Because that adds a large overhead to every GC, and requires vtables and
classloader structures to be traced at every GC.  While the numbers of
vtables is not large, the number of pointers to them is.  The particular
flavour of mark in my proposal is much cheaper than the standard test
and mark operation.

> Another difference is to mark the reachability with an unconditional
> write instead of a bit mask write. I think this can be applied to
> either approach.

Not really.

If you use an unconditional mark, you lose the ability to test whether
any particular mark is the first, and therefore enqueue an object for
scanning only once, and therefore the heap trace can never complete.
You can only use unconditional marks to process 'leaf' objects in the heap.

You can always turn a bit map into a byte map and avoid synchronized
update, but you can't eliminate the dependent load in a standard trace
algorithm.  The difference in performance between a load-test-write and
a load-test-mask-write is insignificant.

Of course a separate trace of the heap is an attractive operation - in
MMTk, this is simple to build because the transitive closure code can
simply be subclassed (eg the sanity checker is ~250 lines of code).
Depending on how reusable the DRLVM heap trace code is, this may or may
not be a good option.

Robin, thanks for all the clarifications. Now it seems clear to me and
I am convinced by this proposal. :-)

Thanks,
xiaofeng

cheers,
Robin

> Thanks,
> xiaofeng
>
> On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:
>> Actually, just thinking about how I would implement this in JikesRVM, I
>> would use the reachability based algorithm, but piggyback on the
>> existing GC mechanisms:
>>
>> - Allocate a byte (or word) in each vtable for the purpose of tracking
>> class reachability.
>> - Periodically, at a time when no GC is running (even the most
>> aggressive concurrent GC algorithms have these, I believe), zero this
>> flag across all vtables.  This is the beginning of a class-unloading
>> epoch.
>> - During each GC, when the GC is fetching the GC map for an object,
>> *unconditionally* write a value to the class reachability byte.  It may
>> make sense for this byte to be in either the first cache-line of the
>> vtable, or the cache line that points to the GC map - just make sure the
>> mark operation doesn't in general fetch an additional cache line.
>> - At a point in the sufficiently far future, when all reachable objects
>> are known to have been traced by the GC, sweep the vtables and check the
>> reachability of the classloaders.
>>
>> The features of this approach are:
>>
>> - Minimal additional work at GC time.  The additional write will cause
>> some additional memory traffic, but a) it's to memory that is already
>> guaranteed to be in L1 cache, and b) it's an unconditional independent
>> write, and c) multiple writes to common classes will be absorbed by the
>> write buffer.
>>
>> - Space cost of at most 1 word per vtable.
>>
>> - This works whether vtables are objects or VM structures
>>
>> - If the relationship between a class and a vtable is not 1:1, this only
>> need affect the periodic sweep process, which should be infrequent and
>> small compared to a GC.
>>
>> - shouldn't need a stop-the-world at any point.
>>
>> I've implemented and tested the GC-relevant part of this in JikesRVM,
>> and the GC time overhead appears to be just under 1% in the MMTk
>> MarkSweep collector.
>>
>> cheers,
>> Robin
>>

Re: [drlvm] Class unloading support - tested one approach

2006-11-02 Thread Robin Garner


Xiao-Feng Li wrote:

Robin, good idea.

I understand the main difference between your design and Aleksey's
proposal 1 is, the tracing in your design stops as vtable, but
Aleksey's continues to classloader. On the other hand, your approach
requires an extra step to sweep the vtables in order to determine the
classloaders' reachability.


Actually there are quite a few more differences:
- This mark phase is built into the standard GC trace, like Aleksey's 
automatic class unloading proposal.
- This approach requires no additional fields in headers or objects 
(except maybe something to allow enumeration of vtables if this doesn't 
already exist)
- The additional mark comes at an extremely low cost as discussed 
previously.


The operation to sweep vtables is very cheap, and only needs to be done 
when you believe there are classloaders that can be unloaded, rather 
than at every GC.  You might for example trigger class unloading every 
time a new classloader is loaded.



If this is true, why not just let the tracing to continue as a
complete step to determine the classloaders' reachability?


Because that adds a large overhead to every GC, and requires vtables and 
classloader structures to be traced at every GC.  While the numbers of 
vtables is not large, the number of pointers to them is.  The particular 
flavour of mark in my proposal is much cheaper than the standard test 
and mark operation.



Another difference is to mark the reachability with an unconditional
write instead of a bit mask write. I think this can be applied to
either approach.


Not really.

If you use an unconditional mark, you lose the ability to test whether 
any particular mark is the first, and therefore enqueue an object for 
scanning only once, and therefore the heap trace can never complete. 
You can only use unconditional marks to process 'leaf' objects in the heap.


You can always turn a bit map into a byte map and avoid synchronized 
update, but you can't eliminate the dependent load in a standard trace 
algorithm.  The difference in performance between a load-test-write and 
a load-test-mask-write is insignificant.



Of course a separate trace of the heap is an attractive operation - in 
MMTk, this is simple to build because the transitive closure code can 
simply be subclassed (eg the sanity checker is ~250 lines of code). 
Depending on how reusable the DRLVM heap trace code is, this may or may 
not be a good option.


cheers,
Robin



Thanks,
xiaofeng

On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables.  This is the beginning of a class-unloading 
epoch.

- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Xiao-Feng Li


Robin, good idea.

I understand the main difference between your design and Aleksey's
proposal 1 is, the tracing in your design stops as vtable, but
Aleksey's continues to classloader. On the other hand, your approach
requires an extra step to sweep the vtables in order to determine the
classloaders' reachability.

If this is true, why not just let the tracing to continue as a
complete step to determine the classloaders' reachability?

Another difference is to mark the reachability with an unconditional
write instead of a bit mask write. I think this can be applied to
either approach.

Thanks,
xiaofeng

On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables.  This is the beginning of a class-unloading epoch.
- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Robin Garner

Weldon Washburn wrote:

Its probably in the noise but it might be possible to
even reduce the overhead of clearing the vtable "mark" by using a epoch
number instead of a boolean.  The idea is that after every major GC,
increment the value used for the mark.  When sweeping the vtables, the 
stale

mark values are the unreachable classes.

cheers

Right.  I'm assuming we're all on the same page here, but I'll spell it 
out anyway:  The number of objects is orders of magnitude higher than 
the number of classes, so any operation on a 'per-class' basis can 
afford to be expensive, whereas per-object operations need to be fast.

Just looking at my stats for SpecJVM98, JBB 2000 and DaCapo 2006-10, the 
ratio of live objects to classes loaded is ~500:1 (geometric mean).  The 
extremes are 11:1 (jython) to 24000:1 (hsqldb).  These are probably also 
 very small heaps compared to enterprise workloads, which would drive 
the number of objects/class up.

The other consideration is that you are not going to want to check for 
unloadable classloaders at every GC.

Therefore, within reason, I don't think the efficiency of per-class 
operations is much of a consideration.

>  Its probably in the noise but it might be possible to
> even reduce the overhead of clearing the vtable "mark" by using a epoch
> number instead of a boolean.

Under the circumstances, requiring an additional register during GC to 
hold the class epoch number probably loses more than it gains.

cheers

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Weldon Washburn


On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:


Rana Dasgupta wrote:
> Robin,
>The basic difference of this with Etienne's method is that the flag
is
> on the vtable, instead of the CL instance. Do I understand correctly ?
The
> GC perf impact is therefore reduced because you need to lookup
> object->vtable instead of object->class->CLinstancewhen tracing the
heap.
> The space overhead is correspondingly slightly higher. Also the GC
impact
> may look lower because there are a couple of pseudo GC cycles to reset
the
> vtables and sweep the vtables.
>
> Thanks,
> Rana

The relevant part of Etienne's design I believe is this:

> 7- Each class loader structure maintains a set of boolean flags, one
>  flag per "non-nursery" garbage collected area (even when thread-local
>  heaps are used).  The flag is set when an instance of a class loaded by
>  this class leader is moved into the related GC-area.  The flag is unset
>  when the GC-area is emptied, or (optionally) when it can be determined
>  that no instance of a class loaded by this class loader remains in the
>  GC-area.  This is best implemented as follows: a) use an unconditional
>  write of "true" in the flag every time an object is moved into the
>  GC-area by the garbage collector, b) unset the related flag in "all"
>  class loader structures just before collecting a GC-area, then setting
>  the flag back when an object survives in the area.

My design differs in several key ways from this:
1. There is no requirement for a per non-nursery area flag


2. The mark byte/word is set unconditionally whenever an object is

visited by the GC, not when an object is moved into a particular mature
space.  This may be the same for some GCs, but not all.


3. The mark byte/word is an unconditional write - Etienne's proposal

would use a load/mask/write sequence.  This is performance critical.
4. My memory of x86 assembler is a little rusty, but I believe a
constant store can be done without requiring a register for the value to
be written, where as or-ing a bit value into a word requires a temporary
register or two.
5. In a parallel GC, setting bits in a mask requires a synchronized
update.  My design doesn't.

The point is an unconditional store to a structure you are already
accessing is very cheap, whereas register spills, loads and synchronized
updates are expensive.



I might be missing something here.  But my take is that Robin's design is
really the best one.  Its probably in the noise but it might be possible to
even reduce the overhead of clearing the vtable "mark" by using a epoch
number instead of a boolean.  The idea is that after every major GC,
increment the value used for the mark.  When sweeping the vtables, the stale
mark values are the unreachable classes.

cheers


> On 10/31/06, Robin Garner <[EMAIL PROTECTED]> wrote:
>>
>> Actually, just thinking about how I would implement this in JikesRVM, I
>> would use the reachability based algorithm, but piggyback on the
>> existing GC mechanisms:
>>
>> - Allocate a byte (or word) in each vtable for the purpose of tracking
>> class reachability.
>> - Periodically, at a time when no GC is running (even the most
>> aggressive concurrent GC algorithms have these, I believe), zero this
>> flag across all vtables.  This is the beginning of a class-unloading
>> epoch.
>> - During each GC, when the GC is fetching the GC map for an object,
>> *unconditionally* write a value to the class reachability byte.  It may
>> make sense for this byte to be in either the first cache-line of the
>> vtable, or the cache line that points to the GC map - just make sure
the
>> mark operation doesn't in general fetch an additional cache line.
>> - At a point in the sufficiently far future, when all reachable objects
>> are known to have been traced by the GC, sweep the vtables and check
the
>> reachability of the classloaders.
>>
>> The features of this approach are:
>>
>> - Minimal additional work at GC time.  The additional write will cause
>> some additional memory traffic, but a) it's to memory that is already
>> guaranteed to be in L1 cache, and b) it's an unconditional independent
>> write, and c) multiple writes to common classes will be absorbed by the
>> write buffer.
>>
>> - Space cost of at most 1 word per vtable.
>>
>> - This works whether vtables are objects or VM structures
>>
>> - If the relationship between a class and a vtable is not 1:1, this
only
>> need affect the periodic sweep process, which should be infrequent and
>> small compared to a GC.
>>
>> - shouldn't need a stop-the-world at any point.
>>
>> I've implemented and tested the GC-relevant part of this in JikesRVM,
>> and the GC time overhead appears to be just under 1% in the MMTk
>> MarkSweep collector.
>>
>> cheers,
>> Robin
>>
>





--
Weldon Washburn
Intel Enterprise Solutions Software Division

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Robin Garner


Rana Dasgupta wrote:

Robin,
   The basic difference of this with Etienne's method is that the flag is
on the vtable, instead of the CL instance. Do I understand correctly ? The
GC perf impact is therefore reduced because you need to lookup
object->vtable instead of object->class->CLinstancewhen tracing the heap.
The space overhead is correspondingly slightly higher. Also the GC impact
may look lower because there are a couple of pseudo GC cycles to reset the
vtables and sweep the vtables.

Thanks,
Rana


The relevant part of Etienne's design I believe is this:


7- Each class loader structure maintains a set of boolean flags, one
 flag per "non-nursery" garbage collected area (even when thread-local
 heaps are used).  The flag is set when an instance of a class loaded by
 this class leader is moved into the related GC-area.  The flag is unset
 when the GC-area is emptied, or (optionally) when it can be determined
 that no instance of a class loaded by this class loader remains in the
 GC-area.  This is best implemented as follows: a) use an unconditional
 write of "true" in the flag every time an object is moved into the
 GC-area by the garbage collector, b) unset the related flag in "all"
 class loader structures just before collecting a GC-area, then setting
 the flag back when an object survives in the area.


My design differs in several key ways from this:
1. There is no requirement for a per non-nursery area flag
2. The mark byte/word is set unconditionally whenever an object is 
visited by the GC, not when an object is moved into a particular mature 
space.  This may be the same for some GCs, but not all.
3. The mark byte/word is an unconditional write - Etienne's proposal 
would use a load/mask/write sequence.  This is performance critical.
4. My memory of x86 assembler is a little rusty, but I believe a 
constant store can be done without requiring a register for the value to 
be written, where as or-ing a bit value into a word requires a temporary 
register or two.
5. In a parallel GC, setting bits in a mask requires a synchronized 
update.  My design doesn't.


The point is an unconditional store to a structure you are already 
accessing is very cheap, whereas register spills, loads and synchronized 
updates are expensive.


cheers


On 10/31/06, Robin Garner <[EMAIL PROTECTED]> wrote:


Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables.  This is the beginning of a class-unloading
epoch.
- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Rana Dasgupta


Robin,
   The basic difference of this with Etienne's method is that the flag is
on the vtable, instead of the CL instance. Do I understand correctly ? The
GC perf impact is therefore reduced because you need to lookup
object->vtable instead of object->class->CLinstancewhen tracing the heap.
The space overhead is correspondingly slightly higher. Also the GC impact
may look lower because there are a couple of pseudo GC cycles to reset the
vtables and sweep the vtables.

Thanks,
Rana





On 10/31/06, Robin Garner <[EMAIL PROTECTED]> wrote:


Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables.  This is the beginning of a class-unloading
epoch.
- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Weldon Washburn


On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:


> Interesting idea!   It seems the real issue is "marking and sweeping"
the
> vtables.  A stab at categorizing the approaches:
>
> a)
> Force vtables to be as similar to ordinary java objects as
possible.  The
> upside: existing GC algorithms will work unaltered.  The downside is
> vtables
> of vtables of vtables...  This has already been discussed at length.
>
> b)
> Force vtables to live and die in a unique "vtable space".  The big
> challenge seems to be building a custom GC algorithm that is simpler and
> less invasive than doing a) above.  Most likely the performance of the
> custom GC algorithm will never be an issue.  Vtables have some very
> interesting properties that may make this doable.  The 4 (or 8) bytes at
> offset "K" always point to a class structure which, in turn, always has
a
> pointer at offset "Z" back to the vtable.  There are way fewer vtables
> than
> objects.  For practical reasons, it can be assumed that vtables will
> always
> be pinned.  The number of class structs/objects is no greater than the
> number of vtables.

I guess ... the issue of where vtables and other classloader related
structures lives seems to me to be simple engineering.  Whether they are
malloced in the C heap or 'new'ed in an immovable Java heap makes little
difference.  After all they are very very small compared to the rest of
the heap.

> A half-baked scheme that might be good enough:  Partition off 50
megabytes
> as a hard, fixed vtables space.  Then do a word-by-word scan to pick up
> candidate pointers class structs.  Then filter the candidate class
struct
> pointers by verifying the back pointers.  Occasionally there might be
> "floating garbage" with this approach but a valid, live vtable should
> never
> be accidentally freed.  The filtered set are the "live" vtables.  Next
> scan
> the live vtables looking for members that were never marked by the
regular
> GC as mentioned  below.  When found, zero the vtable, link-list on a
free
> vtable space list, mark the class struct as "vtable-less".  Process the
> "vtable-less" class struct, etc...

This sounds a bit complicated - i'm sure it could be done by maintaining
linked lists of all the classes loaded by each classloader (pointing to
vtables from there) and traversing it.  Traverse this once, propagating
the mark bits upwards and you are done.




You are right.  Your approach is way simpler.  It makes a lot of sense.  I
was too sleepy to think through this clearly.

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Robin Garner

> Interesting idea!   It seems the real issue is "marking and sweeping" the
> vtables.  A stab at categorizing the approaches:
>
> a)
> Force vtables to be as similar to ordinary java objects as possible.  The
> upside: existing GC algorithms will work unaltered.  The downside is
> vtables
> of vtables of vtables...  This has already been discussed at length.
>
> b)
> Force vtables to live and die in a unique "vtable space".  The big
> challenge seems to be building a custom GC algorithm that is simpler and
> less invasive than doing a) above.  Most likely the performance of the
> custom GC algorithm will never be an issue.  Vtables have some very
> interesting properties that may make this doable.  The 4 (or 8) bytes at
> offset "K" always point to a class structure which, in turn, always has a
> pointer at offset "Z" back to the vtable.  There are way fewer vtables
> than
> objects.  For practical reasons, it can be assumed that vtables will
> always
> be pinned.  The number of class structs/objects is no greater than the
> number of vtables.

I guess ... the issue of where vtables and other classloader related
structures lives seems to me to be simple engineering.  Whether they are
malloced in the C heap or 'new'ed in an immovable Java heap makes little
difference.  After all they are very very small compared to the rest of
the heap.

> A half-baked scheme that might be good enough:  Partition off 50 megabytes
> as a hard, fixed vtables space.  Then do a word-by-word scan to pick up
> candidate pointers class structs.  Then filter the candidate class struct
> pointers by verifying the back pointers.  Occasionally there might be
> "floating garbage" with this approach but a valid, live vtable should
> never
> be accidentally freed.  The filtered set are the "live" vtables.  Next
> scan
> the live vtables looking for members that were never marked by the regular
> GC as mentioned  below.  When found, zero the vtable, link-list on a free
> vtable space list, mark the class struct as "vtable-less".  Process the
> "vtable-less" class struct, etc...

This sounds a bit complicated - i'm sure it could be done by maintaining
linked lists of all the classes loaded by each classloader (pointing to
vtables from there) and traversing it.  Traverse this once, propagating
the mark bits upwards and you are done.

> It seems as long as part of the system is built using garbage collected
> java
> objects and part of the system is built using malloc/free C structs, the
> problem of releasing connected objects/C_structs will be messy and hacky.
> In a sense, this issue is really a motivation for re-writing the whole VM
> in
> Java... hmmm...

Well in Java-in-Java you would hardly want to do a full trace operation
for every vtable pointer - that would be quite costly imo, because in a
full gc you do a test-and-set, then conditionally copy and/or enqueue for
scanning.   So there's a dependent load and conditional store, and the
test to detect the gc policy for the vtable space.

And you really don't want to have moveable vtables.  The cost of updating
the forwarding pointers alone would be a killer, let alone the engineering
difficulties.

There are definitely reasons to write a VM in java, but I don't believe
this is one of them!  As I say below I wouldn't do class unloading on top
of the standard GC mechanism - I would do it in the VM, using a hook from
the GC's tracing.

>
> On 10/31/06, Robin Garner <[EMAIL PROTECTED]> wrote:
>>
>> Actually, just thinking about how I would implement this in JikesRVM, I
>> would use the reachability based algorithm, but piggyback on the
>> existing GC mechanisms:
>>
>> - Allocate a byte (or word) in each vtable for the purpose of tracking
>> class reachability.
>> - Periodically, at a time when no GC is running (even the most
>> aggressive concurrent GC algorithms have these, I believe), zero this
>> flag across all vtables.  This is the beginning of a class-unloading
>> epoch.
>> - During each GC, when the GC is fetching the GC map for an object,
>> *unconditionally* write a value to the class reachability byte.  It may
>> make sense for this byte to be in either the first cache-line of the
>> vtable, or the cache line that points to the GC map - just make sure the
>> mark operation doesn't in general fetch an additional cache line.
>> - At a point in the sufficiently far future, when all reachable objects
>> are known to have been traced by the GC, sweep the vtables and check the
>> reachability of the classloaders.
>>
>> The features of this approach are:
>>
>> - Minimal additional work at GC time.  The additional write will cause
>> some additional memory traffic, but a) it's to memory that is already
>> guaranteed to be in L1 cache, and b) it's an unconditional independent
>> write, and c) multiple writes to common classes will be absorbed by the
>> write buffer.
>>
>> - Space cost of at most 1 word per vtable.
>>
>> - This works whether vtables are objects or VM structures
>>
>> - I

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Etienne Gagnon

Robin Garner wrote:
> - Allocate a byte (or word) in each vtable for the purpose of tracking
> class reachability.

Yep, there's no reason to keep the bits (or words, for performance) in
the class loader, even in the approach I've proposed.  They could be
moved to the vtable.

Etienne

-- 
Etienne M. Gagnon, Ph.D.http://www.info2.uqam.ca/~egagnon/
SableVM:   http://www.sablevm.org/
SableCC:   http://www.sablecc.org/

signature.asc
Description: OpenPGP digital signature

Re: [drlvm] Class unloading support - tested one approach

2006-11-01 Thread Ivan Volosyuk


+1 for this approach. It will give us some kind of class unloading
without much performance impact on GC.
--
Ivan

On 11/1/06, Robin Garner <[EMAIL PROTECTED]> wrote:

Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables.  This is the beginning of a class-unloading epoch.
- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin

--
Ivan
Intel Enterprise Solutions Software Division

Re: [drlvm] Class unloading support - tested one approach

2006-10-31 Thread Weldon Washburn


Interesting idea!   It seems the real issue is "marking and sweeping" the
vtables.  A stab at categorizing the approaches:

a)
Force vtables to be as similar to ordinary java objects as possible.  The
upside: existing GC algorithms will work unaltered.  The downside is vtables
of vtables of vtables...  This has already been discussed at length.

b)
Force vtables to live and die in a unique "vtable space".  The big
challenge seems to be building a custom GC algorithm that is simpler and
less invasive than doing a) above.  Most likely the performance of the
custom GC algorithm will never be an issue.  Vtables have some very
interesting properties that may make this doable.  The 4 (or 8) bytes at
offset "K" always point to a class structure which, in turn, always has a
pointer at offset "Z" back to the vtable.  There are way fewer vtables than
objects.  For practical reasons, it can be assumed that vtables will always
be pinned.  The number of class structs/objects is no greater than the
number of vtables.

A half-baked scheme that might be good enough:  Partition off 50 megabytes
as a hard, fixed vtables space.  Then do a word-by-word scan to pick up
candidate pointers class structs.  Then filter the candidate class struct
pointers by verifying the back pointers.  Occasionally there might be
"floating garbage" with this approach but a valid, live vtable should never
be accidentally freed.  The filtered set are the "live" vtables.  Next scan
the live vtables looking for members that were never marked by the regular
GC as mentioned  below.  When found, zero the vtable, link-list on a free
vtable space list, mark the class struct as "vtable-less".  Process the
"vtable-less" class struct, etc...

It seems as long as part of the system is built using garbage collected java
objects and part of the system is built using malloc/free C structs, the
problem of releasing connected objects/C_structs will be messy and hacky.
In a sense, this issue is really a motivation for re-writing the whole VM in
Java... hmmm...


On 10/31/06, Robin Garner <[EMAIL PROTECTED]> wrote:


Actually, just thinking about how I would implement this in JikesRVM, I
would use the reachability based algorithm, but piggyback on the
existing GC mechanisms:

- Allocate a byte (or word) in each vtable for the purpose of tracking
class reachability.
- Periodically, at a time when no GC is running (even the most
aggressive concurrent GC algorithms have these, I believe), zero this
flag across all vtables.  This is the beginning of a class-unloading
epoch.
- During each GC, when the GC is fetching the GC map for an object,
*unconditionally* write a value to the class reachability byte.  It may
make sense for this byte to be in either the first cache-line of the
vtable, or the cache line that points to the GC map - just make sure the
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects
are known to have been traced by the GC, sweep the vtables and check the
reachability of the classloaders.

The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause
some additional memory traffic, but a) it's to memory that is already
guaranteed to be in L1 cache, and b) it's an unconditional independent
write, and c) multiple writes to common classes will be absorbed by the
write buffer.

- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only
need affect the periodic sweep process, which should be infrequent and
small compared to a GC.

- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM,
and the GC time overhead appears to be just under 1% in the MMTk
MarkSweep collector.

cheers,
Robin





--
Weldon Washburn
Intel Enterprise Solutions Software Division

Re: [drlvm] Class unloading support - tested one approach

2006-10-31 Thread Robin Garner

Actually, just thinking about how I would implement this in JikesRVM, I 
would use the reachability based algorithm, but piggyback on the 
existing GC mechanisms:


- Allocate a byte (or word) in each vtable for the purpose of tracking 
class reachability.
- Periodically, at a time when no GC is running (even the most 
aggressive concurrent GC algorithms have these, I believe), zero this 
flag across all vtables.  This is the beginning of a class-unloading epoch.
- During each GC, when the GC is fetching the GC map for an object, 
*unconditionally* write a value to the class reachability byte.  It may 
make sense for this byte to be in either the first cache-line of the 
vtable, or the cache line that points to the GC map - just make sure the 
mark operation doesn't in general fetch an additional cache line.
- At a point in the sufficiently far future, when all reachable objects 
are known to have been traced by the GC, sweep the vtables and check the 
reachability of the classloaders.


The features of this approach are:

- Minimal additional work at GC time.  The additional write will cause 
some additional memory traffic, but a) it's to memory that is already 
guaranteed to be in L1 cache, and b) it's an unconditional independent 
write, and c) multiple writes to common classes will be absorbed by the 
write buffer.


- Space cost of at most 1 word per vtable.

- This works whether vtables are objects or VM structures

- If the relationship between a class and a vtable is not 1:1, this only 
need affect the periodic sweep process, which should be infrequent and 
small compared to a GC.


- shouldn't need a stop-the-world at any point.

I've implemented and tested the GC-relevant part of this in JikesRVM, 
and the GC time overhead appears to be just under 1% in the MMTk 
MarkSweep collector.


cheers,
Robin

55 matches

Mail list logo