2006/11/8, Robin Garner <[EMAIL PROTECTED]>:
Robin Garner wrote:
> Aleksey Ignatenko wrote:
>> Robin.
>>
>>> OK, well how about keeping a weak reference to the j.l.ClassLoader
>>> object instead of a strong one.  When the reference becomes (strong)ly
>>> unreachable, invoke the class-unloading phase.
>>
>>
>> If you have a weak reference to j.l.ClassLoader, the GC will collect it
>> (with all the corresponding j.l.Class instances) as soon as there are no
>> references to the j.l.ClassLoader and those classes. But it is possible
>> that there are still some live objects of those classes and no references
>> to the j.l.ClassLoader and j.l.Class instances. This will lead to
>> unpredictable consequences (crash, etc).
>>
>>
>>
>> I want to remind you that there are 3 mandatory conditions for class unloading:
>>
>> 1. j.l.Classloader instance is unreachable.
>>
>> 2. Appropriate j.l.Class instances are unreachable.
>>
>> 3. No object of any class loaded by appropriate class loader exists.
>
> Let me repeat.  I offer an efficient solution to (3).  I don't purport
> to have a solution to (1) and (2).

Let me just add:  This is because I don't think (1) or (2) are
particularly difficult from a performance point of view, although I'm
happy to accept that there may still be some subtle engineering challenges.

Robin,

While your solution to (3) looks brilliant and quite convincing, it only
covers part of the whole mission. We really need to derive a complete
design (like Etienne did), and I feel the vote started in
the neighboring thread is a bit premature.
Some of the considerations below are beyond my understanding; could you
please clarify them (inlined)?

Also, it would be nice to have confirmation that the notion of
"epoch of full-heap-collection" does not impose strict limitations on
GC algorithms. Maybe this is obvious to people with a stronger GC
background than mine?


Now this is just off the top of my head, but what about this for a design:
- A j.l.ClassLoader maintains a collection of each of the classes it has
loaded
- A j.l.Class contains a pointer to its j.l.ClassLoader
- A j.l.Class maintains a collection of its vtable(s) (or a pointer if 1:1).
The point of this is that a class loader and its classes are a 'self
sustaining' data structure - if one element in it is reachable the whole
thing is reachable.
Right. The special case is for system classes which are always in VM
root set so never reclaimed.
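
A minimal Java sketch of the three links listed above, for illustration
only. LoaderMeta, ClassMeta and VTableMeta are hypothetical stand-ins for
j.l.ClassLoader, j.l.Class and the VM's vtable structure; they are not
DRLVM or MMTk types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a VM vtable. 'reachable' models the mark that
// is set when an instance of the class is seen during a collection.
final class VTableMeta {
    final ClassMeta owner;
    boolean reachable;
    VTableMeta(ClassMeta owner) { this.owner = owner; }
}

// Hypothetical stand-in for j.l.Class.
final class ClassMeta {
    final String name;
    final LoaderMeta loader;                              // Class -> defining loader
    final List<VTableMeta> vtables = new ArrayList<>();   // Class -> vtable(s)
    ClassMeta(String name, LoaderMeta loader) {
        this.name = name;
        this.loader = loader;
        vtables.add(new VTableMeta(this));
    }
}

// Hypothetical stand-in for j.l.ClassLoader.
final class LoaderMeta {
    final List<ClassMeta> classes = new ArrayList<>();    // loader -> its classes
    ClassMeta define(String name) {
        ClassMeta c = new ClassMeta(name, this);
        classes.add(c);
        return c;
    }
}
```

Because every element points at the others, holding any one of them
(a class, its loader, or a vtable) keeps the whole group reachable.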

The VM maintains a weak reference to all its j.l.ClassLoader instances,
and maintains a ReferenceQueue for weakly-reachable classloaders.
ClassLoaders are placed on the ReferenceQueue if and only if they are
unreachable from the heap (including via their j.l.Class objects).
Here: should it actually read as "WeakReference instances for
weakly-reachable classloaders are placed on the ReferenceQueue"?
Otherwise I cannot make sense of this sentence, sorry.
If so, then how could the VM obtain and rescue the referent CL objects (plus
their j.l.Class instances) after the GC pass? AFAIU references are cleared
automatically before enqueuing. I suppose we are not going to
introduce inter-phase communication between VM and GC...
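
For reference, this is how the java.lang.ref API behaves at the Java
level: what arrives on the queue is the WeakReference object, not its
referent, and a reference enqueued by the collector has already been
cleared. Any data needed for cleanup therefore has to live on the
reference object itself. A sketch only: LoaderRef and nativeId are
hypothetical names, and a VM-internal mechanism need not use this API.

```java
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical weak handle kept for each class loader. nativeId stands for
// whatever key locates the loader's native structures.
final class LoaderRef extends WeakReference<ClassLoader> {
    final long nativeId;
    LoaderRef(ClassLoader referent, long nativeId, ReferenceQueue<ClassLoader> q) {
        super(referent, q);
        this.nativeId = nativeId;
    }
}

final class LoaderRefDemo {
    // Drains the queue and returns how many references were found on it.
    static int drain(ReferenceQueue<ClassLoader> q) {
        int n = 0;
        for (Object r = q.poll(); r != null; r = q.poll()) {
            LoaderRef ref = (LoaderRef) r;
            // For a reference enqueued by the collector, ref.get() is null
            // at this point; only the side data is still available.
            System.out.println("unload candidate, nativeId=" + ref.nativeId);
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        ReferenceQueue<ClassLoader> q = new ReferenceQueue<>();
        ClassLoader cl = new ClassLoader() { };   // throwaway loader
        LoaderRef ref = new LoaderRef(cl, 42L, q);
        cl = null;       // drop the only strong reference
        System.gc();     // a hint only; enqueuing is not guaranteed here
        drain(q);
    }
}
```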

Note this is an irreversible condition: objects that are unreachable can
never become reachable again, except through very specific methods.

When it sweeps the ReferenceQueue for unreachable classloaders, the VM
places the unreachable classloaders in a queue of classloaders that are
candidates for unloading.  This queue is part of the root set of the VM.
Strongly referenced now I suppose.

 A classloader in this queue is unreachable from the heap, and can be
unloaded when there are no objects of any class it has loaded.
So if the VM decides it is time to try unloading, it should:
1) Check if the full epoch has passed;
2) for each unloadable CL, scan corresponding vtables;
3) if none of the vtables were marked reachable, drop the CL from the root
set completely and clean up the corresponding native structures; Java
instances will be reclaimed at the next GC iteration;
4) Reset "epoch marker" and vtable words.

Do I get it right?
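
The four steps above can be sketched in Java roughly as follows. This is
one reading of the proposal, not an implementation of it: CandidateLoader,
UnloadPass, vtableMarks and fullEpochPassed are hypothetical names, and
the native cleanup is reduced to a comment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical record for a loader sitting in the unloading-candidates queue.
final class CandidateLoader {
    final String name;
    final boolean[] vtableMarks;   // one mark per class loaded by this loader
    CandidateLoader(String name, int classCount) {
        this.name = name;
        this.vtableMarks = new boolean[classCount];
    }
    boolean anyVtableMarked() {
        for (boolean m : vtableMarks) if (m) return true;
        return false;
    }
}

final class UnloadPass {
    final List<CandidateLoader> candidates = new ArrayList<>(); // part of the root set
    boolean fullEpochPassed;   // set once a full-heap collection epoch completes

    // Returns the loaders that were dropped from the root set.
    List<CandidateLoader> tryUnload() {
        List<CandidateLoader> unloaded = new ArrayList<>();
        if (!fullEpochPassed) return unloaded;                  // step 1
        for (Iterator<CandidateLoader> it = candidates.iterator(); it.hasNext();) {
            CandidateLoader cl = it.next();
            if (!cl.anyVtableMarked()) {                        // step 2
                it.remove();                                    // step 3: drop from roots
                unloaded.add(cl);                               // native cleanup goes here
            }
        }
        fullEpochPassed = false;                                // step 4: reset epoch marker
        for (CandidateLoader cl : candidates)                   // and the vtable words
            Arrays.fill(cl.vtableMarks, false);
        return unloaded;
    }
}
```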



This is where my mechanism comes into play.

If an object executes getClass() then its classloader is removed from
the unloadable classloader queue, its weak reference gets recreated  and
we're back at the initial state.  My guess is that this is a pretty
infrequent method call.

I think this stage of the algorithm is easy in performance terms -
difficult in terms of proving correctness, but if you have an efficient
reachability mechanism for classes I think the building blocks are
there, and the subtleties are nothing that a talented engineer can't solve.

Yes, a bit complicated. Taking into account the issues with
ReferenceQueue above, I'd rather suggest the following:

1) The j.l.Class and defining CL have mutual strong references, as said above.
2) Normally, the VM reports all CLs as strong roots thus preserving
them from premature reclamation;
3) When the VM decides (by whatever heuristic) it is time to perform
unloading, it checks the epoch invariant and scans all vtables for all
CLs;
4) if a CL has no "reachable" vtables, it is moved to the
unloading-candidates collection and reported as a weak root; otherwise
it remains in the strong root set.
5) If the next GC clears some of the weak references above, do the
corresponding native cleanup and return the surviving CLs to the normal
root set.
6) Reset all data (epoch, vtables, etc.) and go back to 2).

I believe this is less disruptive to component interfaces and requires
less support on GC side.
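
A rough Java sketch of steps 2) to 6) of this counter-proposal, under
stated assumptions: RootedLoader and WeakRootUnloader are hypothetical
names, the per-loader vtable scan is summarised as a single boolean, and
the GC's decision to clear a weak root is modelled by a caller-supplied
predicate rather than a real collector interface.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

final class RootedLoader {
    final String name;
    boolean anyVtableReachable;   // summary of the vtable marks for this loader
    RootedLoader(String name) { this.name = name; }
}

final class WeakRootUnloader {
    final Set<RootedLoader> strongRoots = new HashSet<>();  // step 2: normal state
    final Set<RootedLoader> weakRoots = new HashSet<>();    // unloading candidates

    // Steps 3 and 4: called when the VM decides to attempt unloading.
    void selectCandidates(boolean epochInvariantHolds) {
        if (!epochInvariantHolds) return;
        for (RootedLoader cl : new ArrayList<>(strongRoots)) {
            if (!cl.anyVtableReachable) {
                strongRoots.remove(cl);
                weakRoots.add(cl);        // reported as a weak root from now on
            }
        }
    }

    // Steps 5 and 6: called after the next GC. 'clearedByGc' is a hypothetical
    // hook reporting whether the GC cleared the weak root for a loader.
    List<RootedLoader> afterGc(Predicate<RootedLoader> clearedByGc) {
        List<RootedLoader> unloaded = new ArrayList<>();
        for (RootedLoader cl : weakRoots) {
            if (clearedByGc.test(cl)) unloaded.add(cl);   // native cleanup goes here
            else strongRoots.add(cl);                     // survivor: strong root again
        }
        weakRoots.clear();
        for (RootedLoader cl : strongRoots) cl.anyVtableReachable = false;  // reset
        return unloaded;
    }
}
```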



I'm not 100% sure what your counter-proposal is: I recall 2 approaches
from the mailing list:
1) Each object has an additional word in its header that points back to
   its j.l.Class object, and we proceed from here.

Given that the mean object size is ~28 bytes, this proposal adds 14% to
each object size.  This increases the frequency of GC by 14% and incurs
a 14% slowdown.  Of course this is an oversimplification but a 14%
slowdown is a pretty lousy starting point to argue from.

2) The existing pointer in the GC header is traced during GC time.

The average number of pointers per object (excluding the vtable) is
between 1.5 and 2 for the majority of benchmarks I have looked at
(footnote: if you know something different, drop me a line) (geometric
mean 1.78 for {specJVM, pseudoJBB and DaCapo 20051009}).  Tracing one
additional reference per object will therefore increase the cost of GC
by ~60% on average.  Again oversimplification but indicative.  If we
assume that GC accounts for 10% of runtime (more or less depending on
heap size), this is a runtime overhead of 6%.
Looks reasonable as an upper estimate; it would be nice to look at
live data, though. Aleksey?
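
The back-of-envelope figures quoted above can be reproduced with a few
lines of Java. The 4-byte header word is an assumption (a 32-bit VM); the
28-byte mean object size, the 1.78 mean pointer count and the 10% GC
share come from the message. The unrounded results are about 14.3%,
56.2% and 5.6%, which the message rounds to 14%, ~60% and 6%.

```java
final class OverheadEstimate {
    // Extra header word: relative growth in mean object size.
    static double headerWordOverhead(double wordBytes, double meanObjectBytes) {
        return wordBytes / meanObjectBytes;
    }

    // One extra traced reference per object, relative to the mean pointer count.
    static double extraTraceOverhead(double meanPointersPerObject) {
        return 1.0 / meanPointersPerObject;
    }

    public static void main(String[] args) {
        double header = headerWordOverhead(4, 28);   // 4-byte word is an assumption
        double trace = extraTraceOverhead(1.78);
        double runtime = trace * 0.10;               // GC assumed to be 10% of runtime
        System.out.printf("header word: %.1f%%%n", header * 100);
        System.out.printf("extra trace: %.1f%% of GC, %.1f%% of runtime%n",
                          trace * 100, runtime * 100);
    }
}
```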

My proposal has been measured at ~1% overhead in GC time, or 0.1% in
execution time (caveats as above).  If there is some complexity in
establishing classloader reachability from this basis, I would assume it
can easily be absorbed.

Therefore I think my proposal, while not complete, can form the basis of
an efficient complete system for class unloading.

The nice thing about the "automatic" approach is that it does not impose the
slightest limitation on GC policy and adapts to any future algorithmic
improvements. It's a pity the same wasn't (can't be?) said about the
idea being voted on.
Actually, some tuning of the "automatic" approach is possible, such as
keeping all j.l.Class & VT instances in a special space which is
collected only periodically, so the GC does not need to trace VTs all the
time.

--
Regards,
Alexey


(PS: I'd *love* to be proven wrong)

cheers,
Robin

> Regards,
> Robin
>
>>
>>
>> Aleksey.
>>
>>
>> On 11/8/06, Robin Garner <[EMAIL PROTECTED]> wrote:
>>>
>>> Pavel Pervov wrote:
>>> > Robin,
>>> >
>>> > The kind of model I had in mind was along the lines of:
>>> >> - VM maintains a linked list (or other collection type) of the
>>> currently
>>> >> loaded classloaders, each of which in turn maintains the
>>> collection of
>>> >> classes loaded by that type.  The sweep of classloaders goes
>>> something
>>> >> like:
>>> >>
>>> >> for (ClassLoader cl : classLoaders)
>>> >>   for (Class c : cl.classes)
>>> >>     cl.reachable |= c.vtable.reachable
>>> >
>>> >
>>> > This is not enough. There may be live j/l/Class'es and
>>> > j/l/Classloader's
>>> > in the heap. Even though no objects of any classes loaded by a
>>> particular
>>> > class loader are available in the heap, if we have a live reference to
>>> > j/l/ClassLoader itself, it just can't be unloaded.
>>>
>>> OK, well how about keeping a weak reference to the j.l.ClassLoader
>>> object instead of a strong one.  When the reference becomes (strong)ly
>>> unreachable, invoke the class-unloading phase.
>>>
>>> To me the key issue from a performance POV is the reachability of
>>> classes from objects in the heap.  I don't pretend to have an answer to
>>> the other questions---the performance critical one is the one I have
>>> addressed, and I accept there may be many solutions to this part of the
>>> question.
>>>
>>> > I believe that a separate heap trace pass, different from the standard
>>> >> GC, that visited vtables and reachable resources from there would
>>> also
>>> >> be a viable solution.  As mentioned in an earlier post, writing
>>> this in
>>>
>>> >> MMTk (where a heap trace operation is a class that you can easily
>>> >> subtype to do this) would be easy.
>>> >>
>>> >> One of the advantages of my other proposal is that it can be
>>> implemented
>>> >> in the VM independent of the GC to some extent.  This additional
>>> >> mark/scan phase may or may not be easy to implement, depending on the
>>> >> structure of DRLVM GCs, which is something I haven't explored.
>>> >
>>> >
>>> > DRLVM may work with (potentially) any number of GCs. Designing class
>>> > unloading in a way that would require mark&scan cooperation from the
>>> GC is
>>> > not
>>> > generally a good idea (from my HPOV).
>>>
>>> That's what I gathered.  hence my proposal.
>>>
>>> cheers
>>>
>>> --
>>> Robin Garner
>>> Dept. of Computer Science
>>> Australian National University
>>>
>>
>
>


--
Robin Garner
Dept. of Computer Science
Australian National University
