Etienne,

This is a good design, thanks. Conceptually, reference counting in the VM is somewhat similar to Aleksey's proposal 1, if I understand correctly. This design also requires quite a few hand-offs between the VM and the GC. In DRLVM, the problem is that we have quite a few GCs, not all within our control.

However, it seems to me that we have two options. We can make unloading automatic, in which case we will need things like Java vtables and can leave most of the work to the GC. Or we can do refcounting or tracing in the VM and work in lockstep with the GC(s). I am not sure which is the better way.
Thanks,
Rana

On 10/30/06, Etienne Gagnon <[EMAIL PROTECTED]> wrote:
Hi all,

Here's a more structured proposal for a simple and effective implementation of class unloading support.

In accordance with Section 2.17.8 of the JVM spec, class unloading (and its related native resource cleanup) can only happen when the class loader instance becomes unreachable. For this to happen, we put in place the following things:

1- Each class loader is represented by some VM internal structure. [We'll call it the "class loader structure"].

2- Each class loader internal structure, except (optionally) the bootstrap class loader, maintains a weak reference to an object instance of class ClassLoader (or some subclass). The Java instance has some opaque pointer back to the internal VM structure. The Java instance is usually created before the internal VM structure. The instance constructor is usually in charge of creating the internal VM structure. [We'll call it the "class loader instance"]

3- Each class loader instance maintains a collection of loaded classes. A class/interface is never removed from this collection. This collection maintains "hard" (i.e. "not weak") references to classes/interfaces.

4- [Informative] A class loader instance is also most likely to maintain a collection of classes for which it has "initiated" class loading. This collection should use hard references (as weak references won't lead to earlier class unloading).

5- Each class loader instance maintains a hard reference to its parent class loader. This reference is (optionally) null if the parent is the bootstrap class loader.

6- Each j.l.Class instance maintains a hard reference to the class loader instance of the class loader that has loaded it. [This is not the "initiating" loader, but really the "loading" loader].

7- Each class loader structure maintains a set of boolean flags, one flag per "non-nursery" garbage collected area (even when thread-local heaps are used). The flag is set when an instance of a class loaded by this class loader is moved into the related GC-area.
The flag is unset when the GC-area is emptied, or (optionally) when it can be determined that no instance of a class loaded by this class loader remains in the GC-area. This is best implemented as follows:

   a) use an unconditional write of "true" in the flag every time an object is moved into the GC-area by the garbage collector,

   b) unset the related flag in "all" class loader structures just before collecting a GC-area, then set the flag back when an object survives in the area.

8- Each method invocation frame maintains a hard reference to either its surrounding instance (in the case of instance methods, i.e. invokevirtual, invokeinterface, and invokespecial) or its surrounding class (invokestatic). This is already required for synchronized methods (it's not a good idea to allow the instance to be collected before the end of a synchronized instance method call; yep, learned the hard way in SableVM...). So, the "overhead" is quite minimal. This matters for correctness: a class loader must not die while a static/instance method of a class loaded by it is still active, as that would lead to premature release of native resources (such as jitted code, etc.).

9- A little magic is required to prevent premature collection of a class loader instance and its loaded j.l.Class instances (see [3-] above), as object instances do not maintain a hard reference to their j.l.Class instance, yet we want to preserve the correctness of Object.getClass().

So, the simplest approach is to maintain a hard reference in a class loader structure to its class loader instance (in addition to the weak reference in [2-] above). This reference is kept always set (thus preventing collection of the class loader instance), except when *all* the following conditions are met:

   a) All nurseries are empty.

   b) All GC-area flags are unset.

Actually, making this practical while preserving correctness is a little trickier.
It requires a 2-step process, much like the object-finalization dance.

Here's a typical example: On a major collection, where all nurseries are collected, and some (but not necessarily all) other GC-areas are collected, we do the following sequence of actions:

   a) All class loader structures are visited. All flags related to non-nursery GC-areas that we intend to collect are unset. If this leaves *all* flags unset, the hard reference to the class loader instance is set to NULL (thus enabling, possibly, the collection of the class loader instance).

   b) The garbage collection cycle is started and proceeds as usual. Note that the work mandated in [7-] above is also done, which might lead to setting back some flags in class loader structures that had all their flags unset in [a)].

   c) After the initial garbage collection is applied, and just before the usual treatment of weak references (where they are set to NULL when pointing to a collected object), all class loader structures are visited again. The hard pointer of every class loader structure that has any flag set is set back to point to the class loader instance if it was NULL (same as how object instances are preserved for finalization).

   d) If [c)] has triggered any change (i.e. it mandates the survival of additional class loader instances that were due to die), the garbage collection cycle is "extended" to rescue the additional class loader instances and all objects they can reach.

   e) Any additional work of the garbage collection cycle is done (e.g. soft, weak, and phantom references, finalization handling).

   f) All class loader structures are visited again. Every structure for which the weak reference has NOT been set to NULL has its hard reference set to the weak reference target. Every structure for which the weak reference has been set to NULL is now ready to be unloaded (i.e. release all of its native resources, including jitted code, class information, method information, vtables, and so on).
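The sequence a) to f) above can be sketched as a small toy simulation. This is only an illustration under stated assumptions, not DRLVM or SableVM code: the names LoaderStructure, major_collection, survivors, and rooted are hypothetical, a plain Python attribute stands in for the weak reference, and the collector itself is replaced by two inputs that say which loaders still have surviving objects in each collected area and which loader instances are reachable through ordinary references.

```python
from dataclasses import dataclass, field


@dataclass
class LoaderStructure:
    """Toy stand-in for the VM-internal "class loader structure" (hypothetical)."""
    name: str
    weak_ref: object                             # models the weak reference of [2-]
    flags: dict = field(default_factory=dict)    # GC-area name -> bool, as in [7-]
    hard_ref: object = None                      # extra hard reference of [9-]

    def __post_init__(self):
        # The hard reference is normally kept set.
        self.hard_ref = self.weak_ref


def major_collection(loaders, collected_areas, survivors, rooted):
    """Apply steps a) to f) to the toy model; return names ready to unload.

    survivors: area name -> set of loader names that still have an object
               of one of their classes alive in that area (assumed input).
    rooted:    loader names whose instance is reachable through ordinary
               references, independent of the hard reference (assumed input).
    """
    # a) Unset the flags of the areas about to be collected; drop the hard
    #    reference when no flag at all remains set.
    for ls in loaders:
        for area in collected_areas:
            ls.flags[area] = False
        if not any(ls.flags.values()):
            ls.hard_ref = None

    # b) The collector runs; each surviving object sets its loader's flag back.
    for area in collected_areas:
        for ls in loaders:
            if ls.name in survivors.get(area, set()):
                ls.flags[area] = True

    # c) Restore the hard reference of every structure that has a flag set.
    # d) A real collector would now extend the cycle to rescue those
    #    instances; the toy model only needs the restored reference.
    for ls in loaders:
        if ls.hard_ref is None and any(ls.flags.values()):
            ls.hard_ref = ls.weak_ref

    # e) Weak-reference processing: clear weak references to instances that
    #    are neither held by the hard reference nor otherwise reachable.
    for ls in loaders:
        if ls.hard_ref is None and ls.name not in rooted:
            ls.weak_ref = None

    # f) Re-arm the hard reference of survivors; the rest can be unloaded.
    unloadable = []
    for ls in loaders:
        if ls.weak_ref is not None:
            ls.hard_ref = ls.weak_ref
        else:
            unloadable.append(ls.name)
    return unloadable


# Example: only the "old" area is collected; "lib" keeps a flag in "mature".
app = LoaderStructure("app", weak_ref=object(), flags={"old": True})
plugin = LoaderStructure("plugin", weak_ref=object(), flags={"old": True})
lib = LoaderStructure("lib", weak_ref=object(), flags={"old": False, "mature": True})
result = major_collection([app, plugin, lib], ["old"], {"old": {"app"}}, rooted=set())
```

In this example, "app" is rescued in step c) because one of its objects survives in the collected area, "lib" never loses its hard reference because its flag for an uncollected area stays set, and "plugin" is the only structure left with a cleared weak reference.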
In addition, I highly recommend using the approach proposed in Chapter 3 of http://sablevm.org/people/egagnon/gagnon-phd.pdf for managing class-loader related memory. It has many advantages:

1- No "header space" overhead for very small allocations. [This is a typical "hidden" space overhead of malloc() implementations to allow for later free() calls].

2- Minimal memory fragmentation. [Allocation only happens in large blocks].

3- Simple and very efficient allocation. [No overhead for complex management of freeing small areas later].

4- Efficient freeing of large memory blocks on class unloading.

5- Possibility of clever usage of this memory; see Chapter 4 of the same document for the implementation of sparse interface virtual tables enabling invokeinterface at the simple cost of invokevirtual. :-)

I hope this is useful to both projects [drlvm][sablevm] :-)

Etienne

(C) 2006 by Etienne M. Gagnon <[EMAIL PROTECTED]>
This text is licensed under the Apache License, Version 2.0.
[You may add this document in svn; I am willing to sign the required Apache agreement to make it so, if you intend to use it in drlvm's implementation].

--
Etienne M. Gagnon, Ph.D.            http://www.info2.uqam.ca/~egagnon/
SableVM:                                       http://www.sablevm.org/
SableCC:                                       http://www.sablecc.org/