jwagenleitner wrote:
> I think performance in general and not just under concurrent use is extremely important for ClassInfo. My understanding is that the static cache it holds of ClassInfo's is queried on every method call (at least in dynamic groovy). That is probably why the current hash-based caches are used to save from the O(n) retrieval from a globalClassSet which is implemented as a linked list.
Too bad. At the moment I see no way to get both "on-the-fly" garbage collection of ClassInfo and fast concurrent access to it.
What I quickly tried was changing the type of object stored with globalClassValue from ClassInfo to other types, but all of them failed in some way. (I did not test the full matrix of (Java 7 ClassValue or not) and (same class loader as Groovy for the compiled Groovy script or not); I gave up once something failed.)
- WeakReference<ClassInfo>: Garbage collected even if the class is still referenced.
- SoftReference<ClassInfo>: OutOfMemoryError, but it took longer than usual.
- WeakHashMap<Class,ClassInfo>: i.e. a WeakHashMap with just a single entry (so no synchronization needed). That was actually my biggest hope, but: OutOfMemoryError.
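The first failure mode can be reproduced in isolation. The following is only a minimal sketch: Info is a hypothetical stand-in for ClassInfo, and System.gc() is merely a hint to the JVM, so the printed result may differ between runs and JVMs. It illustrates that a value held only through a WeakReference may be cleared as soon as nothing else references the value strongly, whether or not the class it describes is still alive.

```groovy
import java.lang.ref.WeakReference

// Hypothetical stand-in for ClassInfo; not the real Groovy class.
class Info {
    final String className
    Info(String className) { this.className = className }
}

// The class itself stays strongly reachable for the whole script.
Class key = String

// Only the WeakReference points at the Info instance.
def ref = new WeakReference<Info>(new Info(key.getName()))

// Request a collection; the JVM is free to ignore this.
System.gc()

// The class is still referenced, yet the weakly held Info may already be gone.
println "class still referenced: ${key != null}"
println "info cleared: ${ref.get() == null}"
```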
jwagenleitner wrote:
>>> So why not have the GroovyClassLoader keep a set of all classes it
>>> compiled itself and were loaded and offer a new
>>> GroovyClassLoader#finalCleanup() method that removes meta information
>>> for all these classes so that they would become immediately eligible for
>>> garbage collection? (I guess InvokerHelper.removeClass(clazz) and
>>> Introspector.flushFromCaches(clazz), but whatever is needed...)
>>>> GroovyClassLoader (GCL) actually represents a tree of class loaders. For each compilation GCL will spawn an instance of InnerLoader. Since two different compilations are supposed to know each others classes a list of classes is kept in GCL itself (see classCache). The inner loader itself is not referenced by GCL. Because of that list GCL has the clearCache method to remove classes from previous compilations.
>>>> Why did we use this structure? GCL is supposed to offer you the possibility to compile the same class multiple times. That means you will get the same class multiple times. At the same time a class must be defined under the same name only once in a given defining class loader. As a result, trying to define a class that already exists under that name results in an error. A classloading constraint is actually to return the same class instance each time you request a class with a certain name. It implies the error mentioned before... it also means GCL is breaking those constraints knowingly.
>>>> Anyway... I think such a cleanup method is misplaced in GCL, since it spans beyond the classloader... how about GroovySystem?
>>> I agree that if a method were added I don't think GCL is the right place and that something like GroovySystem#removeClass(Class) or GroovySystem#flushFromCaches(Class) would be good.
I guess this would help a little, but - again - as soon as you use e.g. a closure in a script, you have more than one class from a compilation, and I guess this happens often.
In cases where a script was compiled at runtime, that would have to have been done by a GroovyClassLoader$InnerLoader (which extends GroovyClassLoader). That $InnerLoader can be obtained with script.getClass().getClassLoader(). You can check whether it is a GroovyClassLoader$InnerLoader and, if so, tell it to clean up all classes it loaded, which would all be related to the one script it compiled. If you want to offer that, which I think would probably make sense, you would have to add a new method to GroovyClassLoader or at least to $InnerLoader, like #cleanupCompiledScripts() or whatever, and then also offer its functionality from GroovySystem for convenience. There would probably be two variants: one that removes just the indicated class, and one that extends to all classes compiled from the same script in case it was compiled at runtime from a Groovy script.
This will certainly not cover all use cases either (for Groovy classes loaded from the file system by a URLClassLoader, for example, I see no way to track which classes belong to which main script), but I think the use case of scripts compiled at runtime would still justify it. It would offer a relatively clean way to make all classes from a script compilation available for GC more quickly.
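As a rough illustration of the "all classes from the same script" variant, here is a sketch of what can be approximated from the outside today. The helper name cleanupCompiledScript is hypothetical and not part of Groovy. The sketch assumes that getLoadedClasses() on the InnerLoader reports the classes of the owning GroovyClassLoader, so it filters by defining loader, and that InvokerHelper.removeClass(clazz) removes whatever meta information needs removing.

```groovy
import org.codehaus.groovy.runtime.InvokerHelper

// Hypothetical helper: removes meta information for a script class and,
// if it was compiled at runtime, for all classes from the same compilation.
def cleanupCompiledScript(Class scriptClass) {
    def loader = scriptClass.getClassLoader()
    if (loader instanceof GroovyClassLoader.InnerLoader) {
        // Assumption: the reported classes may span several compilations,
        // so keep only those defined by this particular InnerLoader.
        def related = loader.getLoadedClasses().findAll { it.getClassLoader().is(loader) }
        related.each { InvokerHelper.removeClass(it) }
        return related.collect { it.getName() }
    }
    // Not compiled at runtime by a GroovyClassLoader: remove just this class.
    InvokerHelper.removeClass(scriptClass)
    return [scriptClass.getName()]
}

// A script with a closure, so the compilation yields more than one class.
def script = new GroovyShell().parse("def twice = { it * 2 }; twice(21)")
def removed = cleanupCompiledScript(script.getClass())
println removed
```

Whether this actually makes the classes collectable sooner depends on what else still references them; the sketch only shows how the related classes could be found.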
I would maybe also offer a similar method for ConfigSlurper, for convenience, because that also implicitly always compiles a Groovy script (the config), thus filling Metaspace/PermGen, with almost certainly nobody expecting this offhand. Users would then not have to explicitly get the class from the parsed object and call the removal function of GroovySystem. (There a different type of loader seems to be used, RootLoader, but there, too, compilation usually gives only a single class, I would estimate.)
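Until such a convenience method exists, one way to keep a handle on the compiled config class is to compile it yourself and pass the class to ConfigSlurper. This is only a sketch. It assumes that ConfigSlurper#parse(Class) yields the same result as parsing the text directly, and that InvokerHelper.removeClass(clazz) is the appropriate removal call.

```groovy
import org.codehaus.groovy.runtime.InvokerHelper

// Compile the config script explicitly so the resulting class is known.
def loader = new GroovyClassLoader()
Class configClass = loader.parseClass("b { c = 5 }")

// Let ConfigSlurper evaluate the already compiled script class.
def config = new ConfigSlurper().parse(configClass)
assert config.b.c == 5

// Remove the meta information for the compiled config class once parsed.
InvokerHelper.removeClass(configClass)
```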
Finally, Groovy class loading is so dynamic/flexible that many things become impossible to untangle (I am just waiting now for Jochen Theodorou to say that some things I wrote above are not always like that). So if my hopes really fade regarding a prospect for "on-the-fly" GC in the relatively near future (before a Groovy 3), I might consider adding similar cleanup functionality to Grengine, where I estimate it could be done in a much more structured and controlled way.
Script:
def script = new GroovyShell().parse("99")
println script.getClass().getClassLoader()
def a = new ConfigSlurper().parse("b{c=5}")
assert a.b.c == 5
println a.getClass().getClassLoader()
Output:
groovy.lang.GroovyClassLoader$InnerLoader@7f010382
org.codehaus.groovy.tools.RootLoader@5451c3a8
Alain
