Hi, Graham, for what it's worth, I'll stand with you. :-) I think addressing the issues you've discovered is really important. Here's an idea: how about some new unit and/or performance tests that check if a class and/or app is unloading cleanly? In other words, would it be possible to express the tests you have in such a way that they could be part of the new testing framework? Are there JIRA issues, and/or patches for what you have already found/fixed?
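A rough sketch of what such an "unloads cleanly" check could look like, under stated assumptions: LeakProbe and LeakProbeDemo are made-up names, not part of DSpace or its testing framework, and a real test would hold the weak reference to the web application's class loader after undeploying it rather than to a plain Object. The technique is just "keep only a weak reference, ask for GC a few times, and see whether the referent goes away".

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of DSpace or its test framework. */
final class LeakProbe {
    private LeakProbe() {}

    /** Returns true if the referent was reclaimed within the given number of GC attempts. */
    static boolean isCollected(WeakReference<?> ref, int attempts) {
        for (int i = 0; i < attempts && ref.get() != null; i++) {
            System.gc(); // only a hint to the JVM, hence the retries
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ref.get() == null;
    }
}

/** Stand-ins for "an application that cleans up" and "one that does not". */
final class LeakProbeDemo {
    // Simulates a static registry that outlives the thing registered in it.
    static final List<Object> LEAKY_REGISTRY = new ArrayList<>();

    private LeakProbeDemo() {}

    /** No strong reference survives this call, so the object is collectable. */
    static WeakReference<Object> createAndRelease() {
        return new WeakReference<>(new Object());
    }

    /** The registry keeps a strong reference, so the object can never be collected. */
    static WeakReference<Object> createAndLeak() {
        Object resource = new Object();
        LEAKY_REGISTRY.add(resource);
        return new WeakReference<>(resource);
    }
}
```

Because System.gc() is only a request, a "not collected" result from a check like this is a hint to go and look with a profiler, not proof of a leak on its own.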
--Hardy

> -----Original Message-----
> From: Graham Triggs [mailto:grahamtri...@gmail.com]
> Sent: Tuesday, September 21, 2010 6:52 AM
> To: Tom De Mulder
> Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
> Subject: Re: [Dspace-tech] tomcat reporting memory leak?
>
> On 20 September 2010 15:59, Tom De Mulder <td...@cam.ac.uk> wrote:
>
> > On Mon, 20 Sep 2010, Damian Marinaccio wrote:
> > > I'm seeing the following log messages in catalina.out:
> > > [...]
> > > SEVERE: The web application [] appears to have started a thread
> > > named [FinalizableReferenceQueue] but has failed to stop it.
> > > This is very likely to create a memory leak.
> >
> > There are quite a few memory leaks in DSpace. We have a cronjob to
> > restart Tomcat nightly, because otherwise it'll break the next day.
>
> Hi all,
>
> Oh, welcome to my world!!
>
> I'm going to start off by pointing out that the majority of DSpace code
> is actually quite well behaved. Going back to the codebase circa 1.4.2 /
> 1.5, and using the JSP user interface - I've got *thirty* separate
> DSpace repositories / applications running in a single Tomcat instance,
> which has operated without a restart in over 90 days. And that is whilst
> being able to undeploy and redeploy any of those applications at will -
> or just reload them so that they pick up new configuration.
>
> That does require a bit of careful setup / teardown in the context
> listeners (that wasn't always part of the DSpace code), and you need to
> get certain JARs - particularly the database/pooling drivers - out of
> the web applications entirely and into the shared level of Tomcat. Most
> of that is actually just good / recommended practice for systems
> administration of a Java application server anyway.
>
> I was careful to point out that I have achieved that with pre-1.6 code
> and JSP only. Both 1.6 and XML ui (of any age) change the landscape.
> XML ui has always taken a large chunk of resources, although whilst it
> was still based on Cocoon 2.1, I managed to at least clean up its
> startup / shutdown behaviour by repairing its logging handler. This
> behaviour has changed with Cocoon 2.2, and I'll come back to that
> shortly.
>
> So, 1.6 - I've been doing some work on the resource usage and clean
> loading/unloading of both JSP and XML using 1.6.2 recently, and neither
> is clean out of the box.
>
> The first issue you run into is the FinalizableReferenceQueue noted in
> the log message quoted above. This is coming from a reference map in
> reflectutils - and was found to be a cleanup problem in the course of
> DSpace 2 development (the kernel / services framework was backported
> from that work). I added a LifecycleManager to reflectutils, released
> as version 0.9.11, that allows the internal structures to be shut down
> cleanly, and implemented this as part of DSpace 2; however, this
> appears to have been ignored in the backport.
>
> So, with the reflectutils/Lifecycle changes, and careful placement of
> JARs, etc., I did get the JSP ui to unload cleanly last week. I would
> note that I didn't stress the application too heavily, so some
> operations may still trigger code paths that are a problem, but at the
> baseline it was working correctly.
>
> XML ui has proven to be a somewhat more challenging beast. I first ran
> into two problems that are inside Cocoon 2.2 itself - 1) in the sitemap
> processing, it's using a stack inside a ThreadLocal, but it never
> removes the stack when it empties it, and 2) in one class relating to
> flowscript handling, it does not clean up the Mozilla Rhino engine
> correctly when it's finished using it (curiously, it's used in a number
> of places, and everywhere else it appears to be structured correctly to
> clean up - just this one class is screwed up).
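A minimal sketch of the first problem described above, for illustration only. This is not Cocoon's code: PerThreadStack and its methods are hypothetical names. It shows a stack kept in a ThreadLocal that removes its own entry once it becomes empty, so a pooled Tomcat thread is not left holding an object whose class was loaded by the web application.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical per-thread stack; the names are illustrative and are not Cocoon's. */
final class PerThreadStack {
    private static final ThreadLocal<Deque<String>> STACK = new ThreadLocal<>();

    private PerThreadStack() {}

    static void push(String value) {
        Deque<String> stack = STACK.get();
        if (stack == null) {
            // Create the stack lazily, on first use by this thread.
            stack = new ArrayDeque<>();
            STACK.set(stack);
        }
        stack.push(value);
    }

    static String pop() {
        Deque<String> stack = STACK.get();
        if (stack == null) {
            throw new IllegalStateException("pop() without a matching push()");
        }
        String value = stack.pop();
        if (stack.isEmpty()) {
            // The step the email says is missing: drop this thread's value
            // once the stack is empty instead of leaving it behind.
            STACK.remove();
        }
        return value;
    }

    /** True while the current thread still holds a stack. */
    static boolean holdsStackForCurrentThread() {
        return STACK.get() != null;
    }
}
```

This only helps when every push() is eventually matched by a pop() on the same thread. If an exception skips the pop(), the value stays attached to the thread, which is one way a fix of this kind can still leak in unexpected circumstances.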
>
> With locally patched versions of the sitemap and flowscript JARs from
> Cocoon, I then ran into another issue, this time with the
> CachingService that was backported. (The ThreadLocal patch isn't really
> guaranteed not to leak in unexpected circumstances - but it was
> sufficient to remove the problem in the scope of this testing.
> Basically, ThreadLocal is really dangerous to use.)
>
> With XML ui, it's using the RequestScope function of the caching service
> (it didn't appear to be exercising this part with JSP - that may just be
> because I only ran through limited code paths). For the RequestScope,
> it's tying the cache not to the request object... but to a ThreadLocal.
> And that ThreadLocal isn't being cleaned up at the end of the request.
> (The shutdown code is also incapable of doing the job it's intended for,
> as it will only ever execute on a single thread, and not see all the
> other threads that may have processed requests).
>
> There is a high probability of this leaking memory all over the place,
> and there is also the nasty potential to leak information across
> requests, which is undesirable.
>
> I made another hacked version that removes the ThreadLocal, but
> replicates a lot of its thread affinity behaviour (so, it still has the
> nasty side effects of the implementation, but at least removes the hold
> the system had over the application resources). XML ui was *still* not
> unloading correctly, and at this point the profiler stopped giving me
> pointers to strong references that were being held. So right now I'm not
> sure what else is up - but there is at least one more troubling part of
> the code remaining in there.
>
> I have repeatedly warned about the consequences of overly-complicated
> code and using 'clever tricks' under the hood. A lot of what I've
> mentioned above *can* be replaced with a much simpler architecture,
> that's much easier to understand, easier to maintain, and does not have
> the same problems.
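A small sketch of the RequestScope problem described above, under loud assumptions: RequestScopedCache is a hypothetical class, not the DSpace CachingService, and the Runnable stands in for whatever actually handles a request. It shows a ThreadLocal-backed cache that is cleared in a finally block on the thread that served the request, so the next request handled by the same pooled thread starts empty.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical request-scoped cache; not the DSpace CachingService. */
final class RequestScopedCache {
    private static final ThreadLocal<Map<String, Object>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    private RequestScopedCache() {}

    /** The cache belonging to whichever request the current thread is serving. */
    static Map<String, Object> current() {
        return CACHE.get();
    }

    /** Runs one request and always clears this thread's cache afterwards. */
    static void handle(Runnable requestHandler) {
        try {
            requestHandler.run();
        } finally {
            // Cleanup has to happen here, on the thread that served the request:
            // ThreadLocal.remove() only affects the calling thread's value.
            CACHE.remove();
        }
    }
}
```

That last comment is also why a single shutdown hook cannot do this job: it runs on one thread and cannot reach values held by the other threads that processed requests. Keeping the cache on the request object itself would avoid the ThreadLocal altogether.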
>
> If this matters to you, then it's going to take more than just me to
> stand up and say this.
>
> G

_______________________________________________
DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech