Hi Graham, I don't have time at the moment to consider some of the bigger issues you raise but I would like to echo Hardy's comments. Historically many DSpace installations have had little content and been lightly used. I think this has allowed us to develop without much consideration for performance. I would like to see the sort of testing you have done becoming part of our procedures prior to release, rather than being left to the bigger sites, such as BioMed, to sort out after the event.
Cheers, Robin.

Robin Taylor
Main Library
University of Edinburgh
Tel. 0131 6513808

> -----Original Message-----
> From: Pottinger, Hardy J. [mailto:[email protected]]
> Sent: 21 September 2010 18:27
> To: Graham Triggs; Tom De Mulder
> Cc: [email protected]; Damian Marinaccio
> Subject: Re: [Dspace-tech] tomcat reporting memory leak?
>
> Hi, Graham, for what it's worth, I'll stand with you. :-) I think addressing the issues you've discovered is really important. Here's an idea: how about some new unit and/or performance tests that check if a class and/or app is unloading cleanly? In other words, would it be possible to express the tests you have in such a way that they could be part of the new testing framework? Are there JIRA issues, and/or patches for what you have already found/fixed?
>
> --Hardy
>
> > -----Original Message-----
> > From: Graham Triggs [mailto:[email protected]]
> > Sent: Tuesday, September 21, 2010 6:52 AM
> > To: Tom De Mulder
> > Cc: [email protected]; Damian Marinaccio
> > Subject: Re: [Dspace-tech] tomcat reporting memory leak?
> >
> > On 20 September 2010 15:59, Tom De Mulder <[email protected]> wrote:
> >
> > > On Mon, 20 Sep 2010, Damian Marinaccio wrote:
> > >
> > > > > I'm seeing the following log messages in catalina.out:
> > > > > [...]
> > > > > SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it. This is very likely to create a memory leak.
> > >
> > > There are quite a few memory leaks in DSpace. We have a cronjob to restart Tomcat nightly, because otherwise it'll break the next day.
> >
> > Hi all,
> >
> > Oh, welcome to my world!!
> >
> > I'm going to start off by pointing out that the majority of DSpace code is actually quite well behaved.
> > Going back to the codebase circa 1.4.2 / 1.5, and using the JSP user interface - I've got *thirty* separate DSpace repositories / applications running in a single Tomcat instance, which has operated without a restart in over 90 days. And that is whilst being able to undeploy and redeploy any of those applications at will - or just reload them so that they pick up new configuration.
> >
> > That does require a bit of careful setup / teardown in the context listeners (that wasn't always part of the DSpace code), and you need to get certain JARs - particularly the database/pooling drivers - out of the web applications entirely and into the shared level of Tomcat. Most of that is actually just good / recommended practice for systems administration of a Java application server anyway.
> >
> > I was careful to point out that I have achieved that with pre-1.6 code and JSP only. Both 1.6 and XML ui (of any age) change the landscape. XML ui has always taken a large chunk of resources, although whilst it was still based on Cocoon 2.1, I managed to at least clean up its startup / shutdown behaviour by repairing its logging handler. This behaviour has changed with Cocoon 2.2, and I'll come back to that shortly.
> >
> > So, 1.6 - I've been doing some work on the resource usage and clean loading/unloading of both JSP and XML using 1.6.2 recently, and neither is clean out of the box.
> >
> > The first issue you run into is the FinalizableReferenceQueue noted in the stack trace above. This is coming from a reference map in reflectutils - and was found to be a cleanup problem in the course of DSpace 2 development (the kernel / services framework was backported from that work).
> > I added a LifecycleManager to reflectutils, released as version 0.9.11, that allows the internal structures to be shut down cleanly, and implemented this as part of DSpace 2; however, this appears to have been ignored in the backport.
> >
> > So, with the reflectutils/Lifecycle changes, and careful placement of JARs, etc. I did get the JSP ui to unload cleanly last week. I would note that I didn't stress the application too heavily, so there may be some operations that might trigger different code paths that are still a problem, but at the baseline it was working correctly.
> >
> > XML ui has proven to be a somewhat more challenging beast. I first ran into two problems that are inside Cocoon 2.2 itself - 1) in the sitemap processing, it's using a stack inside a ThreadLocal, but it never removes the stack when it empties it, and 2) in one class relating to flowscript handling, it does not clean up the Mozilla Rhino engine correctly when it's finished using it (curiously, it's used in a number of places, and everywhere else it appears to be structured correctly to clean up - just this one class is screwed up).
> >
> > With locally patched versions of the sitemap and flowscript JARs from Cocoon (the ThreadLocal patch isn't really guaranteed to not leak in unexpected circumstances - but it was sufficient to remove the problem in the scope of this testing. Basically, ThreadLocal is really dangerous to use), I then ran into another issue, this time with the CachingService that was backported.
> >
> > With XML ui, it's using the RequestScope function of the caching service (it didn't appear to be exercising this part with JSP - that may just be because I only ran through limited code paths). For the RequestScope, it's tying the cache not to the request object... but to a ThreadLocal.
> > And that ThreadLocal isn't being cleaned up at the end of the request. (The shutdown code is also incapable of doing the job it's intended for, as it will only ever execute on a single thread, and not see all the other threads that may have processed requests).
> >
> > There is a high probability of this leaking memory all over the place, and there is also the nasty potential to leak information across requests, which is undesirable.
> >
> > I made another hacked version that removes the ThreadLocal, but replicates a lot of its thread affinity behaviour (so, it still has the nasty side effects of the implementation, but at least removed the hold the system had over the application resources). XML ui was *still* not unloading correctly, and at this point the profiler stopped giving me pointers to strong references that were being held. So right now I'm not sure what else is up - but there is at least one more troubling part of the code remaining in there.
> >
> > I have repeatedly warned about the consequences of overly-complicated code and using 'clever tricks' under the hood. A lot of what I've mentioned above *can* be replaced with a much simpler architecture, that's much easier to understand, easier to maintain, and does not have the same problems.
> >
> > If this matters to you, then it's going to take more than just me to stand up and say this.
> >
> > G
> >
> ------------------------------------------------------------------------------
> Start uncovering the many advantages of virtual appliances and start using them to simplify application deployment and accelerate your shift to cloud computing.
> http://p.sf.net/sfu/novell-sfdev2dev
> _______________________________________________
> DSpace-tech mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/dspace-tech
>
--
The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
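
For readers following the thread, here is a minimal sketch of the ThreadLocal problem Graham describes for the request-scoped cache. It is an illustration only and not DSpace's actual CachingService code: the names RequestScopeCacheDemo, RequestScopeCache, endRequest and secondRequestSees are hypothetical, and a single-thread executor stands in for a pooled Tomcat worker thread. It shows that a value cached in a ThreadLocal during one request is still visible to the next request served by the same thread unless the ThreadLocal is removed in a finally block at the end of the request.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

final class RequestScopeCacheDemo {

    /** Hypothetical stand-in for a "request scope" cache that is really tied to the thread. */
    static final class RequestScopeCache {
        // One map per worker thread; pooled threads outlive individual requests.
        private static final ThreadLocal<Map<String, Object>> CACHE =
                ThreadLocal.withInitial(HashMap::new);

        static void put(String key, Object value) { CACHE.get().put(key, value); }

        static Object get(String key) { return CACHE.get().get(key); }

        /** Must run at the end of every request, on the thread that served it. */
        static void endRequest() { CACHE.remove(); }
    }

    /** Serves two "requests" on the same pooled thread and returns what the second one sees. */
    static Object secondRequestSees(boolean cleanUp) {
        // Assumption: one worker thread, so both requests share its ThreadLocal state.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        Runnable firstRequest = () -> {
            try {
                RequestScopeCache.put("user", "alice");
            } finally {
                if (cleanUp) {
                    RequestScopeCache.endRequest();
                }
            }
        };
        Callable<Object> secondRequest = () -> RequestScopeCache.get("user");
        try {
            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("without cleanup: " + secondRequestSees(false)); // prints "without cleanup: alice"
        System.out.println("with cleanup:    " + secondRequestSees(true));  // prints "with cleanup:    null"
    }
}
```

The cleanup has to run on the thread that served the request, which is why a single shutdown hook cannot do the job: it only ever sees its own thread's copy. Tying the cache to the request object itself avoids the problem altogether.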

