Hi, Graham, for what it's worth, I'll stand with you. :-) I think addressing 
the issues you've discovered is really important. Here's an idea: how about 
some new unit and/or performance tests that check if a class and/or app is 
unloading cleanly? In other words, would it be possible to express the tests 
you have in such a way that they could be part of the new testing framework? 
Are there JIRA issues, and/or patches for what you have already found/fixed?
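
As a rough illustration of what such a test could look like, here is a minimal sketch using only the JDK. The names (UnloadCheck, becomesUnreachable) are made up for this example and are not part of DSpace or the testing framework. System.gc() is only a hint to the JVM, so a real test would need to tolerate the occasional false alarm. The idea is to keep nothing but a weak reference to the webapp's class loader and see whether it can be collected:

```java
import java.lang.ref.WeakReference;

// Hypothetical test helper, not part of DSpace.
final class UnloadCheck {
    private UnloadCheck() {}

    // Returns true if the referent was garbage collected within roughly one second.
    // A strongly reachable referent (i.e. a leak) keeps this returning false.
    static boolean becomesUnreachable(WeakReference<?> ref) {
        for (int attempt = 0; attempt < 20 && ref.get() != null; attempt++) {
            System.gc(); // a request, not a guarantee
            try {
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status and give up
                break;
            }
        }
        return ref.get() == null;
    }
}
```

A test would create the class loader, run the application's startup and shutdown through it, drop its own strong references, and then expect becomesUnreachable(new WeakReference<>(loader)) to report true.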

--Hardy 

> -----Original Message-----
> From: Graham Triggs [mailto:grahamtri...@gmail.com]
> Sent: Tuesday, September 21, 2010 6:52 AM
> To: Tom De Mulder
> Cc: dspace-tech@lists.sourceforge.net; Damian Marinaccio
> Subject: Re: [Dspace-tech] tomcat reporting memory leak?
> 
> On 20 September 2010 15:59, Tom De Mulder <td...@cam.ac.uk> wrote:
> 
> 
>       On Mon, 20 Sep 2010, Damian Marinaccio wrote:
> 
>       > I'm seeing the following log messages in catalina.out:
> 
>       > [...]
> 
>       > SEVERE: The web application [] appears to have started a thread
>       > named [FinalizableReferenceQueue] but has failed to stop it.
>       > This is very likely to create a memory leak.
> 
> 
>       There are quite a few memory leaks in DSpace. We have a cronjob to
>       restart Tomcat nightly, because otherwise it'll break the next day.
> 
> 
> 
> 
> Hi all,
> 
> Oh, welcome to my world!!
> 
> I'm going to start off by pointing out that the majority of DSpace code
> is actually quite well behaved. Going back to the codebase circa 1.4.2 /
> 1.5, and using the JSP user interface - I've got *thirty* separate
> DSpace repositories / applications running in a single Tomcat instance,
> which has operated without a restart for over 90 days, whilst being able
> to undeploy and redeploy any of those applications at will - or just
> reload them so that they pick up new configuration.
> 
> That does require a bit of careful setup / teardown in the context
> listeners (that wasn't always part of the DSpace code), and you need to
> get certain JARs - particularly the database/pooling drivers - out of
> the web applications entirely and into the shared level of Tomcat. Most
> of that is actually just good / recommended practice for systems
> administration of a Java application server anyway.
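
For the teardown half of this, a minimal sketch of the kind of cleanup a context listener might do, assuming the JDBC driver was (wrongly) loaded by the web application itself. DriverCleanup and deregisterDriversOf are hypothetical names, not DSpace code. In a real deployment this would be called from ServletContextListener.contextDestroyed, which is left out here so that the example needs nothing beyond the JDK:

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of DSpace.
final class DriverCleanup {
    private DriverCleanup() {}

    // Deregisters every JDBC driver whose class was loaded by the given class
    // loader and returns the class names of the drivers that were removed.
    static List<String> deregisterDriversOf(ClassLoader webappLoader) {
        List<String> removed = new ArrayList<>();
        for (Driver driver : Collections.list(DriverManager.getDrivers())) {
            // Drivers loaded by a shared (parent) loader are left alone.
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    removed.add(driver.getClass().getName());
                } catch (SQLException e) {
                    // Keep going; one stubborn driver should not block the rest.
                }
            }
        }
        return removed;
    }
}
```

If the drivers live in Tomcat's shared level, as described above, this finds nothing to do, which is the desired state.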
> 
> I was careful to point out that I have achieved that with pre-1.6 code
> and JSP only. Both 1.6 and XML ui (of any age) change the landscape. XML
> ui has always taken a large chunk of resources, although whilst it was
> still based on Cocoon 2.1, I managed to at least clean up its startup /
> shutdown behaviour by repairing its logging handler. This behaviour has
> changed with Cocoon 2.2, and I'll come back to that shortly.
> 
> So, 1.6 - I've been doing some work on the resource usage and clean
> loading/unloading of both JSP and XML using 1.6.2 recently, and neither
> is clean out of the box.
> 
> The first issue you run into is the FinalizableReferenceQueue noted in
> the log message above. This is coming from a reference map in
> reflectutils - and was found to be a cleanup problem in the course of
> DSpace 2 development (the kernel / services framework was backported
> from that work). I added a LifecycleManager to reflectutils, released as
> version 0.9.11, that allows the internal structures to be shut down
> cleanly, and implemented this as part of DSpace 2. However, this appears
> to have been ignored in the backport.
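
The general pattern can be sketched like this. To be clear, this is NOT the reflectutils LifecycleManager API; ReferenceCleaner and its methods are invented for illustration. The point is only that a library which starts its own background thread needs an explicit shutdown hook, and that the application has to call it when it is undeployed:

```java
import java.lang.ref.ReferenceQueue;

// Hypothetical illustration; not the reflectutils API.
final class ReferenceCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Thread worker;
    private volatile boolean running = true;

    ReferenceCleaner() {
        worker = new Thread(this::drain, "ReferenceCleaner");
        worker.setDaemon(true);
        worker.start();
    }

    // Drains the queue until shutdown() is called.
    private void drain() {
        while (running) {
            try {
                queue.remove(100); // wait up to 100 ms for a cleared reference
            } catch (InterruptedException e) {
                // Interrupt is the shutdown signal; the loop re-checks the flag.
            }
        }
    }

    // Stops the worker and waits for it to exit, so the thread no longer
    // pins the web application's class loader.
    void shutdown() {
        running = false;
        worker.interrupt();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
    }

    boolean isAlive() {
        return worker.isAlive();
    }
}
```

Without the shutdown() call the thread simply keeps running after undeploy, which is the situation the Tomcat warning describes.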
> 
> So, with the reflectutils/Lifecycle changes, and careful placement of
> JARs, etc. I did get the JSP ui to unload cleanly last week. I would
> note that I didn't stress the application too heavily, so there may be
> some operations that might trigger different code paths that are still a
> problem, but at the baseline it was working correctly.
> 
> XML ui has proven to be a somewhat more challenging beast. I first ran
> into two problems that are inside Cocoon 2.2 itself - 1) in the sitemap
> processing, it's using a stack inside a ThreadLocal, but it never
> removes the stack when it empties it, and 2) in one class relating to
> flowscript handling, it does not clean up the Mozilla Rhino engine
> correctly when it's finished using it (curiously, it's used in a number
> of places, and everywhere else it appears to be structured correctly to
> clean up - just this one class is screwed up).
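
To make the first of those two problems concrete, here is a minimal sketch of a per-thread stack that removes itself from the ThreadLocal once it is empty. PerThreadStack is a hypothetical stand-in, not Cocoon's actual class:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for a per-thread processing stack; not Cocoon's code.
final class PerThreadStack {
    private static final ThreadLocal<Deque<String>> STACK = new ThreadLocal<>();

    private PerThreadStack() {}

    static void push(String item) {
        Deque<String> stack = STACK.get();
        if (stack == null) {
            stack = new ArrayDeque<>();
            STACK.set(stack);
        }
        stack.push(item);
    }

    static String pop() {
        Deque<String> stack = STACK.get();
        if (stack == null) {
            STACK.remove(); // get() created an empty entry; clear it again
            throw new IllegalStateException("pop without matching push");
        }
        String item = stack.pop();
        if (stack.isEmpty()) {
            // The step that was missing: drop the entry so the pooled
            // thread no longer references objects from this application.
            STACK.remove();
        }
        return item;
    }

    static boolean hasStackOnThisThread() {
        boolean present = STACK.get() != null;
        if (!present) {
            STACK.remove(); // get() created an empty entry; clear it again
        }
        return present;
    }
}
```

As long as every push is matched by a pop, the container's worker threads end up holding nothing. An exception between push and pop would still leave the stack behind unless the pop sits in a finally block.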
> 
> With locally patched versions of the sitemap and flowscript JARs from
> Cocoon, I then ran into another issue, this time with the
> CachingService that was backported. (The ThreadLocal patch isn't really
> guaranteed not to leak in unexpected circumstances - but it was
> sufficient to remove the problem in the scope of this testing.
> Basically, ThreadLocal is really dangerous to use.)
> 
> With XML ui, it's using the RequestScope function of the caching service
> (it didn't appear to be exercising this part with JSP - that may just be
> because I only ran through limited code paths). For the RequestScope,
> it's tying the cache not to the request object... but to a ThreadLocal.
> And that ThreadLocal isn't being cleaned up at the end of the request.
> (The shutdown code is also incapable of doing the job it's intended for,
> as it will only ever execute on a single thread, and not see all the
> other threads that may have processed requests).
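
A minimal sketch of the problem and of one way to contain it, assuming the cache has to stay thread-bound. RequestCache and runRequest are hypothetical names, not the DSpace CachingService. Container threads are pooled, so anything left in a ThreadLocal is still there when the same thread picks up the next request; clearing it in a finally block on the thread that handled the request avoids that:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; not the DSpace CachingService.
final class RequestCache {
    private static final ThreadLocal<Map<String, Object>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    private RequestCache() {}

    static void put(String key, Object value) {
        CACHE.get().put(key, value);
    }

    static Object get(String key) {
        return CACHE.get().get(key);
    }

    // Runs one "request" and always clears this thread's cache afterwards,
    // whether the handler returns normally or throws.
    static <T> T runRequest(Supplier<T> handler) {
        try {
            return handler.get();
        } finally {
            CACHE.remove();
        }
    }
}
```

In a servlet container the try/finally would sit in a filter or request listener. Storing the cache as an attribute on the request object itself would sidestep the thread affinity entirely.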
> 
> There is a high probability of this leaking memory all over the place,
> and there is also the nasty potential to leak information across
> requests, which is undesirable.
> 
> I made another hacked version that removes the ThreadLocal, but
> replicates a lot of its thread affinity behaviour (so, it still has the
> nasty side effects of the implementation, but at least removes the hold
> the system had over the application resources). XML ui was *still* not
> unloading correctly, and at this point the profiler stopped giving me
> pointers to strong references that were being held. So right now I'm not
> sure what else is up - but there is at least one more troubling part of
> the code remaining in there.
> 
> I have repeatedly warned about the consequences of overly-complicated
> code and using 'clever tricks' under the hood. A lot of what I've
> mentioned above *can* be replaced with a much simpler architecture,
> that's much easier to understand, easier to maintain, and does not have
> the same problems.
> 
> If this matters to you, then it's going to take more than just me to
> stand up and say this.
> 
> G

