On 20 September 2010 15:59, Tom De Mulder wrote:

> On Mon, 20 Sep 2010, Damian Marinaccio wrote:
>
> > I'm seeing the following log messages in catalina.out:
> > [...]
> > SEVERE: The web application [] appears to have started a thread named [FinalizableReferenceQueue] but has failed to stop it.
> > This is very likely to create a memory leak.
>
> There are quite a few memory leaks in DSpace. We have a cronjob to restart
> Tomcat nightly, because otherwise it'll break the next day.
>


Hi all,

Oh, welcome to my world!!

I'm going to start off by pointing out that the majority of DSpace code is
actually quite well behaved. Going back to the codebase circa 1.4.2 / 1.5,
and using the JSP user interface, I've got *thirty* separate DSpace
repositories / applications running in a single Tomcat instance, which has
operated without a restart for over 90 days. All the while, I've been able to
undeploy and redeploy any of those applications at will, or just reload them
so that they pick up new configuration.

That does require a bit of careful setup / teardown in the context listeners
(which wasn't always part of the DSpace code), and you need to move certain
JARs, particularly the database/pooling drivers, out of the web applications
entirely and into Tomcat's shared level. Most of that is actually just good /
recommended practice for administering a Java application server anyway.
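To give a flavour of the teardown side, here is a minimal Java sketch, assuming the goal is to deregister any JDBC drivers whose classes were loaded from inside the web application itself. The class and method names are hypothetical (this is not DSpace code); it would be called from a ServletContextListener's contextDestroyed() method, passing the web application's class loader. If the drivers have been moved to Tomcat's shared level, it simply finds nothing to deregister.

```java
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical teardown helper, meant to be called from
 * ServletContextListener.contextDestroyed(). It deregisters only the JDBC
 * drivers whose classes were loaded by the given (web application) class
 * loader, leaving drivers installed at Tomcat's shared level alone.
 */
final class JdbcDriverTeardown {

    private JdbcDriverTeardown() {
    }

    /** Returns the number of drivers that were deregistered. */
    static int deregisterDriversLoadedBy(ClassLoader webappLoader) {
        // Copy the enumeration first so we don't deregister while iterating.
        List<Driver> drivers = Collections.list(DriverManager.getDrivers());
        int count = 0;
        for (Driver driver : drivers) {
            if (driver.getClass().getClassLoader() == webappLoader) {
                try {
                    DriverManager.deregisterDriver(driver);
                    count++;
                } catch (SQLException e) {
                    // Report and carry on: one failure should not block the rest of teardown.
                    System.err.println("Could not deregister " + driver + ": " + e);
                }
            }
        }
        return count;
    }
}
```

Passing a class loader that loaded no drivers returns 0, which is the expected outcome once the drivers live at the shared level.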

I was careful to point out that I have achieved this with pre-1.6 code and
the JSP UI only. Both 1.6 and the XML UI (of any age) change the landscape.
The XML UI has always taken a large chunk of resources, although while it
was still based on Cocoon 2.1, I managed to at least clean up its startup /
shutdown behaviour by repairing its logging handler. This behaviour has
changed with Cocoon 2.2, and I'll come back to that shortly.

So, 1.6: I've recently been doing some work on the resource usage and clean
loading/unloading of both the JSP and XML UIs using 1.6.2, and neither is
clean out of the box.

The first issue you run into is the FinalizableReferenceQueue thread named in
the log message above. It comes from a reference map in reflectutils, and was
identified as a cleanup problem in the course of DSpace 2 development (the
kernel / services framework was backported from that work). I added a
LifecycleManager to reflectutils, released in version 0.9.11, that allows its
internal structures to be shut down cleanly, and implemented its use as part
of DSpace 2; however, this appears to have been ignored in the backport.
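The general pattern behind a fix like the LifecycleManager is that whatever starts a background thread must also offer a way to stop it, and the web application must call that from its context listener on shutdown. The Java sketch below illustrates that pattern only; it does not reproduce reflectutils' actual API, and ReferenceQueueCleaner and its methods are hypothetical names.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

/**
 * Hypothetical reference-queue cleanup thread that, unlike one started and
 * then forgotten, exposes shutdown() so a context listener can stop it when
 * the web application is undeployed.
 */
final class ReferenceQueueCleaner {
    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    private final Thread worker;

    ReferenceQueueCleaner() {
        worker = new Thread(this::drain, "ReferenceQueueCleaner");
        worker.setDaemon(true);
        worker.start();
    }

    /** Queue that weak/soft references can be registered with. */
    ReferenceQueue<Object> queue() {
        return queue;
    }

    boolean isRunning() {
        return worker.isAlive();
    }

    private void drain() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                // Blocks until a reference is enqueued; interrupt() wakes it up.
                Reference<?> ref = queue.remove();
                ref.clear(); // real code would run the entry's cleanup action here
            }
        } catch (InterruptedException e) {
            // Shutdown requested: fall through and let the thread die.
        }
    }

    /** Stops the worker thread; call from ServletContextListener.contextDestroyed(). */
    void shutdown() {
        worker.interrupt();
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In a listener, contextInitialized() would create the cleaner and contextDestroyed() would call shutdown(), so the thread does not outlive the application.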

So, with the reflectutils/Lifecycle changes and careful placement of JARs,
etc., I did get the JSP UI to unload cleanly last week. I would note that I
didn't stress the application too heavily, so there may be operations that
trigger different code paths that are still a problem, but at the baseline
it was working correctly.

The XML UI has proven to be a somewhat more challenging beast. I first ran
into two problems inside Cocoon 2.2 itself:

1) the sitemap processing keeps a stack inside a ThreadLocal, but never
   removes the stack from the ThreadLocal once it has emptied it; and
2) one class relating to flowscript handling does not clean up the Mozilla
   Rhino engine correctly when it has finished using it. (Curiously, Rhino is
   used in a number of places, and everywhere else the code appears to be
   structured correctly to clean up; just this one class is screwed up.)
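The first Cocoon problem comes down to one missing call. A minimal Java sketch, assuming a hypothetical ProcessingStack class rather than Cocoon's real sitemap code: when the per-thread stack becomes empty, ThreadLocal.remove() drops the entry, so a pooled worker thread no longer keeps the stack (and whatever it references) alive between requests.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical per-thread processing stack. The key step is the
 * STACK.remove() call in pop(): once the stack is empty, the ThreadLocal
 * entry is dropped instead of lingering on the pooled thread.
 */
final class ProcessingStack {
    private static final ThreadLocal<Deque<String>> STACK = new ThreadLocal<>();

    static void push(String frame) {
        Deque<String> stack = STACK.get();
        if (stack == null) {
            stack = new ArrayDeque<>();
            STACK.set(stack);
        }
        stack.push(frame);
    }

    static String pop() {
        Deque<String> stack = STACK.get();
        if (stack == null || stack.isEmpty()) {
            throw new IllegalStateException("pop() on empty stack");
        }
        String frame = stack.pop();
        if (stack.isEmpty()) {
            STACK.remove(); // unbind the empty stack from this thread
        }
        return frame;
    }

    /** True if the current thread still has a stack bound to it. */
    static boolean isBound() {
        return STACK.get() != null;
    }
}
```

After the last pop(), isBound() returns false on that thread, which is exactly the state a recycled container thread should be left in.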

With locally patched versions of the Cocoon sitemap and flowscript JARs (the
ThreadLocal patch isn't really guaranteed not to leak in unexpected
circumstances, but it was sufficient to remove the problem within the scope
of this testing; basically, ThreadLocal is really dangerous to use), I then
ran into another issue, this time with the CachingService that was backported.

The XML UI uses the RequestScope function of the caching service (JSP didn't
appear to exercise this part, although that may just be because I only ran
through limited code paths). For the RequestScope, the cache is tied not to
the request object... but to a ThreadLocal. And that ThreadLocal isn't being
cleaned up at the end of the request. (The shutdown code is also incapable of
doing the job it's intended for, as it will only ever execute on a single
thread, and cannot see all the other threads that may have processed
requests.)
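The shutdown problem follows from how ThreadLocal works: remove() only clears the calling thread's copy. This Java sketch (hypothetical names, not the CachingService code) stores a value on a pooled worker thread, calls remove() from a different thread as shutdown code would, and shows the worker thread still sees the value.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/**
 * Hypothetical demonstration that ThreadLocal.remove() only clears the
 * calling thread's value, so "shutdown" code running on one thread cannot
 * clean up values left behind on pooled request threads.
 */
final class ShutdownOnWrongThread {
    static final ThreadLocal<String> REQUEST_CACHE = new ThreadLocal<>();

    /** Returns what the worker thread still sees after remove() ran elsewhere. */
    static String run() {
        // A single-thread pool stands in for a reused container worker thread.
        ExecutorService worker = Executors.newSingleThreadExecutor();
        try {
            // A "request" on the worker thread leaves a value behind.
            worker.submit(() -> REQUEST_CACHE.set("cached data")).get();

            // "Shutdown" code runs on the caller's thread and clears only its own copy.
            REQUEST_CACHE.remove();

            // The next task on the same worker thread still sees the old value.
            Callable<String> read = REQUEST_CACHE::get;
            return worker.submit(read).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }
}
```

ShutdownOnWrongThread.run() returns "cached data": the remove() call had no effect on the worker thread.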

There is a high probability of this leaking memory all over the place, and
there is also the nasty potential for information to leak across requests,
which is undesirable.
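One way to keep a ThreadLocal-based request scope from leaking is to bind it at the start of each request and always remove it at the end, on the same thread. A minimal Java sketch, assuming a hypothetical RequestScopedCache (not the backported CachingService's API); in a servlet container the handle() wrapper would typically sit in a servlet Filter around chain.doFilter().

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/**
 * Hypothetical request-scoped cache bound to the current thread. handle()
 * binds a fresh cache for each request and always removes it in a finally
 * block, so nothing survives onto the next request served by that thread.
 */
final class RequestScopedCache {
    private static final ThreadLocal<Map<String, Object>> CACHE = new ThreadLocal<>();

    /** The cache for the request currently running on this thread. */
    static Map<String, Object> current() {
        Map<String, Object> cache = CACHE.get();
        if (cache == null) {
            throw new IllegalStateException("no request in progress on this thread");
        }
        return cache;
    }

    /** Runs one request with a fresh cache and always unbinds it afterwards. */
    static <T> T handle(Supplier<T> request) {
        CACHE.set(new HashMap<>());
        try {
            return request.get();
        } finally {
            CACHE.remove(); // end of request: drop this thread's reference
        }
    }
}
```

Running two requests back to back on the same thread, the second one finds none of the first one's entries, and calling current() outside a request fails loudly instead of returning stale data.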

I made another hacked version that removes the ThreadLocal, but replicates a
lot of its thread-affinity behaviour (so it still has the nasty side effects
of the implementation, but at least it removes the hold the system had over
the application resources). The XML UI was *still* not unloading correctly,
and at this point the profiler stopped pointing me to strong references that
were being held. So right now I'm not sure what else is going on, but there
is at least one more troubling part of the code remaining in there.
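The exact shape of that hacked version isn't described here, but one way to keep thread affinity without ThreadLocal is for the service to own a map keyed by thread ID, so shutdown code running on any thread can clear every entry. The Java sketch below assumes that approach; ThreadAffineCache is a hypothetical name and this is not the actual patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical thread-affine cache that keeps per-thread state in a map the
 * web application owns, rather than in each Thread's ThreadLocal storage.
 * It keeps the thread-affinity drawbacks, but shutdown() can clear every
 * thread's entry from whichever thread runs it.
 */
final class ThreadAffineCache {
    private final Map<Long, Map<String, Object>> perThread = new ConcurrentHashMap<>();

    /** Per-thread map; each inner map is only touched by its own thread. */
    Map<String, Object> forCurrentThread() {
        return perThread.computeIfAbsent(Thread.currentThread().getId(), id -> new HashMap<>());
    }

    int boundThreadCount() {
        return perThread.size();
    }

    /** Safe to call from any thread, e.g. ServletContextListener.contextDestroyed(). */
    void shutdown() {
        perThread.clear();
    }
}
```

Note that entries for threads that have finished are only released by shutdown() in this sketch; a fuller version would also clear the current thread's entry at the end of each request.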

I have repeatedly warned about the consequences of overly complicated code
and of using 'clever tricks' under the hood. A lot of what I've mentioned
above *can* be replaced with a much simpler architecture that's much easier
to understand, easier to maintain, and does not have the same problems.

If this matters to you, then it's going to take more than just me to stand
up and say this.

G
_______________________________________________
DSpace-tech mailing list
https://lists.sourceforge.net/lists/listinfo/dspace-tech
