Re: ThreadLocal causing memory leak with J2EE applications

robert engels Wed, 10 Sep 2008 08:35:24 -0700

It is basic Java. Threads are not guaranteed to run on any sort of schedule. If you create lots of large objects in one thread and release them in another, there is a good chance you will get an OOM (since the releasing thread may not run before the OOM occurs)... This is not Lucene specific by any means.


It is a misunderstanding on your part about how GC works.

I assume you must at some point be creating new RAMDirectories - otherwise the memory would never really increase, since the IndexReader/enums/etc are not very large...

When you create a new RAMDirectory, you need to BE CERTAIN !!! that the other IndexReaders/Searchers using the old RAMDirectory are ALL CLOSED, otherwise their memory will still be in use, which leads to your OOM...



On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:

I do not believe I am making any mistake. Actually I just got an email from another user, complaining about the same thing. And I am having the same usage pattern.
After the reader is opened, the RAMDirectory is shared by several objects. There is one instance of RAMDirectory in the memory, and it is holding lots of memory, which is expected.
If I close the reader in the same thread that has opened it, the RAMDirectory is gone from the memory. If I close the reader in other threads, the RAMDirectory is left in the memory, referenced along the tree I drew in the first email.
I do not think the usage is wrong. Period.

-------------------------------------
Hi,

   I found a forum post from you here [1] where you mention that you
have a memory leak using the Lucene RAMDirectory. I'd like to ask
if you have already resolved the problem and how you did it, or whether
you know where I can read about the solution. We are using
RAMDirectory too and found that over time the memory
consumption rises and rises until the system breaks down, but only
when we perform many index updates. If we only create the index and
do nothing except search it, it works fine.

maybe you can give me a hint or a link,
greetz,
-------------------------------------

--
Chris Lu
-------------------------
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got 2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 7:12 AM, robert engels <[EMAIL PROTECTED]> wrote:
Sorry, but I am fairly certain you are mistaken.
If you only have a single IndexReader, the RAMDirectory will be shared in all cases.
The only memory growth is any buffer space allocated by an IndexInput (used in many places and cached).
Normally the IndexInputs created by a RAMDirectory do not have any buffer allocated, since the underlying store is already in memory.
You have some other problem in your code...

On Sep 10, 2008, at 1:10 AM, Chris Lu wrote:
Actually, even if I only use one IndexReader, some resources are cached via the ThreadLocal cache, and cannot be released unless all threads do the close action.
SegmentTermEnum itself is small, but it holds RAMDirectory along the path, which is big.

On Tue, Sep 9, 2008 at 10:43 PM, robert engels <[EMAIL PROTECTED]> wrote:
You do not need a pool of IndexReaders...
It does not matter what class it is, what matters is the class that ultimately holds the reference.
If the IndexReader is never closed, the SegmentReader(s) is never closed, so the thread local in TermInfosReader is not cleared (because the thread never dies). So you will get one SegmentTermEnum per thread, per segment.
The SegmentTermEnum is not a large object, so even if you had 100 threads and 100 segments, for 10k instances, it seems hard to believe that is the source of your memory issue.
The SegmentTermEnum is cached by thread since it needs to enumerate the terms. Not having a per-thread cache would lead to lots of random access when multiple threads read the index - very slow.
You need to keep in mind: what if every thread was executing a search simultaneously - you would still have 100x100 SegmentTermEnum instances anyway! The only way to prevent that would be to create and destroy the SegmentTermEnum on each call (opening and seeking to the proper spot) - which would be SLOW SLOW SLOW.
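
A minimal sketch of the "one instance per thread, per segment" arithmetic above, using only the JDK. SegmentStandIn is a hypothetical stand-in, not Lucene's TermInfosReader; it only mimics the idea of a per-segment object that lazily caches one enumerator per thread in a ThreadLocal. The pool threads are reused, so repeated lookups do not create more instances, but none are released while the threads live.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class PerThreadEnumCount {

    // Hypothetical stand-in for one segment's reader: it lazily creates
    // one cached "enumerator" object per calling thread.
    static final class SegmentStandIn {
        private final ThreadLocal<Object> perThreadEnum;

        SegmentStandIn(AtomicInteger created) {
            this.perThreadEnum = ThreadLocal.withInitial(() -> {
                created.incrementAndGet(); // count every instance ever created
                return new Object();
            });
        }

        void lookup() {
            perThreadEnum.get(); // first call on a thread creates the instance
        }
    }

    // Returns how many cached instances exist after every pool thread
    // has touched every segment several times.
    static int countInstances(int threads, int segments) {
        AtomicInteger created = new AtomicInteger();
        SegmentStandIn[] segs = new SegmentStandIn[segments];
        for (int i = 0; i < segments; i++) {
            segs[i] = new SegmentStandIn(created);
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CountDownLatch allStarted = new CountDownLatch(threads);
        CountDownLatch done = new CountDownLatch(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                try {
                    allStarted.countDown();
                    allStarted.await(); // each task occupies its own pool thread
                    for (int round = 0; round < 3; round++) {
                        for (SegmentStandIn s : segs) {
                            s.lookup();
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return created.get();
    }

    public static void main(String[] args) {
        System.out.println(countInstances(4, 5)); // prints 20
    }
}
```

With 100 threads and 100 segments the same arithmetic gives the 10k instances mentioned above. Whether that matters depends on what each instance keeps reachable, which is the point under dispute in this thread.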

On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:
I have tried to create an IndexReader pool and dynamically create searchers. But the memory leak is the same. It's not related to the Searcher class specifically, but the SegmentTermEnum in TermInfosReader.

On Tue, Sep 9, 2008 at 10:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
A searcher uses an IndexReader - the IndexReader is slow to open, not a Searcher. And searchers can share an IndexReader.
You want to create a single shared (across all threads/users) IndexReader (usually), and create a Searcher as needed and dispose of it. It is VERY CHEAP to create the Searcher.
I am fairly certain the javadoc on Searcher is incorrect. The warning "For performance reasons it is recommended to open only one IndexSearcher and use it for all of your searches" is not true in the case where an IndexReader is passed to the ctor.
Any caching should USUALLY be performed at the IndexReader level.
You are most likely using the "path" ctor, and that is the source of your problems, as multiple IndexReader instances are being created, and thus the memory use.
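
A rough sketch of the sharing pattern described above, written against hypothetical stand-in classes rather than the real Lucene API (ReaderStandIn and SearcherStandIn are illustration-only names). It assumes, as the message says, that opening the reader is the expensive step and that a searcher is only a thin wrapper around it.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedReaderSketch {
    // Counts how many times the expensive "open" step runs.
    static final AtomicInteger OPENS = new AtomicInteger();

    // Hypothetical stand-in for an index reader that is slow to open.
    static final class ReaderStandIn {
        ReaderStandIn() {
            OPENS.incrementAndGet();
        }
    }

    // Hypothetical stand-in for a searcher: a cheap wrapper around a reader.
    static final class SearcherStandIn {
        final ReaderStandIn reader;

        SearcherStandIn(ReaderStandIn reader) {
            this.reader = reader;
        }
    }

    // Opened once and shared by every request thread.
    static final ReaderStandIn SHARED = new ReaderStandIn();

    // Called per request; the searcher is discarded when the request ends.
    static SearcherStandIn newSearcherForRequest() {
        return new SearcherStandIn(SHARED);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            newSearcherForRequest();
        }
        System.out.println(OPENS.get()); // prints 1
    }
}
```

By contrast, a searcher constructed from a path would, per the message above, open its own reader each time, which in this sketch would correspond to one ReaderStandIn per searcher.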

On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:
In a J2EE environment, there is usually a searcher pool with several searchers open. The time it takes to open a large index for every user is not acceptable.

On Tue, Sep 9, 2008 at 9:03 PM, robert engels <[EMAIL PROTECTED]> wrote:
You need to close the searcher within the thread that is using it, in order to have it cleaned up quickly... usually right after you display the page of results.
If you are keeping multiple searcher refs across multiple threads for paging/whatever, you have not coded it correctly.
Imagine 10,000 users - storing a searcher for each one is not going to work...

On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:
Right, in a sense I cannot release it from another thread. But that's the problem.
It's a J2EE environment, all threads are kind of equal. It's simply not possible to iterate through all threads to close the searcher, thus releasing the ThreadLocal cache. Unless Lucene is not recommended for J2EE environments, this has to be fixed.
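
The limitation described above follows from how java.lang.ThreadLocal works in general: remove() only clears the calling thread's own slot. The sketch below is plain JDK code, not Lucene code; CACHE and the single worker thread are illustration-only stand-ins for a per-thread cache and a reused application-server thread.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class CrossThreadRemove {
    // Illustration-only per-thread cache slot.
    static final ThreadLocal<String> CACHE = new ThreadLocal<>();

    // Returns { value seen by the worker after another thread called remove(),
    //           value seen by the worker after it called remove() itself }.
    static String[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        Runnable fill = () -> CACHE.set("cached by worker");
        Runnable clear = CACHE::remove;
        Callable<String> read = CACHE::get;
        try {
            worker.submit(fill).get();          // worker thread populates its slot
            CACHE.remove();                     // caller's thread: clears only the caller's slot
            String afterForeignRemove = worker.submit(read).get();
            worker.submit(clear).get();         // worker clears its own slot
            String afterOwnRemove = worker.submit(read).get();
            return new String[] { afterForeignRemove, afterOwnRemove };
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdown();
        }
    }

    public static void main(String[] args) {
        String[] r = demo();
        System.out.println(r[0]); // prints "cached by worker"
        System.out.println(r[1]); // prints "null"
    }
}
```

So a value cached by a pooled thread stays strongly reachable from that thread until the thread itself clears it, the thread exits, or the ThreadLocal object becomes unreachable and its stale entry is eventually expunged.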

On Tue, Sep 9, 2008 at 8:14 PM, robert engels <[EMAIL PROTECTED]> wrote:
Your code is not correct. You cannot release it on another thread - the first thread may be creating hundreds/thousands of instances before the other thread ever runs...
On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:
If I release it on the thread that's creating the searcher, by setting searcher=null, everything is fine, the memory is released very cleanly.
My load test was to repeatedly create a searcher on a RAMDirectory and release it on another thread. The test will quickly go to OOM after several runs. I set the heap size to be 1024M, and the RAMDirectory is of size 250M. Using some profiling tool, the used size simply stepped up pretty obviously by 250M.
I think we should not rely on something that's a "maybe" behavior, especially for a general purpose library.
Since it's a multi-threaded env, the thread that's creating the entries in the LRU cache may not go away quickly (actually most, if not all, application servers will try to reuse threads), so the LRU cache, which uses the thread as the key, cannot be released, so the SegmentTermEnum which is in the same class cannot be released.
And yes, I close the RAMDirectory, and the fileMap is released. I verified that through the profiler by directly checking the values in the snapshot.
Pretty sure the reference tree wasn't like this using code before this commit, because after closing the searcher in another thread, the RAMDirectory totally disappeared from the memory snapshot.

On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless <[EMAIL PROTECTED]> wrote:
Chris Lu wrote:
The problem should be similar to what's talked about in this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

The "rough" conclusion of that thread is that, technically, this isn't a memory leak but rather a "delayed freeing" problem. I.e., it may take longer, possibly much longer, than you want for the memory to be freed.

There is a memory leak for Lucene search from LUCENE-1195. (svn r659602, May 23, 2008)
This patch brings in a ThreadLocal cache to TermInfosReader.

One thing that confuses me: TermInfosReader was already using a ThreadLocal to cache the SegmentTermEnum instance. What was added in this commit (for LUCENE-1195) was an LRU cache storing Term -> TermInfo instances. But it seems like it's the SegmentTermEnum instance that you're tracing below.
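
For readers unfamiliar with the term, a generic LRU cache can be sketched in a few lines of JDK code. This is not the implementation added for LUCENE-1195 (which is not shown in this thread); it is a common LinkedHashMap idiom, and the String keys and values below merely stand in for Term and TermInfo.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Generic LRU map: evicts the least recently accessed entry once the
// configured capacity is exceeded. Not thread-safe on its own.
class LruSketch<K, V> extends LinkedHashMap<K, V> {
    private static final long serialVersionUID = 1L;
    private final int capacity;

    LruSketch(int capacity) {
        super(16, 0.75f, true); // true = iterate in access order, not insertion order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // called by put(); drops the eldest entry when over capacity
    }

    public static void main(String[] args) {
        LruSketch<String, String> cache = new LruSketch<>(2);
        cache.put("termA", "infoA");
        cache.put("termB", "infoB");
        cache.get("termA");            // termA becomes most recently used
        cache.put("termC", "infoC");   // evicts termB, the least recently used
        System.out.println(cache.keySet()); // prints [termA, termC]
    }
}
```

Because such a map is not thread-safe, one instance per thread is a simple way to avoid locking, at the cost of holding one cache per live thread.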

It's usually recommended to keep the reader open, and reuse it when possible. In a common J2EE application, the http requests are usually handled by different threads. But since the cache is ThreadLocal, the cache is not really usable by other threads. What's worse, the cache cannot be cleared by another thread!
This leak is not so obvious usually. But my case is using RAMDirectory, having several hundred megabytes. So one un-released resource is obvious to me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
 |- directory of org.apache.lucene.store.RAMFile
    |- file of org.apache.lucene.store.RAMInputStream
        |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
            |- input of org.apache.lucene.index.SegmentTermEnum
                |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

So you have a RAMDir that has several hundred MB stored in it, that you're done with, yet through this path Lucene is keeping it alive?
Did you close the RAMDir? (which will null its fileMap and should also free your memory).
Also, that reference tree doesn't show the ThreadResources class that was added in that commit -- are you sure this reference tree wasn't before the commit?
Mike




