Re: ThreadLocal causing memory leak with J2EE applications

2008-09-13 Thread Chris Lu

Just confirmed the fix for this problem is ready in patch LUCENE-1383

Thanks to Robert Engels for arguing with me, understanding the problem 
quickly, and contributing a CloseableThreadLocal class, even though the 
problem itself was hard for him to reproduce, and thanks to Michael 
McCandless for fixing the problem so quickly.


--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site (anonymous per request), 
got 2.6 million Euro funding!


Michael McCandless wrote:


Yeah I think that's the right approach.

I'll code it up.

Mike

robert engels wrote:

I think that would work, but I think you would be better off 
encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal, 
and using that everywhere. Add a method clear() that clears the 
ThreadLocal's list (which will allow the values to be GC'd).
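Robert's WeakThreadLocal idea is essentially the shape of the fix that landed as LUCENE-1383 (CloseableThreadLocal). A minimal sketch of the pattern discussed in this thread; the class name and details here are illustrative, not Lucene's actual implementation:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

// Sketch of a "closeable" ThreadLocal: the JDK ThreadLocal only holds a
// WeakReference to each value, while the hard references live in a map
// we control. close() drops the hard references, so GC can reclaim the
// values without waiting for each thread's ThreadLocalMap to expunge
// its stale entries.
class CloseableThreadLocalSketch<T> {

  private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();

  // Hard references keyed by thread. Synchronization is only needed
  // when binding a new thread; get() on an existing binding is sync-free.
  private final Map<Thread, T> hardRefs = new HashMap<>();

  public T get() {
    WeakReference<T> ref = local.get();
    return ref == null ? null : ref.get();
  }

  public void set(T value) {
    local.set(new WeakReference<>(value));
    synchronized (hardRefs) {
      hardRefs.put(Thread.currentThread(), value);
      // Compact on every bind: drop entries for dead threads so a
      // long-lived reader used by short-lived threads stays bounded.
      hardRefs.keySet().removeIf(thread -> !thread.isAlive());
    }
  }

  // Called when the owning reader is closed: once the hard refs are
  // gone, GC may reclaim the values even before the per-thread
  // ThreadLocalMaps purge their stale entries.
  public void close() {
    synchronized (hardRefs) {
      hardRefs.clear();
    }
  }
}
```

This keeps Mike's two properties: no synchronization on the read path for an already-bound thread, and a single clear()/close() point that releases everything at once.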



On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:



OK so we compact the list (removing dead threads) every time we add 
a new entry to the list.  This way, for a long-lived SegmentReader 
with short-lived threads, the list keeps only live threads.


We do need sync access to the list, but that's only on binding a new 
thread.  Retrieving an existing thread has no sync.


Mike

robert engels wrote:

You still need to sync access to the list, and how would it be 
removed from the list prior to close? That is, you need one per 
thread, but you can have the reader shared across all threads. So 
if threads were created and destroyed without ever closing the 
reader, the list would grow unbounded.


On Sep 11, 2008, at 9:20 AM, Michael McCandless wrote:



I don't need it by thread, because I would still use ThreadLocal 
to retrieve the SegmentTermEnum.  This avoids any sync during get.


The list is just a fallback to hold a hard reference to the 
SegmentTermEnum to keep it alive.  That's its only purpose.  
Then, when SegmentReader is closed this list is cleared and GC is 
free to reclaim all SegmentTermEnums.


Mike

robert engels wrote:


But you need it by thread, so it can't be a list.

You could have a HashMap&lt;Thread, ThreadState&gt; in FieldsReader, 
and when SegmentReader is closed, FieldsReader is closed, which 
clears the map, and not use thread locals at all. The difference 
being you would need a sync'd map.


On Sep 11, 2008, at 4:56 AM, Michael McCandless wrote:



What if we wrap the value in a WeakReference, but secondarily 
hold a hard reference to it in a normal list?


Then, when TermInfosReader is closed we clear that list of all 
its hard references, at which point GC will be free to reclaim 
the object out from under the ThreadLocal even before the 
ThreadLocal purges its stale entries.


Mike

robert engels wrote:

You can't hold the ThreadLocal value in a WeakReference, 
because there is no hard reference between enumeration calls 
(so it would be cleared out from under you while enumerating).


All of this occurs because you have some objects 
(readers/segments etc.) that are shared across all threads, but 
these contain objects that are 'thread/search state' specific. 
These latter objects are essentially cached for performance 
(so you don't need to seek and read, sequential buffer access, 
etc.)


A sometimes better solution is to have the state returned to 
the caller, and require the caller to pass/use the state later 
- then you don't need thread locals.


You can accomplish a similar solution by returning a 
SessionKey object and having the caller pass this later.  You 
can then have a WeakHashMap&lt;SessionKey, SearchState&gt; that the 
code can use.  When the SessionKey is destroyed (no longer 
referenced), the state map can be cleaned up automatically.
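A rough sketch of this SessionKey alternative; the names SessionKey, SearchState, and StatefulReader are hypothetical, purely for illustration:

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of the SessionKey idea: no ThreadLocals at all. The reader
// hands each caller an opaque key; per-session state lives in a
// WeakHashMap keyed by it. Once the caller drops its last reference to
// the key, the entry becomes eligible for automatic cleanup.
final class SessionKey {}

final class SearchState {
  long enumPosition; // e.g. a cached seek position for a term enum
}

final class StatefulReader {
  // WeakHashMap is not thread-safe, so wrap it; sessions may be opened
  // and used from any thread.
  private final Map<SessionKey, SearchState> states =
      Collections.synchronizedMap(new WeakHashMap<>());

  SessionKey openSession() {
    SessionKey key = new SessionKey();
    states.put(key, new SearchState());
    return key;
  }

  // The caller passes its key on every subsequent call; there is no
  // thread affinity, so any thread may resume the session.
  SearchState stateFor(SessionKey key) {
    return states.get(key);
  }
}
```

The trade-off versus a ThreadLocal cache is an API change: callers must carry the key, but state lifetime is tied to the caller rather than to threads, so a pooled-thread J2EE container cannot pin it.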




On Sep 10, 2008, at 11:30 PM, Noble Paul നോബിള്‍ नोब्ळ् wrote:

When I look at the reference tree that is the feeling I get. If you
held a WeakReference it would get released.
|- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
  |- input of org.apache.lucene.index.SegmentTermEnum
  |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry


On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu [EMAIL PROTECTED] wrote:

Does this make any difference?
If I intentionally close the searcher and reader and that fails to release
the memory, I cannot rely on some magic of the JVM to release it.

On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless


What if we wrap the value in a WeakReference, but secondarily hold a  
hard reference to it in a normal list?


Then, when TermInfosReader is closed we clear that list of all its  
hard references, at which point GC will be free to reclaim the object  
out from under the ThreadLocal even before the ThreadLocal purges its  
stale entries.


Mike


On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ्
[EMAIL PROTECTED] wrote:


Why do you need to keep a strong reference?
Why not a WeakReference?

--Noble

On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu [EMAIL PROTECTED]  
wrote:
The problem should be similar to what's talked about in this discussion:

http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

There is a memory leak in Lucene search introduced by LUCENE-1195 (svn 
r659602, May 23, 2008).

This patch brings in a ThreadLocal cache to TermInfosReader.

It's usually recommended to keep the reader open and reuse it when
possible. In a common J2EE application, the HTTP requests are usually
handled by different threads. But since the cache is ThreadLocal, the
cache is not really usable by other threads. What's worse, the cache
cannot be cleared by another thread!

This leak is usually not so obvious. But my case uses a RAMDirectory
of several hundred megabytes, so one un-released resource is obvious
to me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
|- directory of org.apache.lucene.store.RAMFile
|- file of org.apache.lucene.store.RAMInputStream
|- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
|- input of org.apache.lucene.index.SegmentTermEnum
|- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
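The retention this tree shows can be reproduced in a few lines: once a thread's ThreadLocalMap holds the value, the value stays strongly reachable from the live thread even after the ThreadLocal itself, and every other reference to the value, is dropped. This is a standalone illustration, not Lucene code:

```java
import java.lang.ref.WeakReference;

// Demonstrates the retention Chris describes: a value set on a
// ThreadLocal remains strongly reachable from the *thread*, even after
// all our references to the ThreadLocal and to the value are gone.
public class ThreadLocalRetention {

  static WeakReference<byte[]> leak() {
    ThreadLocal<byte[]> cache = new ThreadLocal<>();
    byte[] value = new byte[1 << 20]; // stand-in for a SegmentTermEnum
    cache.set(value);
    return new WeakReference<>(value);
    // 'cache' and 'value' go out of scope here, but the current
    // thread's ThreadLocalMap entry still strongly references the value.
  }

  public static void main(String[] args) {
    WeakReference<byte[]> ref = leak();
    System.gc();
    // The entry's *key* (the ThreadLocal) is weakly held and may be
    // collected, but the entry's value field stays strong until the map
    // expunges stale entries, which only happens on later ThreadLocal
    // operations by this thread.
    System.out.println(ref.get() != null); // prints "true"
  }
}
```

With pooled container threads that never die and a reader that can no longer reach its own ThreadLocal values, nothing ever triggers that expunge, which is exactly the leak reported here.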



After I switched back to svn revision 659601, right before this patch 
was checked in, the memory leak is gone.
Although my case uses RAMDirectory, I believe this will affect 
disk-based indexes also.






--
--Noble Paul

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]












Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels

But you need it by thread, so it can't be a list.

You could have a HashMap&lt;Thread, ThreadState&gt; in FieldsReader, and 
when SegmentReader is closed, FieldsReader is closed, which clears 
the map, and not use thread locals at all. The difference being you 
would need a sync'd map.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless


I don't need it by thread, because I would still use ThreadLocal to  
retrieve the SegmentTermEnum.  This avoids any sync during get.


The list is just a fallback to hold a hard reference to the 
SegmentTermEnum to keep it alive.  That's its only purpose.  Then, 
when SegmentReader is closed this list is cleared and GC is free to 
reclaim all SegmentTermEnums.


Mike


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
You still need to sync access to the list, and how would it be 
removed from the list prior to close? That is, you need one per 
thread, but you can have the reader shared across all threads. So if 
threads were created and destroyed without ever closing the reader, 
the list would grow unbounded.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless


OK so we compact the list (removing dead threads) every time we add a 
new entry to the list.  This way, for a long-lived SegmentReader with 
short-lived threads, the list keeps only live threads.


We do need sync access to the list, but that's only on binding a new  
thread.  Retrieving an existing thread has no sync.


Mike


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
I think that would work, but I think you would be better off 
encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal, 
and using that everywhere. Add a method clear() that clears the 
ThreadLocal's list (which will allow the values to be GC'd).



On Sep 11, 2008, at 9:43 AM, Michael McCandless wrote:



OK so we compact the list (removing dead threads) every time we add  
a new entry to the list.  This way for a long lived SegmentReader  
but short lived threads, the list keeps only live threads.


We do need sync access to the list, but that's only on binding a  
new thread.  Retrieving an existing thread has no sync.


Mike

robert engels wrote:

You still need to sync access to the list, and how would it be  
removed from the list prior to close? That is you need one per  
thread, but you can have the reader shared across all threads. So  
if threads were created and destroyed without ever closing the  
reader, the list would grow unbounded.


On Sep 11, 2008, at 9:20 AM, Michael McCandless wrote:



I don't need it by thread, because I would still use ThreadLocal  
to retrieve the SegmentTermEnum.  This avoids any sync during get.


The list is just a fallback to hold a hard reference to the  
SegmentTermEnum to keep it alive.  That's it's only purpose.   
Then, when SegmentReader is closed this list is cleared and GC is  
free to reclaim all SegmentTermEnums.


Mike

robert engels wrote:


But you need it by thread, so it can't be a list.

You could have a HashMap of Thread,ThreadState in  
FieldsReader, and when SegmentReader is closed, FieldsReader is  
closed, which clears the map, and not use thread locals at all.  
The difference being you would need a sync'd map.


On Sep 11, 2008, at 4:56 AM, Michael McCandless wrote:



What if we wrap the value in a WeakReference, but secondarily  
hold a hard reference to it in a normal list?


Then, when TermInfosReader is closed we clear that list of all  
its hard references, at which point GC will be free to reclaim  
the object out from under the ThreadLocal even before the  
ThreadLocal purges its stale entries.


Mike

robert engels wrote:

You can't hold the ThreadLocal value in a WeakReference,  
because there is no hard reference between enumeration calls  
(so it would be cleared out from under you while enumerating).


All of this occurs because you have some objects (readers/ 
segments etc.) that are shared across all threads, but these  
contain objects that are 'thread/search state' specific. These  
latter objects are essentially cached for performance (so  
you don't need to seek and read, sequential buffer access, etc.)


A sometimes better solution is to have the state returned to  
the caller, and require the caller to pass/use the state later  
- then you don't need thread locals.


You can accomplish a similar solution by returning a
SessionKey object and having the caller pass it back later.  You
can then have a WeakHashMap<SessionKey, SearchState> that the
code can use.  When the SessionKey is destroyed (no longer
referenced), the state map can be cleaned up automatically.
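A sketch of that SessionKey pattern, with `SessionKey` and the state type as illustrative names (not Lucene API): the WeakHashMap's weak keys mean the state entry becomes collectible as soon as the caller drops its key.

```java
import java.util.Map;
import java.util.WeakHashMap;

// Opaque key the caller holds; its reachability controls the state's lifetime.
final class SessionKey {}

// Sketch: state keyed weakly by SessionKey.  When the caller no longer
// references the key, the WeakHashMap entry (and thus the state) can be GC'd
// without any explicit cleanup call.
class SessionStates<S> {
    private final Map<SessionKey, S> states = new WeakHashMap<>();

    public synchronized S get(SessionKey key) {
        return states.get(key);
    }

    public synchronized void put(SessionKey key, S state) {
        states.put(key, state);
    }
}
```

Note the caveat: the state value must not strongly reference its own key, or the entry can never be collected.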




On Sep 10, 2008, at 11:30 PM, Noble Paul  
നോബിള്‍ नोब्ळ् wrote:


When I look at the reference tree, that is the feeling I get.  If you
held a WeakReference it would get released.

|- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
   |- input of org.apache.lucene.index.SegmentTermEnum
      |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry


On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu  
[EMAIL PROTECTED] wrote:

Does this make any difference?
If I intentionally close the searcher and the reader fails to release the
memory, I cannot rely on some magic of the JVM to release it.
--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got

2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul  
നോബിള്‍ नोब्ळ्

[EMAIL PROTECTED] wrote:


Why do you need to keep a strong reference?
Why not a WeakReference?

--Noble

On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu  
[EMAIL PROTECTED] wrote:
The problem should be similar to what's talked about in this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal


There is a memory leak in Lucene search introduced by LUCENE-1195 (svn
r659602, May 23, 2008).

This patch brings in a ThreadLocal cache to TermInfosReader.

It's usually recommended to keep the reader open and reuse it when
possible. In a common J2EE application, the http requests are usually
handled by different threads. But since the cache is ThreadLocal, the
cache is not really usable by other threads. What's worse, the cache
cannot be cleared by another thread!


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread robert engels
Technically, you need to sync on the set as well, since you need to
remove the old value and add the new one to the list. Although Lucene
doesn't use set(), just the initial value set, so the overhead is
minimal.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-11 Thread Michael McCandless


Yeah I think that's the right approach.

I'll code it up.

Mike

robert engels wrote:

I think that would work, but I think you would be better off
encapsulating that in an extended ThreadLocal, e.g. WeakThreadLocal,
and using that everywhere. Add a clear() method that clears the
ThreadLocal's list (which will allow the values to be GC'd).
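This WeakThreadLocal idea is essentially what the thread converged on (the CloseableThreadLocal class mentioned in the LUCENE-1383 fix). A minimal sketch of the mechanism, not the actual Lucene source: the ThreadLocal holds only a WeakReference, while a per-thread map holds the hard reference, so dropping the map on close makes the values collectible even though the ThreadLocalMap entries linger.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a closeable ThreadLocal.  get() stays lock-free; only binding
// a new thread's value synchronizes.  Here a WeakHashMap keyed by Thread
// stands in for the explicit dead-thread purge discussed above: once a
// thread dies and is collected, its hard reference drops out of the map.
class CloseableThreadLocal<T> {
    private ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();
    private Map<Thread, T> hardRefs = new WeakHashMap<>();

    public T get() {                       // no sync on the read path
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    public void set(T value) {
        local.set(new WeakReference<>(value));
        synchronized (this) {              // sync only when binding a thread
            hardRefs.put(Thread.currentThread(), value);
        }
    }

    // After close(), all values are only weakly reachable and GC can
    // reclaim them; the instance must not be used again.
    public synchronized void close() {
        hardRefs = null;
        local = null;
    }
}
```

The key property: long-lived pooled threads no longer pin the values forever, because close() severs the only strong reference.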




Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Yes. In the end, the IndexReader holds a large object via ThreadLocal.
On the one hand, I should pool the IndexReader because opening an
IndexReader costs a lot.
On the other hand, I should not pool the IndexReader, because some resources
are cached via ThreadLocal and cannot be released unless all threads close
the IndexReaders in the pool.

These contradictory requirements are caused by the ThreadLocal LRU cache
introduced in LUCENE-1195.

My only solution is to revert this particular patch.


On Tue, Sep 9, 2008 at 10:46 PM, robert engels [EMAIL PROTECTED] wrote:

 As a follow-up, the SegmentTermEnum does contain an IndexInput, and based on
 your configuration (e.g. buffer sizes) this could be a large object, so you
 do need to be careful!

 On Sep 10, 2008, at 12:14 AM, robert engels wrote:

 A searcher uses an IndexReader - the IndexReader is slow to open, not a
 Searcher. And searchers can share an IndexReader.
 You want to create a single shared (across all threads/users) IndexReader
 (usually), and create a Searcher as needed and dispose. It is VERY CHEAP
 to create the Searcher.

 I am fairly certain the javadoc on Searcher is incorrect. The warning
 "For performance reasons it is recommended to open only one IndexSearcher
 and use it for all of your searches" is not true in the case where an
 IndexReader is passed to the ctor.

 Any caching should USUALLY be performed at the IndexReader level.

 You are most likely using the "path" ctor, and that is the source of your
 problems, as multiple IndexReader instances are being created, and thus the
 memory use.


 On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:

 On J2EE environments, usually there is a searcher pool with several
 searchers open. The speed of opening a large index for every user is not
 acceptable.


 On Tue, Sep 9, 2008 at 9:03 PM, robert engels [EMAIL PROTECTED] wrote:

 You need to close the searcher within the thread that is using it, in
 order to have it cleaned up quickly... usually right after you display the
 page of results.
 If you are keeping multiple searcher refs across multiple threads for
 paging/whatever, you have not coded it correctly.

 Imagine 10,000 users - storing a searcher for each one is not going to
 work...

 On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:

 Right, in a sense I cannot release it from another thread. But that's the
 problem.

 It's a J2EE environment; all threads are kind of equal. It's simply not
 possible to iterate through all threads to close the searcher, thus
 releasing the ThreadLocal cache.
 Unless Lucene is not recommended for J2EE environments, this has to be
 fixed.


 On Tue, Sep 9, 2008 at 8:14 PM, robert engels [EMAIL PROTECTED] wrote:

 Your code is not correct. You cannot release it on another thread - the
 first thread may create hundreds/thousands of instances before the other
 thread ever runs...

 On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:

 If I release it on the thread that's creating the searcher, by setting
 searcher=null, everything is fine; the memory is released very cleanly.
 My load test was to repeatedly create a searcher on a RAMDirectory and
 release it on another thread. The test quickly goes to OOM after several
 runs. I set the heap size to 1024M, and the RAMDirectory is of size 250M.
 Using some profiling tool, the used size stepped up pretty obviously by 250M.

 I think we should not rely on something that's a "maybe" behavior,
 especially for a general-purpose library.

 Since it's a multi-threaded env, the thread that's creating the entries
 in the LRU cache may not go away quickly (actually most, if not all,
 application servers will try to reuse threads), so the LRU cache, which uses
 the thread as the key, cannot be released, so the SegmentTermEnum which is
 in the same class cannot be released.

 And yes, I close the RAMDirectory, and the 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Actually, even if I only use one IndexReader, some resources are cached via
the ThreadLocal cache and cannot be released unless all threads do the close
action.

SegmentTermEnum itself is small, but it holds the RAMDirectory along the
reference path, which is big.

On Tue, Sep 9, 2008 at 10:43 PM, robert engels [EMAIL PROTECTED] wrote:

 You do not need a pool of IndexReaders...
 It does not matter what class it is, what matters is the class that
 ultimately holds the reference.

 If the IndexReader is never closed, the SegmentReader(s) is never closed,
 so the thread local in TermInfosReader is not cleared (because the thread
 never dies). So you will get one SegmentTermEnum, per thread * per segment.

 The SegmentTermEnum is not a large object, so even if you had 100 threads
 and 100 segments, for 10k instances, it seems hard to believe that is the
 source of your memory issue.

 The SegmentTermEnum is cached by thread since it needs to enumerate the
 terms; not having a per-thread cache would lead to lots of random access
 when multiple threads read the index - very slow.

 You need to keep in mind: what if every thread was executing a search
 simultaneously - you would still have 100x100 SegmentTermEnum instances
 anyway!  The only way to prevent that would be to create and destroy the
 SegmentTermEnum on each call (opening and seeking to the proper spot) -
 which would be SLOW SLOW SLOW.
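The per-thread caching robert describes can be sketched like this, with `SharedEnum` as an illustrative stand-in for SegmentTermEnum (not Lucene API): each thread lazily gets its own positioned clone of a shared master, so get() needs no synchronization and threads don't fight over one seek position.

```java
// Stand-in for SegmentTermEnum: a cloneable object with its own "position".
class SharedEnum implements Cloneable {
    int position;  // imagine a file pointer into the term dictionary

    @Override
    public SharedEnum clone() {
        try {
            return (SharedEnum) super.clone();
        } catch (CloneNotSupportedException e) {
            throw new AssertionError(e);
        }
    }
}

// Sketch: one clone per thread via ThreadLocal, so reads are lock-free and
// each thread keeps its own seek position instead of re-seeking on each call.
class EnumCache {
    private final SharedEnum master = new SharedEnum();

    private final ThreadLocal<SharedEnum> perThread =
        ThreadLocal.withInitial(master::clone);

    SharedEnum get() {
        return perThread.get();  // same instance on repeated calls in a thread
    }
}
```

This is exactly why the cache exists, and also why it leaks when the owning reader is shared by long-lived pooled threads: the clone lives as long as the thread does.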

 On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

 I have tried to create an IndexReader pool and dynamically create searcher.
 But the memory leak is the same. It's not related to the Searcher class
 specifically, but the SegmentTermEnum in TermInfosReader.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Michael McCandless


I still don't quite understand what's causing your memory growth.

SegmentTermEnum instances have been held in a ThreadLocal cache in
TermInfosReader for a very long time (at least since Lucene 1.4).


If indeed it's the RAMDir's contents being kept alive due to this,  
then, you should have already been seeing this problem before rev  
659602.  And I still don't get why your reference tree is missing the  
TermInfosReader.ThreadResources class.


I'd like to understand the root cause before we hash out possible  
solutions.


Can you post the sources for your load test?

Mike

Chris Lu wrote:

Actually, even if I only use one IndexReader, some resources are cached
via the ThreadLocal cache and cannot be released unless all threads
do the close action.


SegmentTermEnum itself is small, but it holds the RAMDirectory along the
reference path, which is big.





Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
Why do you need to keep a strong reference?
Why not a WeakReference?

--Noble

On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu [EMAIL PROTECTED] wrote:
 The problem should be similar to what's talked about in this discussion.
 http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

 There is a memory leak in Lucene search introduced by LUCENE-1195 (svn
 r659602, May 23, 2008).

 This patch brings in a ThreadLocal cache to TermInfosReader.

 It's usually recommended to keep the reader open and reuse it when
 possible. In a common J2EE application, the http requests are usually
 handled by different threads. But since the cache is ThreadLocal, the cache
 is not really usable by other threads. What's worse, the cache cannot be
 cleared by another thread!

 This leak is not so obvious usually. But my case is using RAMDirectory,
 having several hundred megabytes. So one un-released resource is obvious to
 me.

 Here is the reference tree:
 org.apache.lucene.store.RAMDirectory
  |- directory of org.apache.lucene.store.RAMFile
     |- file of org.apache.lucene.store.RAMInputStream
        |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
           |- input of org.apache.lucene.index.SegmentTermEnum
              |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
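The tail of that chain (Thread → ThreadLocalMap → Entry → value) can be demonstrated in miniature. This is a hedged illustration with made-up names and a small array standing in for the 250M RAMDirectory: a value bound through a ThreadLocal is invisible to every other thread, so no other thread can call remove() on its behalf, and as long as the binding thread stays alive (as pooled J2EE threads do), the value stays strongly reachable.

```java
// Sketch of the leak pattern: a worker thread binds a value through a
// ThreadLocal; the calling thread can neither see nor remove it.
class ThreadLocalLeakDemo {
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    // Binds a ~1 MB value from a separate thread, then reports whether the
    // calling thread can see it.  In an app server the worker would be a
    // pooled thread that never dies, keeping the value strongly reachable
    // via Thread -> ThreadLocalMap -> Entry.value.
    static boolean visibleToCaller() {
        Thread worker = new Thread(() -> CACHE.set(new byte[1 << 20]));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return CACHE.get() != null;  // ThreadLocal is strictly per-thread
    }
}
```

Since the caller always sees null, "clearing the cache from another thread" is impossible by construction, which is the crux of the complaint above.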


 After I switched back to svn revision 659601, right before this patch was
 checked in, the memory leak is gone.
 Although my case is RAMDirectory, I believe this will affect disk-based
 indexes also.





-- 
--Noble Paul

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels

Sorry, but I am fairly certain you are mistaken.

If you only have a single IndexReader, the RAMDirectory will be  
shared in all cases.


The only memory growth is any buffer space allocated by an IndexInput  
(used in many places and cached).


Normally the IndexInput instances created by a RAMDirectory do not have any
buffer allocated, since the underlying store is already in memory.


You have some other problem in your code...

On Sep 10, 2008, at 1:10 AM, Chris Lu wrote:

Actually, even if I only use one IndexReader, some resources are
cached via the ThreadLocal cache and cannot be released unless
all threads do the close action.


SegmentTermEnum itself is small, but it holds the RAMDirectory along
the reference path, which is big.




On Tue, Sep 9, 2008 at 10:43 PM, robert engels  
[EMAIL PROTECTED] wrote:

You do not need a pool of IndexReaders...

It does not matter what class it is, what matters is the class that  
ultimately holds the reference.


If the IndexReader is never closed, the SegmentReader(s) is never  
closed, so the thread local in TermInfosReader is not cleared  
(because the thread never dies). So you will get one  
SegmentTermEnum, per thread * per segment.


The SegmentTermEnum is not a large object, so even if you had 100  
threads, and 100 segments, for 10k instances, seems hard to believe  
that is the source of your memory issue.


The SegmentTermEnum is cached by thread since it needs to enumerate  
the terms, not having a per thread cache, would lead to lots of  
random access when multiple threads read the index - very slow.


Keep in mind: if every thread were executing a  
search simultaneously, you would still have 100x100  
SegmentTermEnum instances anyway!  The only way to prevent that  
would be to create and destroy the SegmentTermEnum on each call  
(opening and seeking to the proper spot) - which would be SLOW SLOW  
SLOW.


On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

I have tried creating an IndexReader pool and dynamically creating  
searchers, but the memory leak is the same. It's not related to the  
Searcher class specifically, but to the SegmentTermEnum in  
TermInfosReader.




On Tue, Sep 9, 2008 at 10:14 PM, robert engels  
[EMAIL PROTECTED] wrote:
A searcher uses an IndexReader - the IndexReader is slow to open,  
not a Searcher. And searchers can share an IndexReader.


You want to create a single IndexReader shared across all threads/users  
(usually), and create a Searcher as needed and dispose of it.  
It is VERY CHEAP to create the Searcher.


I am fairly certain the javadoc on Searcher is incorrect.  The  
warning "For performance reasons it is recommended to open only  
one IndexSearcher and use it for all of your searches" is not true  
in the case where an IndexReader is passed to the ctor.


Any caching should USUALLY be performed at the IndexReader level.

You are most likely using the path ctor, and that is the source  
of your problems: multiple IndexReader instances are being  
created, hence the memory use.



On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:

In a J2EE environment, there is usually a searcher pool with  
several searchers open.

The cost of opening a large index for every user is not acceptable.



On Tue, Sep 9, 2008 at 9:03 PM, robert engels  
[EMAIL PROTECTED] wrote:
You need to close the searcher within the thread that is using  
it, in order to have it cleaned up quickly... usually right after  
you display the page of results.


If you are keeping multiple searcher refs across multiple threads  
for paging/whatever, you have not coded it correctly.


Imagine 10,000 users - storing a searcher for each one is not  
going to work...


On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:

Right; in a sense I cannot release it from another thread. But  
that's exactly the problem.


It's a J2EE 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Does this make any difference? If intentionally closing the searcher and
reader fails to release the memory, I cannot rely on some magic of the JVM to
release it.

-- 
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!


On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ् 
[EMAIL PROTECTED] wrote:

 Why do you need to keep a strong reference?
 Why not a WeakReference ?

 --Noble

 On Wed, Sep 10, 2008 at 12:27 AM, Chris Lu [EMAIL PROTECTED] wrote:
  The problem should be similar to what's discussed in this thread:
  http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal
 
  There is a memory leak in Lucene search, introduced by LUCENE-1195 (svn
  r659602, May 23, 2008).
 
  This patch brings in a ThreadLocal cache to TermInfosReader.
 
  It's usually recommended to keep the reader open and reuse it when
  possible. In a common J2EE application, the HTTP requests are usually
  handled by different threads. But since the cache is a ThreadLocal, the
  cache is not really usable by other threads. What's worse, the cache
  cannot be cleared by another thread!
 
  This leak is usually not so obvious. But my case uses a RAMDirectory of
  several hundred megabytes, so one unreleased resource is obvious to me.
 
  Here is the reference tree:

  org.apache.lucene.store.RAMDirectory
    |- directory of org.apache.lucene.store.RAMFile
      |- file of org.apache.lucene.store.RAMInputStream
        |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
          |- input of org.apache.lucene.index.SegmentTermEnum
            |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry
 
 
  After I switched back to svn revision 659601, right before this patch was
  checked in, the memory leak was gone.
  Although my case uses a RAMDirectory, I believe this will affect disk-based
  indexes also.
 
 



 --
 --Noble Paul

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]
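The cross-thread limitation described above (the cache "can not be cleared by another thread") is easy to demonstrate in plain Java. This minimal sketch involves no Lucene; the class and method names are made up for illustration:

```java
// Minimal sketch (no Lucene): a ThreadLocal value bound in a worker thread
// lives in that thread's own ThreadLocalMap. Other threads cannot see it,
// and there is no API to remove another thread's entry, so the value stays
// reachable until the owning thread dies.
public class ThreadLocalIsolationDemo {
    // Stand-in for a per-thread cache like the one in TermInfosReader.
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    // Binds a value in a worker thread, then reports whether the calling
    // thread can see it afterwards.
    static boolean visibleFromCaller() {
        Thread worker = new Thread(() -> CACHE.set(new byte[1024]));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return CACHE.get() != null; // the worker's entry is invisible here
    }

    public static void main(String[] args) {
        System.out.println("worker's cache visible from main: " + visibleFromCaller());
    }
}
```

In a servlet container the worker threads are pooled and essentially never die, which is why the bound values can linger indefinitely.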




Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
I do not believe I am making any mistake. Actually, I just got an email from
another user complaining about the same thing, and they have the same
usage pattern.
After the reader is opened, the RAMDirectory is shared by several objects.
There is one instance of RAMDirectory in memory, and it is holding lots
of memory, which is expected.

If I close the reader in the same thread that opened it, the
RAMDirectory is gone from memory.
If I close the reader in another thread, the RAMDirectory is left in
memory, referenced along the tree I drew in the first email.

I do not think the usage is wrong. Period.

-

Hi,

   i found a forum post from you here [1] where you mention that you
have a memory leak using the Lucene RAM directory. I'd like to ask you
if you have already resolved the problem, and how you did it, or maybe
you know where I can read about the solution. We are using
RAMDirectory too and figured out that over time the memory
consumption rises and rises until the system breaks down, but only
when we perform many index updates. If we only create the index and
do nothing except search it, it works fine.

maybe you can give me a hint or a link,
greetz,

-

-- 
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 7:12 AM, robert engels [EMAIL PROTECTED] wrote:

 Sorry, but I am fairly certain you are mistaken.
 If you only have a single IndexReader, the RAMDirectory will be shared in
 all cases.

 The only memory growth is any buffer space allocated by an IndexInput (used
 in many places and cached).

 Normally the IndexInputs created by a RAMDirectory do not have any buffer
 allocated, since the underlying store is already in memory.

 You have some other problem in your code...


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Frankly I don't know why TermInfosReader.ThreadResources is not showing up
in the memory snapshot.

Yes, it's been there for a long time. But let's see what's changed: an LRU
termInfoCache was added.
The SegmentTermEnum previously would get released, since it's a relatively
simple object.
But with a cache added to the same ThreadResources class, which holds many
objects, and with the threads still hanging around, the cache cannot be
released; so in turn the SegmentTermEnum cannot be released, and so the
RAMDirectory cannot be released.
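The retention chain described above can be mimicked with hypothetical stand-in classes (none of this is Lucene's actual code) to show why live pooled threads keep the big directory reachable:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.CyclicBarrier;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for the chain described above: pooled thread ->
// ThreadLocalMap entry -> ThreadResources -> termEnum -> dir. While the
// pool threads stay alive, each per-thread entry keeps `dir` strongly
// reachable, even after the caller drops its own reference to it.
public class RetentionChainDemo {
    static class RamDir { final byte[] bytes = new byte[4 * 1024 * 1024]; }
    static class SegEnum { final RamDir input; SegEnum(RamDir d) { input = d; } }
    static class ThreadResources { SegEnum termEnum; }

    static final ThreadLocal<ThreadResources> PER_THREAD =
            ThreadLocal.withInitial(ThreadResources::new);

    // Binds one cache entry on each of `threads` distinct pool threads and
    // returns how many entries (retention chains) were created.
    static int pinDirOnPoolThreads(int threads) {
        RamDir dir = new RamDir();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        CyclicBarrier allBound = new CyclicBarrier(threads); // forces distinct threads
        CountDownLatch done = new CountDownLatch(threads);
        AtomicInteger chains = new AtomicInteger();
        for (int i = 0; i < threads; i++) {
            pool.submit(() -> {
                PER_THREAD.get().termEnum = new SegEnum(dir); // per-thread cache fill
                chains.incrementAndGet();
                try { allBound.await(); } catch (Exception e) { /* demo only */ }
                done.countDown();
            });
        }
        try { done.await(); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        pool.shutdown(); // only once these threads exit can `dir` be collected
        return chains.get();
    }

    public static void main(String[] args) {
        System.out.println("retention chains created: " + pinDirOnPoolThreads(3));
    }
}
```

Closing the reader from some other thread cannot touch these entries; each one survives until its owning pool thread exits.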

My test is too coupled with the software I am working on and not easy to
post here. But here is a similar case from another user:

---

i found a forum post from you here [1] where you mention that you
have a memory leak using the Lucene RAM directory. I'd like to ask you
if you have already resolved the problem, and how you did it, or maybe
you know where I can read about the solution. We are using
RAMDirectory too and figured out that over time the memory
consumption rises and rises until the system breaks down, but only
when we perform many index updates. If we only create the index and
do nothing except search it, it works fine.

---

-- 
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!

On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless 
[EMAIL PROTECTED] wrote:


 I still don't quite understand what's causing your memory growth.

 SegmentTermEnum instances have been held in a ThreadLocal cache in
 TermInfosReader for a very long time (at least since Lucene 1.4).

 If indeed it's the RAMDir's contents being kept alive due to this, then
 you should already have been seeing this problem before rev 659602.  And I
 still don't get why your reference tree is missing the
 TermInfosReader.ThreadResources class.

 I'd like to understand the root cause before we hash out possible
 solutions.

 Can you post the sources for your load test?

 Mike



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
It is basic Java. Threads are not guaranteed to run on any sort of  
schedule. If you create lots of large objects in one thread and  
release them in another, there is a good chance you will get an OOM  
(since the releasing thread may not run before the OOM occurs)...   
This is not Lucene-specific by any means.


It is a misunderstanding on your part about how GC works.

I assume you must at some point be creating new RAMDirectories -  
otherwise the memory would never really increase, since the  
IndexReader/enums/etc are not very large...


When you create a new RAMDirectory, you need to BE CERTAIN that  
the other IndexReaders/Searchers using the old RAMDirectory are ALL  
CLOSED; otherwise their memory will still be in use, which leads to  
your OOM...




Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Actually, a single RAMDirectory would be sufficient (since it  
supports writes). There should never be a reason to create a new  
RAMDirectory (unless you have some specialized real-time search  
occurring).


If you are creating new RAMDirectories, the statements below hold.

On Sep 10, 2008, at 10:34 AM, robert engels wrote:

It is basic Java. Threads are not guaranteed to run on any sort of  
schedule. If you create lots of large objects in one thread,  
releasing them in another, there is a good chance you will get an  
OOM (since the releasing thread may not run before the OOM  
occurs)...  This is not Lucene specific by any means.


It is a misunderstanding on your part about how GC works.

I assume you must at some point be creating new RAMDirectories -  
otherwise the memory would never really increase, since the  
IndexReader/enums/etc are not very large...


When you create a new RAMDirectories, you need to BE CERTAIN !!!  
that the other IndexReaders/Searchers using the old RAMDirectory  
are ALL CLOSED, otherwise their memory will still be in use, which  
leads to your OOM...
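The "make certain all users of the old RAMDirectory are closed" discipline described above amounts to reference counting. Here is a plain-Java sketch of the idea (a hypothetical class, not a Lucene API):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch (not a Lucene API): share one large resource under a
// reference count, and mark it freed only when the LAST user releases it.
// Swapping in a new index then means: publish the new resource, release the
// old one, and let the final in-flight searcher trigger the actual free.
public class RefCountedResource {
    private final byte[] data;                               // stands in for a RAMDirectory
    private final AtomicInteger refs = new AtomicInteger(1); // the creator holds one ref
    private volatile boolean freed = false;

    public RefCountedResource(int size) { data = new byte[size]; }

    public RefCountedResource acquire() {  // a searcher takes a ref before using it
        refs.incrementAndGet();
        return this;
    }

    public void release() {                // every user releases when done
        if (refs.decrementAndGet() == 0) {
            freed = true;                  // last user out: safe to drop the big buffer
        }
    }

    public boolean isFreed() { return freed; }

    public static void main(String[] args) {
        RefCountedResource old = new RefCountedResource(1024);
        RefCountedResource inUse = old.acquire();  // an in-flight searcher
        old.release();                             // creator swaps to a new directory
        System.out.println("freed after swap, before searcher done: " + old.isFreed());
        inUse.release();                           // searcher finishes
        System.out.println("freed after last release: " + old.isFreed());
    }
}
```

With this pattern the old directory is never dropped while a searcher may still hold it, which is exactly the guarantee being asked for here.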




Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
I really want to find out where I am going wrong, if that's the case.

Yes, I have made certain that I closed all Readers/Searchers, and I verified
that through a memory profiler.
Yes, I am creating new RAMDirectories, but that's the point: I need to
update the content. Sure, with no content updates and everything staying the
same, of course there is no OOM.

Yes, there is no guarantee of the thread schedule. But that's the problem:
Lucene is using ThreadLocal to cache lots of things, with the Thread as the
key, and there is no telling when they will be released. Of course,
ThreadLocal itself is not Lucene's problem...
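For reference, the ClosableThreadLocal approach mentioned at the top of this thread (LUCENE-1383) addresses exactly this: the ThreadLocal holds only a WeakReference, the strong references live in a map keyed by live Thread, and close() drops that map so the values become collectable from any thread. A rough, simplified sketch (not Lucene's actual code):

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Rough sketch of the CloseableThreadLocal idea (not Lucene's actual code):
// the ThreadLocal holds only WeakReferences, while `hardRefs` keeps each
// thread's value strongly reachable. close() drops the map, so every cached
// value becomes garbage-collectable immediately, from ANY thread, instead
// of lingering until each worker thread dies.
public class CloseableThreadLocalSketch<T> {
    private final ThreadLocal<WeakReference<T>> slot = new ThreadLocal<>();
    private Map<Thread, T> hardRefs = new HashMap<>();

    public T get() {                       // fast path: no lock needed
        WeakReference<T> ref = slot.get();
        return ref == null ? null : ref.get();
    }

    public synchronized void set(T value) {
        slot.set(new WeakReference<>(value));
        // Purge entries whose threads have died, then pin the new value.
        for (Iterator<Thread> it = hardRefs.keySet().iterator(); it.hasNext(); ) {
            if (!it.next().isAlive()) it.remove();
        }
        hardRefs.put(Thread.currentThread(), value);
    }

    public synchronized void close() {
        hardRefs = null; // strong refs gone; the WeakReferences no longer pin anything
    }

    public static void main(String[] args) {
        CloseableThreadLocalSketch<String> tl = new CloseableThreadLocalSketch<>();
        tl.set("cached");
        System.out.println("got: " + tl.get());
        tl.close(); // value is now collectable, no matter which thread set it
    }
}
```

After close() the instance is done (further set() calls would fail); the real Lucene class has more machinery, so treat this strictly as an illustration of the design.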

Chris

On Wed, Sep 10, 2008 at 8:34 AM, robert engels [EMAIL PROTECTED] wrote:

 It is basic Java. Threads are not guaranteed to run on any sort of
 schedule. If you create lots of large objects in one thread, releasing them
 in another, there is a good chance you will get an OOM (since the releasing
 thread may not run before the OOM occurs)...  This is not Lucene specific by
 any means.
 It is a misunderstanding on your part about how GC works.

 I assume you must at some point be creating new RAMDirectories - otherwise
 the memory would never really increase, since the IndexReader/enums/etc are
 not very large...

 When you create a new RAMDirectories, you need to BE CERTAIN !!! that the
 other IndexReaders/Searchers using the old RAMDirectory are ALL CLOSED,
 otherwise their memory will still be in use, which leads to your OOM...



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Michael McCandless


Chris,

After you close your IndexSearcher/Reader, is it possible you're still  
holding a reference to it?


Mike

Chris Lu wrote:

Frankly I don't know why TermInfosReader.ThreadResources is not  
showing up in the memory snapshot.


Yes. It's been there for a long time. But let's see what's changed :  
A LRU cache of termInfoCache is added.
I SegmentTermEnum previously would be released, since it's  
relatively a simple object.
But with a cache added to the same class ThreadResources, which hold  
many objects, with the threads still hanging around, the cache can  
not be released, so in turn the SegmentTermEnum can not be released,  
so the RAMDirectory can not be released.


My test is too coupled with the software I am working on and not  
easy to post here. But here is a similar case from another user:


---
i found a forum post from you here [1] where you mention that you
have a memory leak using the lucene ram directory. I'd like to ask you
if you already have resolved the problem and how you did it or maybe
you know where i can read about the solution. We are using
RAMDirectory too and figured out, that over time the memory
consumption raises and raises until the system breaks down but only
when we performing much index updates. if we only create the index and
don't do nothing except searching it, it work fine.
---

--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: 
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got 2.6 Million Euro funding!


On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless [EMAIL PROTECTED] 
 wrote:


I still don't quite understand what's causing your memory growth.

SegmentTermEnum instances have been held in a ThreadLocal cache in  
TermInfosReader for a very long time (at least since Lucene 1.4).


If indeed it's the RAMDir's contents being kept alive due to this,  
then, you should have already been seeing this problem before rev  
659602.  And I still don't get why your reference tree is missing  
the TermInfosReader.ThreadResources class.


I'd like to understand the root cause before we hash out possible  
solutions.


Can you post the sources for your load test?

Mike


Chris Lu wrote:

Actually, even if I only use one IndexReader, some resources are cached  
via the ThreadLocal cache, and can not be released unless all  
threads do the close action.


SegmentTermEnum itself is small, but it holds RAMDirectory along the  
path, which is big.




On Tue, Sep 9, 2008 at 10:43 PM, robert engels  
[EMAIL PROTECTED] wrote:

You do not need a pool of IndexReaders...

It does not matter what class it is, what matters is the class that  
ultimately holds the reference.


If the IndexReader is never closed, the SegmentReader(s) is never  
closed, so the thread local in TermInfosReader is not cleared  
(because the thread never dies). So you will get one  
SegmentTermEnum, per thread * per segment.


The SegmentTermEnum is not a large object, so even if you had 100  
threads and 100 segments, for 10k instances, it seems hard to believe  
that is the source of your memory issue.


The SegmentTermEnum is cached per thread since it needs to enumerate  
the terms; not having a per-thread cache would lead to lots of  
random access when multiple threads read the index - very slow.


You need to keep in mind: what if every thread was executing a  
search simultaneously - you would still have 100x100 SegmentTermEnum  
instances anyway!  The only way to prevent that would be to create  
and destroy the SegmentTermEnum on each call (opening and seeking to  
the proper spot) - which would be SLOW SLOW SLOW.
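The per-thread caching robert describes can be sketched in plain Java. The names TermEnum and TermReader below are hypothetical stand-ins for SegmentTermEnum and TermInfosReader, not Lucene's actual classes: each thread lazily creates and keeps its own enumerator, so repeated calls on one thread reuse one instance instead of paying the open/seek cost again.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for SegmentTermEnum: imagine an expensive open + seek.
class TermEnum {
    static final AtomicInteger created = new AtomicInteger();
    TermEnum() { created.incrementAndGet(); }
}

// Hypothetical stand-in for TermInfosReader's per-thread cache.
class TermReader {
    // One cached enumerator per thread: concurrent searches never share an
    // enum (and its file position), and a thread never re-opens its enum.
    private final ThreadLocal<TermEnum> cachedEnum =
            ThreadLocal.withInitial(TermEnum::new);

    TermEnum termEnum() {
        return cachedEnum.get();
    }
}
```

The trade-off is exactly the one under discussion: the cached instance lives as long as both the thread and the ThreadLocal do, which is what ties enum lifetime to thread lifetime.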


On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

I have tried to create an IndexReader pool and dynamically create  
searcher. But the memory leak is the same. It's not related to the  
Searcher class specifically, but the SegmentTermEnum in  
TermInfosReader.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Michael McCandless


Good question.

As far as I can tell, nowhere in Lucene do we put a SegmentTermEnum  
directly into ThreadLocal, after rev 659602.


Is it possible that output came from a run with Lucene before rev  
659602?


Mike

Chris Lu wrote:

Is it possible that some other place is using SegmentTermEnum  
as a ThreadLocal?
This may explain why TermInfosReader.ThreadResources is not in the  
memory snapshot.




On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless [EMAIL PROTECTED] 
 wrote:


I still don't quite understand what's causing your memory growth.

SegmentTermEnum instances have been held in a ThreadLocal cache in  
TermInfosReader for a very long time (at least since Lucene 1.4).


If indeed it's the RAMDir's contents being kept alive due to this,  
then, you should have already been seeing this problem before rev  
659602.  And I still don't get why your reference tree is missing  
the TermInfosReader.ThreadResources class.


I'd like to understand the root cause before we hash out possible  
solutions.


Can you post the sources for your load test?

Mike


Chris Lu wrote:

Actually, even if I only use one IndexReader, some resources are cached  
via the ThreadLocal cache, and can not be released unless all  
threads do the close action.


SegmentTermEnum itself is small, but it holds RAMDirectory along the  
path, which is big.




On Tue, Sep 9, 2008 at 10:43 PM, robert engels  
[EMAIL PROTECTED] wrote:

You do not need a pool of IndexReaders...

It does not matter what class it is, what matters is the class that  
ultimately holds the reference.


If the IndexReader is never closed, the SegmentReader(s) is never  
closed, so the thread local in TermInfosReader is not cleared  
(because the thread never dies). So you will get one  
SegmentTermEnum, per thread * per segment.


The SegmentTermEnum is not a large object, so even if you had 100  
threads and 100 segments, for 10k instances, it seems hard to believe  
that is the source of your memory issue.


The SegmentTermEnum is cached per thread since it needs to enumerate  
the terms; not having a per-thread cache would lead to lots of  
random access when multiple threads read the index - very slow.


You need to keep in mind: what if every thread was executing a  
search simultaneously - you would still have 100x100 SegmentTermEnum  
instances anyway!  The only way to prevent that would be to create  
and destroy the SegmentTermEnum on each call (opening and seeking to  
the proper spot) - which would be SLOW SLOW SLOW.


On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

I have tried to create an IndexReader pool and dynamically create  
searcher. But the memory leak is the same. It's not related to the  
Searcher class specifically, but the SegmentTermEnum in  
TermInfosReader.




On Tue, Sep 9, 2008 at 10:14 PM, robert engels  
[EMAIL PROTECTED] wrote:
A searcher uses an IndexReader - the IndexReader is slow to open,  
not a Searcher. And searchers can share an IndexReader.


You want to create a single shared (across all threads/users)  
IndexReader (usually), and create a Searcher as needed and  
dispose.  It is VERY CHEAP to create the Searcher.


I am fairly certain the javadoc on Searcher is incorrect.  The  
warning "For performance reasons it is recommended to open only one  
IndexSearcher and use it for all of your searches" is not true in  
the case where an IndexReader is passed to the ctor.


Any caching should USUALLY be performed at the IndexReader level.

You are most likely using the path ctor, and that is the source of  
your problems, as multiple IndexReader instances are being created,  
and thus the memory use grows.



On Sep 9, 2008, at 11:44 PM, Chris Lu wrote:

On J2EE environment, usually there is a searcher pool with several  
searchers open.

The speed to opening a large index for every user is not 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
You do not need to create a new RAMDirectory - just write to the  
existing one, and then reopen() the IndexReader using it.


This will prevent lots of big objects being created. This may be the  
source of your problem.


Even if the Segment is closed, so the ThreadLocal is no longer  
referenced, there will still be a reference to the  
SegmentTermEnum (which will be cleared when the thread dies, or more  
likely when new thread locals are created on that thread), so here is a  
potential problem.


Thread 1 does a search, creates a thread local that references the  
RAMDir (A).
Thread 2 does a search, creates a thread local that references the  
RAMDir (A).


All readers are closed on RAMDir (A).

A new RAMDir (B) is opened.

There may still be references in the thread local maps to RAMDir A  
(since no new thread locals have been created yet).


So you may get OOM depending on the size of the RAMDir (since you  
would need room for more than 1).  If you extend this out with lots  
of threads that don't run very often, you can see how you could  
easily run out of memory.  I think that ThreadLocal should use a  
ReferenceQueue so stale object slots can be reclaimed as soon as the  
key is dereferenced - but that is an issue for SUN.
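The retention described above can be demonstrated with nothing but the JDK - no Lucene involved. In this sketch (all names are illustrative), a worker thread stores a large value in a ThreadLocal and then goes idle; even after every other reference is dropped, the value stays strongly reachable through the live thread's thread-local map, so the garbage collector cannot reclaim it:

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class ThreadLocalRetention {
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    // Returns whether the per-thread value is still reachable after a GC,
    // while the worker thread is alive but idle.
    static boolean valueReachableWhileThreadAlive() throws Exception {
        CountDownLatch stored = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<WeakReference<byte[]>> ref = new AtomicReference<>();

        Thread worker = new Thread(() -> {
            byte[] big = new byte[1 << 20];   // stand-in for the RAMDir contents
            CACHE.set(big);                   // now bound to this thread's map
            ref.set(new WeakReference<>(big));
            stored.countDown();
            try { release.await(); } catch (InterruptedException ignored) {}
        });
        worker.start();
        stored.await();

        System.gc();
        // Strongly reachable via the idle-but-alive worker thread, even though
        // no reader/searcher references the value any more.
        boolean reachable = ref.get().get() != null;

        release.countDown();
        worker.join();
        return reachable;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(valueReachableWhileThreadAlive());  // prints "true"
    }
}
```

This is exactly the J2EE pool scenario: worker threads seldom die, so whatever their thread-local maps pin stays pinned.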


This is why you don't want to create new RAMDirs.

A good rule of thumb - don't keep references to large objects in  
ThreadLocal (especially indirectly).  If needed, use a key, and  
then read the cache using the key.

This would be something for the Lucene folks to change.
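The "use a key" rule can be sketched as follows. This is a hypothetical helper, not a Lucene API: instead of parking the large value in a ThreadLocal, keep it in an ordinary per-thread map owned by the reader, so the owner's close() can release every thread's value at once without waiting for thread death.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical keyed per-thread cache: the owner, not the threads, controls
// the lifetime of the cached values.
class KeyedCache<V> {
    private final Map<Thread, V> byThread = new ConcurrentHashMap<>();

    // Each thread reads the cache using itself as the key, creating the
    // value lazily on first use.
    V get(Supplier<V> factory) {
        return byThread.computeIfAbsent(Thread.currentThread(), t -> factory.get());
    }

    // Deterministic clean-up: releases every thread's value immediately.
    void close() {
        byThread.clear();
    }
}
```

The cost is a shared-map lookup instead of the JDK ThreadLocal's uncontended read, and the strong Thread keys mean a real implementation would also purge entries for dead threads.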

On Sep 10, 2008, at 10:44 AM, Chris Lu wrote:

I really want to find out where I am going wrong, if that's the  
case.


Yes. I have made certain that I closed all Readers/Searchers, and  
verified that through memory profiler.


Yes. I am creating new RAMDirectory instances. But that's the problem: I need  
to update the content. Sure, with no content updates and everything  
the same, of course there is no OOM.


Yes. No guarantee of the thread schedule. But that's the problem:  
if Lucene is using ThreadLocal to cache lots of things with the  
Thread as the key, there is no telling when it'll be released. Of course  
ThreadLocal itself is not Lucene's problem...


Chris

On Wed, Sep 10, 2008 at 8:34 AM, robert engels  
[EMAIL PROTECTED] wrote:
It is basic Java. Threads are not guaranteed to run on any sort of  
schedule. If you create lots of large objects in one thread,  
releasing them in another, there is a good chance you will get an  
OOM (since the releasing thread may not run before the OOM  
occurs)...  This is not Lucene specific by any means.


It is a misunderstanding on your part about how GC works.

I assume you must at some point be creating new RAMDirectories -  
otherwise the memory would never really increase, since the  
IndexReader/enums/etc are not very large...


When you create a new RAMDirectories, you need to BE CERTAIN !!!  
that the other IndexReaders/Searchers using the old RAMDirectory  
are ALL CLOSED, otherwise their memory will still be in use, which  
leads to your OOM...



On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:

I do not believe I am making any mistake. Actually I just got an  
email from another user, complaining about the same thing. And I  
am having the same usage pattern.


After the reader is opened, the RAMDirectory is shared by several  
objects.
There is one instance of RAMDirectory in the memory, and it is  
holding lots of memory, which is expected.


If I close the reader in the same thread that has opened it, the  
RAMDirectory is gone from the memory.
If I close the reader in other threads, the RAMDirectory is left  
in the memory, referenced along the tree I drew in the first email.


I do not think the usage is wrong. Period.

-
Hi,

   i found a forum post from you here [1] where you mention that you
have a memory leak using the lucene ram directory. I'd like to ask  
you

if you already have resolved the problem and how you did it or maybe
you know where i can read about the solution. We are using
RAMDirectory too and figured out that over time the memory
consumption rises and rises until the system breaks down, but only
when we perform many index updates. If we only create the index  
and

do nothing except search it, it works fine.

maybe you can give me a hint or a link,
greetz,
-



On Wed, Sep 10, 2008 at 7:12 AM, robert engels  
[EMAIL PROTECTED] wrote:

Sorry, but I am fairly certain you are mistaken.

If you only have a single IndexReader, the RAMDirectory will be  
shared in all cases.


The only 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
My review of trunk shows a SegmentReader contains a TermInfosReader,  
which contains a ThreadLocal of ThreadResources, which contains a  
SegmentTermEnum.


So there should be a ThreadResources in the memory profiler for each  
SegmentTermEnum instance - unless you have something goofy going on.


On Sep 10, 2008, at 11:05 AM, Michael McCandless wrote:



Good question.

As far as I can tell, nowhere in Lucene do we put a SegmentTermEnum  
directly into ThreadLocal, after rev 659602.


Is it possible that output came from a run with Lucene before rev  
659602?


Mike

Chris Lu wrote:

Is it possible that some other place is using SegmentTermEnum  
as a ThreadLocal?
This may explain why TermInfosReader.ThreadResources is not in the  
memory snapshot.




On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless  
[EMAIL PROTECTED] wrote:


I still don't quite understand what's causing your memory growth.

SegmentTermEnum instances have been held in a ThreadLocal cache in  
TermInfosReader for a very long time (at least since Lucene 1.4).


If indeed it's the RAMDir's contents being kept alive due to  
this, then, you should have already been seeing this problem  
before rev 659602.  And I still don't get why your reference tree  
is missing the TermInfosReader.ThreadResources class.


I'd like to understand the root cause before we hash out possible  
solutions.


Can you post the sources for your load test?

Mike


Chris Lu wrote:

Actually, even if I only use one IndexReader, some resources are  
cached via the ThreadLocal cache, and can not be released unless  
all threads do the close action.


SegmentTermEnum itself is small, but it holds RAMDirectory along  
the path, which is big.




On Tue, Sep 9, 2008 at 10:43 PM, robert engels  
[EMAIL PROTECTED] wrote:

You do not need a pool of IndexReaders...

It does not matter what class it is, what matters is the class  
that ultimately holds the reference.


If the IndexReader is never closed, the SegmentReader(s) is never  
closed, so the thread local in TermInfosReader is not cleared  
(because the thread never dies). So you will get one  
SegmentTermEnum, per thread * per segment.


The SegmentTermEnum is not a large object, so even if you had 100  
threads and 100 segments, for 10k instances, it seems hard to  
believe that is the source of your memory issue.


The SegmentTermEnum is cached per thread since it needs to  
enumerate the terms; not having a per-thread cache would lead to  
lots of random access when multiple threads read the index - very  
slow.


You need to keep in mind: what if every thread was executing a  
search simultaneously - you would still have 100x100  
SegmentTermEnum instances anyway!  The only way to prevent that  
would be to create and destroy the SegmentTermEnum on each call  
(opening and seeking to the proper spot) - which would be SLOW  
SLOW SLOW.


On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

I have tried to create an IndexReader pool and dynamically create  
searcher. But the memory leak is the same. It's not related to the  
Searcher class specifically, but the SegmentTermEnum in  
TermInfosReader.




On Tue, Sep 9, 2008 at 10:14 PM, robert engels  
[EMAIL PROTECTED] wrote:
A searcher uses an IndexReader - the IndexReader is slow to open,  
not a Searcher. And searchers can share an IndexReader.


You want to create a single shared (across all threads/users)  
IndexReader (usually), and create a Searcher as needed and  
dispose.  It is VERY CHEAP to create the Searcher.


I am fairly certain the javadoc on Searcher is incorrect.  The  
warning "For performance reasons it is recommended to open only  
one IndexSearcher and use it for all of your searches" is not true  
in the case where an IndexReader is passed to the ctor.


Any caching should USUALLY be performed at the IndexReader 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
The other thing Lucene can do is create a SafeThreadLocal - it is  
rather trivial - and integrate it at a higher level, allowing  
for manual clean-up across all threads.


It MIGHT be a bit slower than the JDK version (since that uses  
heuristics to clear stale entries, and so doesn't always clear).


But it will be far more deterministic.

If someone is interested I can post the class, but I think it is well  
within the understanding of the core Lucene developers.
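As a sketch of the shape such a class might take (this is a guess, not the class robert posted and not the eventual LUCENE-1383 patch): a weak-keyed map from Thread to value, so values become collectable when their thread dies, plus a close() that clears every thread's value deterministically. The synchronized get() is the "bit slower than the JDK version" cost mentioned above.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Supplier;

// Hypothetical SafeThreadLocal: per-thread values held in a weak-keyed map.
// Values die with their thread, and close() offers the manual, cross-thread
// clean-up the JDK ThreadLocal lacks.
class SafeThreadLocal<T> {
    private final Map<Thread, T> values = new WeakHashMap<>();
    private final Supplier<T> factory;

    SafeThreadLocal(Supplier<T> factory) {
        this.factory = factory;
    }

    // Slower than JDK ThreadLocal.get() (it synchronizes), but deterministic.
    synchronized T get() {
        return values.computeIfAbsent(Thread.currentThread(), t -> factory.get());
    }

    synchronized void close() {
        values.clear();   // release every thread's value right now
    }
}
```

With this, a SegmentReader could call close() on its thread-locals when it closes, instead of waiting for threads to die.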



On Sep 10, 2008, at 11:10 AM, robert engels wrote:

You do not need to create a new RAMDirectory - just write to the  
existing one, and then reopen() the IndexReader using it.


This will prevent lots of big objects being created. This may be  
the source of your problem.


Even if the Segment is closed, so the ThreadLocal is no longer  
referenced, there will still be a reference to the  
SegmentTermEnum (which will be cleared when the thread dies, or  
more likely when new thread locals are created on that thread), so  
here is a potential problem.


Thread 1 does a search, creates a thread local that references the  
RAMDir (A).
Thread 2 does a search, creates a thread local that references the  
RAMDir (A).


All readers are closed on RAMDir (A).

A new RAMDir (B) is opened.

There may still be references in the thread local maps to RAMDir A  
(since no new thread locals have been created yet).


So you may get OOM depending on the size of the RAMDir (since you  
would need room for more than 1).  If you extend this out with lots  
of threads that don't run very often, you can see how you could  
easily run out of memory.  I think that ThreadLocal should use a  
ReferenceQueue so stale object slots can be reclaimed as soon as  
the key is dereferenced - but that is an issue for SUN.


This is why you don't want to create new RAMDirs.

A good rule of thumb - don't keep references to large objects in  
ThreadLocal (especially indirectly).  If needed, use a key, and  
then read the cache using the key.

This would be something for the Lucene folks to change.

On Sep 10, 2008, at 10:44 AM, Chris Lu wrote:

I really want to find out where I am going wrong, if that's the  
case.


Yes. I have made certain that I closed all Readers/Searchers, and  
verified that through memory profiler.


Yes. I am creating new RAMDirectory instances. But that's the problem: I  
need to update the content. Sure, with no content updates and  
everything the same, of course there is no OOM.


Yes. No guarantee of the thread schedule. But that's the problem:  
if Lucene is using ThreadLocal to cache lots of things with the  
Thread as the key, there is no telling when it'll be released. Of course  
ThreadLocal itself is not Lucene's problem...


Chris

On Wed, Sep 10, 2008 at 8:34 AM, robert engels  
[EMAIL PROTECTED] wrote:
It is basic Java. Threads are not guaranteed to run on any sort of  
schedule. If you create lots of large objects in one thread,  
releasing them in another, there is a good chance you will get an  
OOM (since the releasing thread may not run before the OOM  
occurs)...  This is not Lucene specific by any means.


It is a misunderstanding on your part about how GC works.

I assume you must at some point be creating new RAMDirectories -  
otherwise the memory would never really increase, since the  
IndexReader/enums/etc are not very large...


When you create a new RAMDirectories, you need to BE CERTAIN !!!  
that the other IndexReaders/Searchers using the old RAMDirectory  
are ALL CLOSED, otherwise their memory will still be in use, which  
leads to your OOM...



On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:

I do not believe I am making any mistake. Actually I just got an  
email from another user, complaining about the same thing. And I  
am having the same usage pattern.


After the reader is opened, the RAMDirectory is shared by several  
objects.
There is one instance of RAMDirectory in the memory, and it is  
holding lots of memory, which is expected.


If I close the reader in the same thread that has opened it, the  
RAMDirectory is gone from the memory.
If I close the reader in other threads, the RAMDirectory is left  
in the memory, referenced along the tree I drew in the first email.


I do not think the usage is wrong. Period.

-
Hi,

   i found a forum post from you here [1] where you mention that you
have a memory leak using the lucene ram directory. I'd like to  
ask you

if you already have resolved the problem and how you did it or maybe
you know where i can read about the solution. We are using
RAMDirectory too and figured out that over time the memory
consumption rises and rises until the system breaks down, but only
when we perform many index updates. If we only create the  
index and

do nothing except search it, it works fine.

maybe you can give me a hint or a link,
greetz,
-


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Thanks for the analysis, really appreciate it, and I agree with it. But...
This is really a normal J2EE use case. The threads seldom die.
Doesn't that mean closing the RAMDirectory doesn't work for J2EE
applications?
And only reopen() works?
And close() doesn't release the resources? duh...

I can only say this is a problem to be cleaned up.



On Wed, Sep 10, 2008 at 9:10 AM, robert engels [EMAIL PROTECTED]wrote:

 You do not need to create a new RAMDirectory - just write to the existing
 one, and then reopen() the IndexReader using it.
 This will prevent lots of big objects being created. This may be the source
 of your problem.

 Even if the Segment is closed, so the ThreadLocal is no longer
 referenced, there will still be a reference to the SegmentTermEnum
 (which will be cleared when the thread dies, or more likely when new
 thread locals are created on that thread), so here is a potential problem.

 Thread 1 does a search, creates a thread local that references the RAMDir
 (A).
 Thread 2 does a search, creates a thread local that references the RAMDir
 (A).

 All readers are closed on RAMDir (A).

 A new RAMDir (B) is opened.

 There may still be references in the thread local maps to RAMDir A (since
 no new thread locals have been created yet).

 So you may get OOM depending on the size of the RAMDir (since you would
 need room for more than 1).  If you extend this out with lots of threads
 that don't run very often, you can see how you could easily run out of
 memory.  I think that ThreadLocal should use a ReferenceQueue so stale
 object slots can be reclaimed as soon as the key is dereferenced - but that
 is an issue for SUN.

 This is why you don't want to create new RAMDirs.

 A good rule of thumb - don't keep references to large objects in
 ThreadLocal (especially indirectly).  If needed, use a key, and then read
 the cache using the key.
 This would be something for the Lucene folks to change.

 On Sep 10, 2008, at 10:44 AM, Chris Lu wrote:

 I really want to find out where I am going wrong, if that's the case.

 Yes. I have made certain that I closed all Readers/Searchers, and verified
 that through memory profiler.
 Yes. I am creating new RAMDirectory instances. But that's the problem: I need to
 update the content. Sure, with no content updates and everything the same, of
 course there is no OOM.

 Yes. No guarantee of the thread schedule. But that's the problem: if Lucene
 is using ThreadLocal to cache lots of things with the Thread as the key,
 there is no telling when it'll be released. Of course ThreadLocal itself is
 not Lucene's problem...

 Chris

 On Wed, Sep 10, 2008 at 8:34 AM, robert engels [EMAIL PROTECTED]wrote:

  It is basic Java. Threads are not guaranteed to run on any sort of
 schedule. If you create lots of large objects in one thread, releasing them
 in another, there is a good chance you will get an OOM (since the releasing
 thread may not run before the OOM occurs)...  This is not Lucene specific by
 any means.
 It is a misunderstanding on your part about how GC works.

 I assume you must at some point be creating new RAMDirectories - otherwise
 the memory would never really increase, since the IndexReader/enums/etc are
 not very large...

 When you create a new RAMDirectories, you need to BE CERTAIN !!! that the
 other IndexReaders/Searchers using the old RAMDirectory are ALL CLOSED,
 otherwise their memory will still be in use, which leads to your OOM...


 On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:

 I do not believe I am making any mistake. Actually I just got an email
 from another user, complaining about the same thing. And I am having the
 same usage pattern.
 After the reader is opened, the RAMDirectory is shared by several objects.
 There is one instance of RAMDirectory in the memory, and it is holding
 lots of memory, which is expected.

 If I close the reader in the same thread that has opened it, the
 RAMDirectory is gone from the memory.
 If I close the reader in other threads, the RAMDirectory is left in the
 memory, referenced along the tree I drew in the first email.

 I do not think the usage is wrong. Period.

 -

 Hi,

i found a forum post from you here [1] where you mention that you
 have a memory leak using the lucene ram directory. I'd like to ask you
 if you already have resolved the problem and how you did it or maybe
 you know where i can read about the solution. We are using
 RAMDirectory too and figured out that over time the memory
 consumption rises and rises until the system breaks down, but only
 when we perform many index updates. if we only 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Not holding searcher/reader. I did check that via memory snapshot.


On Wed, Sep 10, 2008 at 8:58 AM, Michael McCandless 
[EMAIL PROTECTED] wrote:


 Chris,

 After you close your IndexSearcher/Reader, is it possible you're still
 holding a reference to it?

 Mike


 Chris Lu wrote:

  Frankly I don't know why TermInfosReader.ThreadResources is not showing up
 in the memory snapshot.

 Yes. It's been there for a long time. But let's see what's changed: an LRU
 termInfoCache was added.
 The SegmentTermEnum previously would be released, since it's relatively a
 simple object.
 But with a cache added to the same class, ThreadResources, which holds many
 objects, and with the threads still hanging around, the cache can not be
 released; so in turn the SegmentTermEnum can not be released, and the
 RAMDirectory can not be released.

 My test is too coupled with the software I am working on and not easy to
 post here. But here is a similar case from another user:


 ---
 i found a forum post from you here [1] where you mention that you
 have a memory leak using the lucene ram directory. I'd like to ask you
 if you already have resolved the problem and how you did it or maybe
 you know where i can read about the solution. We are using
 RAMDirectory too and figured out that over time the memory
 consumption rises and rises until the system breaks down, but only
 when we perform many index updates. If we only create the index and
 do nothing except search it, it works fine.

 ---


 On Wed, Sep 10, 2008 at 2:45 AM, Michael McCandless 
 [EMAIL PROTECTED] wrote:

 I still don't quite understand what's causing your memory growth.

 SegmentTermEnum instances have been held in a ThreadLocal cache in
 TermInfosReader for a very long time (at least since Lucene 1.4).

 If indeed it's the RAMDir's contents being kept alive due to this, then,
 you should have already been seeing this problem before rev 659602.  And I
 still don't get why your reference tree is missing the
 TermInfosReader.ThreadResources class.

 I'd like to understand the root cause before we hash out possible
 solutions.

 Can you post the sources for your load test?

 Mike


 Chris Lu wrote:

 Actually, even if I only use one IndexReader, some resources are cached via
 the ThreadLocal cache, and can not be released unless all threads do the
 close action.

 SegmentTermEnum itself is small, but it holds RAMDirectory along the path,
 which is big.


 On Tue, Sep 9, 2008 at 10:43 PM, robert engels [EMAIL PROTECTED]
 wrote:
 You do not need a pool of IndexReaders...

 It does not matter what class it is, what matters is the class that
 ultimately holds the reference.

 If the IndexReader is never closed, the SegmentReader(s) are never closed,
 so the thread local in TermInfosReader is not cleared (because the thread
 never dies). So you will get one SegmentTermEnum per thread, per segment.

 The SegmentTermEnum is not a large object, so even if you had 100 threads
 and 100 segments, for 10k instances, it seems hard to believe that is the
 source of your memory issue.

 The SegmentTermEnum is cached per thread since it needs to enumerate the
 terms; not having a per-thread cache would lead to lots of random access
 when multiple threads read the index - very slow.

 You need to keep in mind: if every thread were executing a search
 simultaneously, you would still have 100x100 SegmentTermEnum instances
 anyway!  The only way to prevent that would be to create and destroy the
 SegmentTermEnum on each call (opening and seeking to the proper spot) -
 which would be SLOW SLOW SLOW.
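 The per-thread caching robert describes can be sketched with a plain JDK
 ThreadLocal. This is an illustrative sketch with hypothetical names, not
 Lucene's actual classes: each thread lazily creates its own enumerator-like
 object, so retrieval needs no synchronization and threads never interleave
 reads on a shared cursor.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative JDK-only sketch (hypothetical names, not Lucene classes):
// each thread lazily gets its own enumerator-like object from a ThreadLocal,
// so retrieval needs no synchronization and threads never share a cursor.
public class PerThreadCacheDemo {
    static final AtomicInteger created = new AtomicInteger();

    // One cached instance per thread, created on that thread's first use.
    static final ThreadLocal<Object> perThreadEnum =
        ThreadLocal.withInitial(() -> {
            created.incrementAndGet();
            return new Object();
        });

    // Runs two threads; returns how many distinct instances were created.
    static int demo() throws InterruptedException {
        Object[] seen = new Object[2];
        Thread t1 = new Thread(() -> seen[0] = perThreadEnum.get());
        Thread t2 = new Thread(() -> seen[1] = perThreadEnum.get());
        t1.start(); t2.start(); t1.join(); t2.join();
        if (seen[0] == seen[1]) throw new AssertionError("instances shared");
        return created.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("per-thread instances: " + demo());
    }
}
```

 The cost robert points out follows directly: with N threads and M segments
 this scheme creates up to N*M cached objects, but each get() is a cheap
 unsynchronized lookup.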


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
Close() does work - it is just that the memory may not be freed until  
much later...


When working with VERY LARGE objects, this can be a problem.

On Sep 10, 2008, at 12:36 PM, Chris Lu wrote:

Thanks for the analysis, really appreciate it, and I agree with it. But...

This is really a normal J2EE use case. The threads seldom die.
Doesn't that mean closing the RAMDirectory doesn't work for J2EE
applications?

And only reopen() works?
And close() doesn't release the resources? duh...

I can only say this is a problem to be cleaned up.




On Wed, Sep 10, 2008 at 9:10 AM, robert engels  
[EMAIL PROTECTED] wrote:
You do not need to create a new RAMDirectory - just write to the  
existing one, and then reopen() the IndexReader using it.


This will prevent lots of big objects being created. This may be  
the source of your problem.


Even if the Segment is closed, the ThreadLocal will no longer be
referenced, but there will still be a reference to the SegmentTermEnum
(which will be cleared when the thread dies, or more likely when new
thread locals on that thread are created), so here is a potential problem.


Thread 1 does a search, creates a thread local that references the  
RAMDir (A).
Thread 2 does a search, creates a thread local that references the  
RAMDir (A).


All readers, are closed on RAMDir (A).

A new RAMDir (B) is opened.

There may still be references in the thread local maps to RAMDir A
(since no new thread locals have been created yet).


So you may get OOM depending on the size of the RAMDir (since you  
would need room for more than 1).  If you extend this out with lots  
of threads that don't run very often, you can see how you could  
easily run out of memory.  I think that ThreadLocal should use a  
ReferenceQueue so stale object slots can be reclaimed as soon as  
the key is dereferenced - but that is an issue for SUN.
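The retention described above can be demonstrated with a JDK-only sketch
(hypothetical names; the byte[] stands in for the RAMDir contents): while
a pooled thread is alive and its ThreadLocal slot has not been overwritten,
the value stays strongly reachable from that thread's thread-local map, no
matter how many close() calls have run elsewhere.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// JDK-only demonstration (hypothetical names): a value bound via ThreadLocal
// on a long-lived "pool" thread remains strongly reachable from that thread's
// thread-local map even after the application drops all its own references.
public class ThreadLocalRetentionDemo {
    static final ThreadLocal<byte[]> slot = new ThreadLocal<>();

    // Returns true if the value is still reachable while the thread lives.
    static boolean demo() throws InterruptedException {
        CountDownLatch bound = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicReference<byte[]> handoff = new AtomicReference<>();

        Thread pool = new Thread(() -> {
            byte[] big = new byte[1 << 20]; // stands in for the RAMDir contents
            slot.set(big);                  // bound, like a per-thread cache entry
            handoff.set(big);
            big = null;                     // only the thread-local map holds it now
            bound.countDown();
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        pool.start();
        bound.await();

        // Drop our only strong reference; keep just a weak one.
        WeakReference<byte[]> ref = new WeakReference<>(handoff.getAndSet(null));
        System.gc();
        boolean retained = ref.get() != null; // the live thread still pins it

        release.countDown();
        pool.join();
        return retained;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("retained while pool thread alive: " + demo());
    }
}
```

Scaled up to a pool of rarely-scheduled threads each pinning an old RAMDir,
this is exactly the OOM pattern described above.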


This is why you don't want to create new RAMDirs.

A good rule of thumb - don't keep references to large objects in
ThreadLocal (especially indirectly). If needed, use a key, and then read
the cache using the key.

This would be something for the Lucene folks to change.
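That rule of thumb can be sketched as a small clearable wrapper. This is a
hypothetical sketch in the spirit of the idea, not Lucene's actual class:
the ThreadLocal holds only a cheap per-thread key, the large values live in
one central map, and clear() releases every per-thread value at once without
waiting for any thread to die.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: keep only a small key in the ThreadLocal and park the
// large values in a central map, so clear() can release them deterministically
// instead of waiting for threads to die. Not Lucene's actual implementation.
public class ClearableThreadLocal<T> {
    private final ThreadLocal<Long> key = new ThreadLocal<>();     // cheap key
    private final Map<Long, T> values = new ConcurrentHashMap<>(); // big values

    public T get() {
        Long k = key.get();
        return k == null ? null : values.get(k);
    }

    public void set(T value) {
        Long k = key.get();
        if (k == null) {
            k = Thread.currentThread().getId();
            key.set(k);
        }
        values.put(k, value);
    }

    // Called from close(): every thread's value becomes collectable at once.
    public void clear() {
        values.clear();
    }

    public static void main(String[] args) {
        ClearableThreadLocal<byte[]> cache = new ClearableThreadLocal<>();
        cache.set(new byte[1024]);
        System.out.println("before clear: " + (cache.get() != null));
        cache.clear(); // deterministic release, no dependence on GC or threads
        System.out.println("after clear:  " + (cache.get() != null));
    }
}
```

The point of the design: the only thing left in any thread's ThreadLocal map
after clear() is a boxed Long, so the big objects never outlive close().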

On Sep 10, 2008, at 10:44 AM, Chris Lu wrote:

I really want to find out what I am doing wrong, if that's the case.


Yes. I have made certain that I closed all Readers/Searchers, and  
verified that through memory profiler.


Yes. I am creating new RAMDirectory instances. But that's the point: I
need to update the content. Sure, with no content updates and everything
the same, of course there is no OOM.


Yes. There is no guarantee of the thread schedule. But that's the
problem: Lucene is using ThreadLocal to cache lots of things with the
Thread as the key, and there is no telling when they will be released.
Of course ThreadLocal itself is not Lucene's problem...


Chris

On Wed, Sep 10, 2008 at 8:34 AM, robert engels  
[EMAIL PROTECTED] wrote:
It is basic Java. Threads are not guaranteed to run on any sort of  
schedule. If you create lots of large objects in one thread,  
releasing them in another, there is a good chance you will get an  
OOM (since the releasing thread may not run before the OOM  
occurs)...  This is not Lucene specific by any means.


It is a misunderstanding on your part about how GC works.

I assume you must at some point be creating new RAMDirectories -  
otherwise the memory would never really increase, since the  
IndexReader/enums/etc are not very large...


When you create a new RAMDirectory, you need to BE CERTAIN !!!
that the other IndexReaders/Searchers using the old RAMDirectory
are ALL CLOSED, otherwise their memory will still be in use, which
leads to your OOM...



On Sep 10, 2008, at 10:16 AM, Chris Lu wrote:

I do not believe I am making any mistake. Actually I just got an  
email from another user, complaining about the same thing. And I  
am having the same usage pattern.


After the reader is opened, the RAMDirectory is shared by several  
objects.
There is one instance of RAMDirectory in the memory, and it is  
holding lots of memory, which is expected.


If I close the reader in the same thread that opened it, the
RAMDirectory is gone from the memory.
If I close the reader in other threads, the RAMDirectory is left
in the memory, referenced along the tree I drew in the first email.


I do not think the usage is wrong. Period.

-
Hi,

   i found a forum post from you here [1] where you mention that you
have a memory leak using the lucene ram directory. I'd like to  
ask you

if you already have resolved the problem and how you did it or maybe

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Not likely. Actually I made some changes to Lucene source code and I can see
the changes in the memory snapshot. So it is the latest Lucene version.

On Wed, Sep 10, 2008 at 9:05 AM, Michael McCandless 
[EMAIL PROTECTED] wrote:


 Good question.

 As far as I can tell, nowhere in Lucene do we put a SegmentTermEnum
 directly into ThreadLocal, after rev 659602.

 Is it possible that output came from a run with Lucene before rev 659602?

 Mike


 Chris Lu wrote:

 Is it possible that some other place is using SegmentTermEnum as a
 ThreadLocal?
 This may explain why TermInfosReader.ThreadResources is not in the memory
 snapshot.


 On Sep 10, 2008, at 12:19 AM, Chris Lu wrote:

 I have tried to create an IndexReader pool and dynamically create
 searcher. But the memory leak is the same. It's not related to the Searcher
 class specifically, but the SegmentTermEnum in TermInfosReader.


 On Tue, Sep 9, 2008 at 10:14 PM, robert engels [EMAIL PROTECTED]
 wrote:
 A searcher uses an IndexReader - the IndexReader is slow to open, not a
 Searcher. And searchers can share an IndexReader.

 You want to create a single shared (across all threads/users) IndexReader
 (usually), and create an Searcher as needed and dispose.  It is VERY CHEAP
 to create the Searcher.

 I am fairly certain the javadoc on Searcher is incorrect.  The warning
 

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Yeah, the timing is different. But it's an unknown, undetermined, and
uncontrollable time...
We cannot ask the user to write:

while(memory is low){
  sleep(1000);
}
do_the_real_thing_an_hour_later



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels

Why not just use reopen() and be done with it???


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Actually I am done with it, by simply downgrading and not using r659602 and
later. The old version is cleaner and more consistent with the API: close()
does mean close, not something complicated and unknown to most users, which
almost feels like a trap. And later on, if no changes happen to this file, I
will have to upgrade Lucene and manually remove the patch LUCENE-1195.


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels

Always your prerogative.


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Chris Lu
Well, the code is correct, in the sense that it can work if you avoid this
trap. But it fails as a good API.

I learned the inside details from you. I am not the only one who has been
trapped, and more users will likely be trapped again unless the javadoc
describing the close() function is changed. Actually, I didn't look at the
javadoc of close(), because shouldn't close() mean close, not uncontrollably
delayed resource releasing? So I fear just changing the javadoc is not
enough.
-- 
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes:
http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per request) got
2.6 Million Euro funding!


On Wed, Sep 10, 2008 at 1:03 PM, robert engels [EMAIL PROTECTED]wrote:

 Always your prerogative.

 On Sep 10, 2008, at 1:15 PM, Chris Lu wrote:

 Actually I am done with it by simply downgrading and not using r659602
 and later. The old version is cleaner and more consistent with the API:
 close() does mean close, not something complicated and unknown to most
 users, which almost feels like a trap. And later on, if no changes happen
 for this file, I will have to upgrade Lucene and manually remove the patch
 LUCENE-1195.


 On Wed, Sep 10, 2008 at 10:56 AM, robert engels [EMAIL PROTECTED]wrote:

 Why not just use reopen() and be done with it???

 On Sep 10, 2008, at 12:48 PM, Chris Lu wrote:

 Yeah, the timing is different. But it's an unknown, undetermined, and
 uncontrollable time...
 We cannot ask the user,

 while (memory_is_low) {
   sleep(1000);
 }
 do_the_real_thing_an_hour_later();



 On Wed, Sep 10, 2008 at 10:39 AM, robert engels [EMAIL PROTECTED]wrote:

 Close() does work - it is just that the memory may not be freed until
 much later...
 When working with VERY LARGE objects, this can be a problem.

 On Sep 10, 2008, at 12:36 PM, Chris Lu wrote:

 Thanks for the analysis, really appreciate it, and I agree with it.
 But...
 This is really a normal J2EE use case. The threads seldom die.
 Doesn't that mean closing the RAMDirectory doesn't work for J2EE
 applications?
 And only reopen() works?
 And close() doesn't release the resources? duh...

 I can only say this is a problem to be cleaned up.



 On Wed, Sep 10, 2008 at 9:10 AM, robert engels [EMAIL PROTECTED]wrote:

 You do not need to create a new RAMDirectory - just write to the
 existing one, and then reopen() the IndexReader using it.
 This will prevent lots of big objects being created. This may be the
 source of your problem.

 Even if the Segment is closed, the ThreadLocal will no longer be
 referenced, but there will still be a reference to the SegmentTermEnum
 (which will be cleared when the thread dies, or more likely when new
 thread locals on that thread are created). So here is a potential problem:

 Thread 1 does a search, creates a thread local that references the
 RAMDir (A).
 Thread 2 does a search, creates a thread local that references the
 RAMDir (A).

 All readers, are closed on RAMDir (A).

 A new RAMDir (B) is opened.

 There may still be references in the thread local maps to RAMDir A
 (since no new thread locals have been created yet).

 So you may get OOM depending on the size of the RAMDir (since you would
 need room for more than 1).  If you extend this out with lots of threads
 that don't run very often, you can see how you could easily run out of
 memory.  I think that ThreadLocal should use a ReferenceQueue so stale
 object slots can be reclaimed as soon as the key is dereferenced - but that
 is an issue for SUN.

 This is why you don't want to create new RAMDirs.
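The retention effect described above can be reproduced with plain JDK classes. This is an illustrative sketch (names are mine; the 8 MB array is a stand-in for a RAMDirectory or SegmentTermEnum): a pooled thread populates a ThreadLocal, goes idle, and the value stays strongly reachable through the thread's ThreadLocalMap even after the application drops every reference of its own.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

public class StaleThreadLocalDemo {
    static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

    // Returns true if the idle worker's ThreadLocal entry still pins the
    // array after the application has dropped every reference of its own.
    public static boolean pinnedWhileIdle() throws InterruptedException {
        AtomicReference<WeakReference<byte[]>> ref = new AtomicReference<>();
        CountDownLatch populated = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            byte[] big = new byte[8 * 1024 * 1024]; // stand-in for a RAMDir
            CACHE.set(big);
            ref.set(new WeakReference<>(big));
            big = null;                  // drop the local; only the
                                         // ThreadLocalMap holds it now
            populated.countDown();       // idle, like a pooled J2EE thread
            try { release.await(); } catch (InterruptedException ignored) { }
        });
        worker.start();
        populated.await();
        System.gc();
        // Still reachable: the idle thread's map keeps the array alive.
        boolean pinned = ref.get().get() != null;
        release.countDown();
        worker.join();
        return pinned;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pinned while thread idle: " + pinnedWhileIdle());
    }
}
```

With many pooled threads each pinning a stale copy, the multiplied retention robert describes follows directly.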


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread Noble Paul നോബിള്‍ नोब्ळ्
When I look at the reference tree, that is the feeling I get. If you
held a WeakReference, it would get released.
 |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
  |- input of org.apache.lucene.index.SegmentTermEnum
  |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry

On Wed, Sep 10, 2008 at 8:39 PM, Chris Lu [EMAIL PROTECTED] wrote:
 Does this make any difference?
 If I intentionally close the searcher and the reader fails to release the
 memory, I cannot rely on some magic of the JVM to release it.

 On Wed, Sep 10, 2008 at 4:03 AM, Noble Paul നോബിള്‍ नोब्ळ्
 [EMAIL PROTECTED] wrote:

 Why do you need to keep a strong reference?
 Why not a WeakReference ?

 --Noble

 -
 To unsubscribe, e-mail: [EMAIL PROTECTED]
 For additional commands, e-mail: [EMAIL PROTECTED]







-- 
--Noble Paul


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-10 Thread robert engels
You can't hold the ThreadLocal value in a WeakReference, because  
there is no hard reference between enumeration calls (so it would be  
cleared out from under you while enumerating).
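The two sides of that constraint can be shown with a small JDK sketch (class and method names are mine). While a strong reference is live, the weak referent cannot be cleared; once no strong reference exists, any GC may clear it, which is exactly what would break an enumerator between two calls.

```java
import java.lang.ref.WeakReference;

public class WeakValueDemo {
    // With a strong reference still live, the weak referent cannot
    // be cleared: the GC sees it as strongly reachable.
    public static boolean survivesWithStrongRef() {
        byte[] strong = new byte[1024];
        WeakReference<byte[]> ref = new WeakReference<>(strong);
        System.gc();
        return ref.get() == strong;  // 'strong' is read here, keeping it live
    }

    public static void main(String[] args) {
        System.out.println("with strong ref: " + survivesWithStrongRef());

        // Without any strong reference, the referent may vanish at any GC -
        // what would happen to a weakly held SegmentTermEnum between two
        // next() calls on the enumerator.
        WeakReference<byte[]> loose = new WeakReference<>(new byte[1024]);
        System.gc();
        System.out.println("without strong ref, may be null: " + loose.get());
    }
}
```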


All of this occurs because you have some objects (readers/segments  
etc.) that are shared across all threads, but these contain objects  
that are 'thread/search state' specific. These latter objects are  
essentially cached for performance (so you don't need to seek and  
read, sequential buffer access, etc.)


A sometimes better solution is to have the state returned to the  
caller, and require the caller to pass/use the state later - then you  
don't need thread locals.


You can accomplish a similar solution by returning a SessionKey  
object, and having the caller pass this later.  You can then have a  
WeakHashMap<SessionKey, SearchState> that the code can use.  When  
the SessionKey is destroyed (no longer referenced), the state map can  
be cleaned up automatically.
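A minimal sketch of that SessionKey pattern, under my own hypothetical names (SearchState here stands in for the per-search state such as a SegmentTermEnum): the map holds the keys weakly, so dropping the key makes the entry eligible for removal without any explicit close call.

```java
import java.util.Collections;
import java.util.Map;
import java.util.WeakHashMap;

public class SessionStateCache {
    // Opaque handle the caller keeps for the duration of a "session".
    public static final class SessionKey { }

    // Hypothetical per-session search state.
    public static final class SearchState {
        long lastPosition;  // e.g. enumerator position, buffers, etc.
    }

    // Weak keys: once the caller drops its SessionKey, the entry is
    // expunged on later map operations and the SearchState can be GC'd.
    private final Map<SessionKey, SearchState> states =
        Collections.synchronizedMap(new WeakHashMap<SessionKey, SearchState>());

    public SessionKey openSession() {
        SessionKey key = new SessionKey();
        states.put(key, new SearchState());
        return key;  // caller passes this back on each call
    }

    public SearchState stateFor(SessionKey key) {
        return states.get(key);
    }
}
```

The state is keyed by session rather than by thread, so any thread holding the key can reach it, and no thread's ThreadLocalMap pins it.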










ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
The problem should be similar to what's talked about on this discussion.
http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

There is a memory leak for Lucene search from Lucene-1195.(svn r659602,
May23,2008)

This patch brings in a ThreadLocal cache to TermInfosReader.

It's usually recommended to keep the reader open, and reuse it when
possible. In a common J2EE application, the http requests are usually
handled by different threads. But since the cache is ThreadLocal, the cache
is not really usable by other threads. What's worse, the cache cannot be
cleared by another thread!

This leak is not so obvious usually. But my case is using RAMDirectory,
having several hundred megabytes, so one un-released resource is obvious to
me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
 |- directory of org.apache.lucene.store.RAMFile
 |- file of org.apache.lucene.store.RAMInputStream
 |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
 |- input of org.apache.lucene.index.SegmentTermEnum
 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry


After I switched back to svn revision 659601, right before this patch was
checked in, the memory leak is gone.
Although my case is RAMDirectory, I believe this will affect disk-based
indexes also.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Michael McCandless


Chris Lu wrote:

The problem should be similar to what's talked about on this  
discussion.

http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal


The rough conclusion of that thread is that, technically, this isn't  
a memory leak but rather a delayed freeing problem.  Ie, it may take  
longer, possibly much longer, than you want for the memory to be freed.


There is a memory leak for Lucene search from Lucene-1195.(svn  
r659602, May23,2008)


This patch brings in a ThreadLocal cache to TermInfosReader.


One thing that confuses me: TermInfosReader was already using a  
ThreadLocal to cache the SegmentTermEnum instance.  What was added in  
this commit (for LUCENE-1195) was an LRU cache storing Term ->  
TermInfo instances.  But it seems like it's the SegmentTermEnum  
instance that you're tracing below.



It's usually recommended to keep the reader open, and reuse it when
possible. In a common J2EE application, the http requests are usually
handled by different threads. But since the cache is ThreadLocal, the cache
is not really usable by other threads. What's worse, the cache cannot be
cleared by another thread!

This leak is not so obvious usually. But my case is using RAMDirectory,
having several hundred megabytes, so one un-released resource is obvious to
me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
 |- directory of org.apache.lucene.store.RAMFile
 |- file of org.apache.lucene.store.RAMInputStream
 |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
 |- input of org.apache.lucene.index.SegmentTermEnum
 |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry


So you have a RAMDir that has several hundred MB stored in it, that  
you're done with, yet through this path Lucene is keeping it alive?


Did you close the RAMDir?  (which will null its fileMap and should  
also free your memory).


Also, that reference tree doesn't show the ThreadResources class that  
was added in that commit -- are you sure this reference tree wasn't  
before the commit?


Mike




Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
If I release it on the thread that's creating the searcher, by setting
searcher=null, everything is fine; the memory is released very cleanly.
My load test was to repeatedly create a searcher on a RAMDirectory and
release it on another thread. The test quickly goes to OOM after several
runs. I set the heap size to 1024M, and the RAMDirectory is of size 250M.
Using a profiling tool, the used size simply stepped up pretty obviously
by 250M.

I think we should not rely on something that's a maybe behavior,
especially for a general purpose library.

Since it's a multi-threaded environment, the thread that created the entries
in the LRU cache may not go away quickly (actually most, if not all,
application servers try to reuse threads), so the LRU cache, which uses the
thread as the key, cannot be released, and so the SegmentTermEnum held in the
same class cannot be released either.

And yes, I close the RAMDirectory, and the fileMap is released. I verified
that through the profiler by directly checking the values in the snapshot.

I am pretty sure the reference tree wasn't like this with code before this
commit, because after closing the searcher in another thread, the
RAMDirectory totally disappeared from the memory snapshot.







Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
Your code is not correct. You cannot release it on another thread -  
the first thread may create hundreds/thousands of instances before  
the other thread ever runs...






Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
Right, in a sense I cannot release it from another thread. But that's the
problem.

It's a J2EE environment; all threads are kind of equal. It's simply not
possible to iterate through all threads to close the searcher and thus
release the ThreadLocal cache.
Unless Lucene is not recommended for J2EE environments, this has to be fixed.



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
You need to close the searcher within the thread that is using it, in  
order to have it cleaned up quickly... usually right after you  
display the page of results.


If you are keeping multiple searcher refs across multiple threads for  
paging/whatever, you have not coded it correctly.


Imagine 10,000 users - storing a searcher for each one is not going  
to work...
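The request-scoped lifecycle robert describes can be sketched as below. This is an illustrative shape, not Lucene code: Searcher here is a stand-in for IndexSearcher, and it uses modern try-with-resources for brevity. The point is that the searcher is opened, used, and closed on the same request thread, so any per-thread state it created is released before the pooled thread goes back to idling.

```java
public class HandleRequest {
    // Stand-in for Lucene's IndexSearcher; names are hypothetical.
    static class Searcher implements AutoCloseable {
        String search(String q) { return "results for " + q; }
        @Override public void close() { /* release per-thread resources */ }
    }

    // Open, use, and close the searcher on the SAME request thread -
    // never stash it for another thread to clean up later.
    public static String handle(String query) {
        try (Searcher searcher = new Searcher()) {
            return searcher.search(query);
        }
    }

    public static void main(String[] args) {
        System.out.println(handle("lucene"));
    }
}
```

For result paging, re-run the query per request (or share one long-lived reader) rather than parking a searcher per user.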


On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:

Right, in a sense I can not release it from another thread. But  
that's the problem.


It's a J2EE environment, all threads are kind of equal. It's simply  
not possible to iterate through all threads to close the searcher,  
thus releasing the ThreadLocal cache.
Unless Lucene is not recommended for J2EE environment, this has to  
be fixed.


--
Chris Lu
-
Instant Scalable Full-Text Search On Any Database/Application
site: http://www.dbsight.net
demo: http://search.dbsight.com
Lucene Database Search in 3 minutes: http://wiki.dbsight.com/ 
index.php?title=Create_Lucene_Database_Search_in_3_minutes
DBSight customer, a shopping comparison site, (anonymous per  
request) got 2.6 Million Euro funding!



On Tue, Sep 9, 2008 at 8:14 PM, robert engels  
[EMAIL PROTECTED] wrote:
Your code is not correct. You cannot release it on another thread -  
the first thread may creating hundreds/thousands of instances  
before the other thread ever runs...


On Sep 9, 2008, at 10:10 PM, Chris Lu wrote:

If I release it on the thread that's creating the searcher, by  
setting searcher=null, everything is fine, the memory is released  
very cleanly.
My load test was to repeatedly create a searcher on a RAMDirectory  
and release it on another thread. The test will quickly go to OOM  
after several runs. I set the heap size to be 1024M, and the  
RAMDirectory is of size 250M. Using some profiling tool, the used  
size simply stepped up pretty obviously by 250M.


I think we should not rely on something that's a maybe behavior,  
especially for a general purpose library.


Since it's a multi-threaded env, the thread that's creating the  
entries in the LRU cache may not go away quickly(actually most, if  
not all, application servers will try to reuse threads), so the  
LRU cache, which uses thread as the key, can not be released, so  
the SegmentTermEnum which is in the same class can not be released.


And yes, I close the RAMDirectory, and the fileMap is released. I  
verified that through the profiler by directly checking the values  
in the snapshot.


Pretty sure the reference tree wasn't like this using code before  
this commit, because after close the searcher in another thread,  
the RAMDirectory totally disappeared from the memory snapshot.




On Tue, Sep 9, 2008 at 5:03 PM, Michael McCandless [EMAIL PROTECTED] wrote:


Chris Lu wrote:

The problem should be similar to what's talked about in this
discussion:

http://lucene.markmail.org/message/keosgz2c2yjc7qre?q=ThreadLocal

The rough conclusion of that thread is that, technically, this
isn't a memory leak but rather a delayed-freeing problem. I.e.,
it may take longer, possibly much longer, than you want for the
memory to be freed.



There is a memory leak in Lucene search from LUCENE-1195 (svn
r659602, May 23, 2008).


This patch brings in a ThreadLocal cache to TermInfosReader.

One thing that confuses me: TermInfosReader was already using a
ThreadLocal to cache the SegmentTermEnum instance.  What was added
in this commit (for LUCENE-1195) was an LRU cache storing Term ->
TermInfo instances.  But it seems like it's the SegmentTermEnum
instance that you're tracing below.



It's usually recommended to keep the reader open and reuse it when
possible. In a common J2EE application, the HTTP requests are usually
handled by different threads. But since the cache is ThreadLocal,
the cache is not really usable by other threads. What's worse, the
cache cannot be cleared by another thread!

This leak is usually not so obvious. But my case uses a RAMDirectory
of several hundred megabytes, so one un-released resource is obvious
to me.

Here is the reference tree:
org.apache.lucene.store.RAMDirectory
  |- directory of org.apache.lucene.store.RAMFile
    |- file of org.apache.lucene.store.RAMInputStream
      |- base of org.apache.lucene.index.CompoundFileReader$CSIndexInput
        |- input of org.apache.lucene.index.SegmentTermEnum
          |- value of java.lang.ThreadLocal$ThreadLocalMap$Entry


So you have a RAMDir that has several hundred MB stored in it, that
you're done with, yet through this path 
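The fix referenced at the top of the thread (LUCENE-1383) introduced a CloseableThreadLocal. The class below is only a minimal sketch of the idea, not Lucene's actual implementation: the ThreadLocal itself holds just a weak reference, a separate map keyed by thread keeps the hard references, and close() drops them so the cached values become collectable without waiting for the owning threads to die.

```java
import java.io.Closeable;
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class CloseableThreadLocalSketch<T> implements Closeable {
    // The ThreadLocal holds only a weak reference, so the value is
    // reclaimable as soon as the hard reference below is dropped.
    private final ThreadLocal<WeakReference<T>> local = new ThreadLocal<>();

    // Hard references keyed (weakly) by thread; cleared eagerly in close().
    private Map<Thread, T> hardRefs = new WeakHashMap<>();

    public synchronized T get() {
        WeakReference<T> ref = local.get();
        return ref == null ? null : ref.get();
    }

    public synchronized void set(T value) {
        local.set(new WeakReference<>(value));
        hardRefs.put(Thread.currentThread(), value);
    }

    @Override
    public synchronized void close() {
        // Drop all hard references; the weak refs left behind in each
        // thread's ThreadLocalMap no longer keep the values alive.
        hardRefs = null;
        local.remove(); // also clears the calling thread's own entry
    }
}
```

Using the instance after close() is not supported in this sketch (set() would NPE); the point is only that a close() method gives the owner a way to free the per-thread values deterministically.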

Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
In a J2EE environment there is usually a searcher pool with several searchers
open. The cost of opening a large index for every user is not acceptable.


On Tue, Sep 9, 2008 at 9:03 PM, robert engels [EMAIL PROTECTED] wrote:

You need to close the searcher within the thread that is using it, in order
to have it cleaned up quickly... usually right after you display the page of
results.

If you are keeping multiple searcher refs across multiple threads for
paging/whatever, you have not coded it correctly.

Imagine 10,000 users - storing a searcher for each one is not going to
work...

On Sep 9, 2008, at 10:21 PM, Chris Lu wrote:

Right, in a sense I cannot release it from another thread. But that's the
problem.

It's a J2EE environment; all threads are kind of equal. It's simply not
possible to iterate through all threads to close the searcher, thus
releasing the ThreadLocal cache. Unless Lucene is not recommended for the
J2EE environment, this has to be fixed.


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
A searcher uses an IndexReader - the IndexReader is slow to open, not
a Searcher. And searchers can share an IndexReader.

You want to create a single shared (across all threads/users)
IndexReader (usually), and create a Searcher as needed and dispose of it.
It is VERY CHEAP to create the Searcher.

I am fairly certain the javadoc on Searcher is incorrect.  The
warning "For performance reasons it is recommended to open only one
IndexSearcher and use it for all of your searches" is not true in the
case where an IndexReader is passed to the ctor.

Any caching should USUALLY be performed at the IndexReader level.

You are most likely using the "path" ctor, and that is the source of
your problems, as multiple IndexReader instances are being created,
and thus the memory use.
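The pattern described here can be sketched with plain-JDK stand-ins (the Reader/Searcher classes below are illustrative only, not Lucene's API): the expensive resource is opened once and shared, while a cheap view over it is created, used, and closed within the thread handling each request.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for the expensive-to-open, shareable IndexReader.
class Reader {
    static final AtomicInteger OPENS = new AtomicInteger();
    Reader() { OPENS.incrementAndGet(); } // expensive: do this once
    String search(String q) { return "results for " + q; }
}

// Stand-in for the cheap per-request IndexSearcher view.
class Searcher implements AutoCloseable {
    private final Reader reader;
    Searcher(Reader reader) { this.reader = reader; } // cheap: per request
    String search(String q) { return reader.search(q); }
    @Override public void close() { /* releases per-search state only */ }
}

public class SharedReaderPattern {
    private static final Reader SHARED = new Reader(); // one per index

    static String handleRequest(String q) {
        // Create, use, and close the searcher within the handling thread.
        try (Searcher s = new Searcher(SHARED)) {
            return s.search(q);
        }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) handleRequest("lucene");
        System.out.println(Reader.OPENS.get()); // prints 1: opened once
    }
}
```

However many requests arrive, the expensive open happens once; closing the searcher does not close the shared reader.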




Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread Chris Lu
I have tried to create an IndexReader pool and dynamically create searchers.
But the memory leak is the same. It's not related to the Searcher class
specifically, but to the SegmentTermEnum in TermInfosReader.


Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels

You do not need a pool of IndexReaders...

It does not matter what class it is; what matters is the class that
ultimately holds the reference.

If the IndexReader is never closed, the SegmentReader(s) is never
closed, so the thread-local in TermInfosReader is not cleared
(because the thread never dies). So you will get one SegmentTermEnum
per thread, per segment.

The SegmentTermEnum is not a large object, so even if you had 100
threads and 100 segments, for 10k instances, it seems hard to believe
that is the source of your memory issue.

The SegmentTermEnum is cached per thread since it needs to enumerate
the terms; not having a per-thread cache would lead to lots of
random access when multiple threads read the index - very slow.

You need to keep in mind: what if every thread was executing a search
simultaneously? You would still have 100x100 SegmentTermEnum
instances anyway!  The only way to prevent that would be to create
and destroy the SegmentTermEnum on each call (opening and seeking to
the proper spot) - which would be SLOW SLOW SLOW.
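The retention pattern described above - one cached value per live thread, held for as long as the thread lives - can be reproduced with a plain-JDK sketch. The byte array below is just a stand-in for a per-thread SegmentTermEnum, and the pool size and task count are arbitrary:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadLocalRetentionDemo {
    // Counts values created by the thread-local; nothing ever clears them.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Stands in for TermInfosReader's per-thread SegmentTermEnum cache.
    static final ThreadLocal<byte[]> CACHE =
        ThreadLocal.withInitial(() -> {
            CREATED.incrementAndGet();
            return new byte[1024];
        });

    public static int run(int threads, int tasks) throws InterruptedException {
        // A fixed pool models an app server that reuses request threads.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> CACHE.get()); // every request touches the cache
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        // One value per pooled thread that ran a task, not one per task;
        // those values stay reachable for as long as the pool threads live.
        return CREATED.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(4, 100)); // at most 4, never 100
    }
}
```

With 100 tasks on 4 reused threads, at most 4 values are ever created, and they remain strongly reachable from the live threads' ThreadLocal maps until the threads exit - exactly why pooled app-server threads keep the per-thread enum alive.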



Re: ThreadLocal causing memory leak with J2EE applications

2008-09-09 Thread robert engels
As a follow-up, the SegmentTermEnum does contain an IndexInput, and
based on your configuration (e.g. buffer sizes) this could be a large
object, so you do need to be careful!

