Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Chris Hostetter

: > What do you think about that alternative approach I mentioned?  Instead of 
having FieldCacheImpl be aware of all IndexReaders, have FieldCache be an inst 
var in IndexReader?
:
: I wonder why it wasn't done that way to start with perhaps to
: completely separate sorting from index reading.  Anyway, it's not
: backward compatible, and it doesn't buy us much to change now does it?
:  We would get rid of a singe hash lookup on the reader, which is
: insignificant compared to anything the FieldCache is used for anyway.

the discussion on this i'm remembering was here...
http://www.nabble.com/Poor-performance-%22race-condition%22-in-FieldSortedHitQueue-tf2070773.html#a5767432

Doron speculated that the reason FieldCache was seperate from IndexReader
was to keep the "caching" at a higher level then the "reading", i
suggested that if we're going to change the API, we should move the
"refrence" to the cache up into the Searcher since that's the place
Sorting is acctually done, but that it should still be based on weak ref
to the reader since multiple searchers might share a reader.

thinking about it now however, i would think it does make sense for the
IndexReader to hang on to it's field cache ... it would not only simplify
things a lot, and allow an IndexReader to explicilty "clear" it's cache
when it's closed, but it would also allow us to intellegently populate the
cache based on the individual segments in a MultiReader (so if some of hte
segments haven't hcange,d we don't reread that part of hte cache)


-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Otis Gospodnetic
I think Robert might have still had LUCENE-436 in the back of his mind.
I think I'll take care of that one soon - I think those finalize() overrides 
are no longer missing, and that's the only "issue" in LUCENE-436.  I think 
finalize()s there are not hurting us, but since they contain hacks for 
pre-1.4.2 JVMs, we could clean that up now.  FixedThreadLocal.java that's 
attach there won't be needed.

Otis

- Original Message 
From: robert engels <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, December 19, 2006 4:52:18 PM
Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* 
objects)

Sorry, it wasn't this particular thread. Sorry for the confusion.  
This thread is actually on the money I believe (The FieldCache issues).

I got my wires crossed during a vent. It is just difficult when you  
spend numerous hours to diagnose "the problem" and then have to  
revisit the same issue every couple of months.

On Dec 19, 2006, at 3:25 PM, Mike Klaas wrote:

> On 12/19/06, robert engels <[EMAIL PROTECTED]> wrote:
>> I would suggest that in order to even bring up "thread local issues"
>> in the future that the submitter supplies a pure Java NON-LUCENE test
>> case that demonstrates the problem (just as you would if reporting a
>> bug to Sun).
>>
>> All of the "guessing" about what is going on is counter productive.
>> You can review the JDK source code, and you can run test cases.
>>
>> There is nothing inherently broken in ThreadLocals, and people that
>> keep claiming that are only doing a disservice to the people that
>> make sure Java is a robust and reliable platform.
>
> I didn't observe any maligning of java ThreadLocals in this thread--I
> only noticed questions about lucene's use of them.  I can understand
> that you are worried about FUD, but I don't think that is happening
> here, nor are anyone's efforts being disprespected.
>
> -Mike
>
>> On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote:
>>
>> > On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>> >> I _think_ Robert is right and ThreadLocals are not the problem (I
>> >> tried getting rid of them, and replacing them with an instance var
>> >> last week, but run into problems with multi-threaded unit tests).
>> >
>> > If you want to try getting rid of the ThreadLocals, you can't  
>> replace
>> > with instance vars because the stuff put into the Threadlocals  
>> isn't
>> > thread safe.
>> > Replace the ThreadLocal stuff with the code that populated it in  
>> the
>> > first place.
>> >
>> > So for instance, in TermInfosReader, replace this
>> >  private SegmentTermEnum getEnum() {
>> >SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
>> >if (termEnum == null) {
>> >  termEnum = terms();
>> >  enumerators.set(termEnum);
>> >}
>> >return termEnum;
>> >  }
>> >
>> > With this:
>> >  private SegmentTermEnum getEnum() {
>> > return terms()
>> >  }
>> >
>> >
>> >> What I'm seeing while profiling (and in production) is the
>> >> accumulation of these:
>> >>
>> >> org.apache.lucene.search.FieldCacheImpl$Entry
>> >> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
>> >>
>> >> This is related to http://issues.apache.org/jira/browse/LUCENE-651
>> >> (the commit for that patch also happens to coincide with when I
>> >> started seeing the leak).
>> >
>> > I'll take a look into that.  Solr hasn't sync'd to that version of
>> > Lucene yet, so we haven't seen this problem of course :-)
>> >
>> > -Yonik
>> >
>> >  
>> -
>> > To unsubscribe, e-mail: [EMAIL PROTECTED]
>> > For additional commands, e-mail: [EMAIL PROTECTED]
>> >
>>
>>
>> -
>> To unsubscribe, e-mail: [EMAIL PROTECTED]
>> For additional commands, e-mail: [EMAIL PROTECTED]
>>
>>
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread robert engels
Sorry, it wasn't this particular thread. Sorry for the confusion.  
This thread is actually on the money I believe (The FieldCache issues).


I got my wires crossed during a vent. It is just difficult when you  
spend numerous hours to diagnose "the problem" and then have to  
revisit the same issue every couple of months.


On Dec 19, 2006, at 3:25 PM, Mike Klaas wrote:


On 12/19/06, robert engels <[EMAIL PROTECTED]> wrote:

I would suggest that in order to even bring up "thread local issues"
in the future that the submitter supplies a pure Java NON-LUCENE test
case that demonstrates the problem (just as you would if reporting a
bug to Sun).

All of the "guessing" about what is going on is counter productive.
You can review the JDK source code, and you can run test cases.

There is nothing inherently broken in ThreadLocals, and people that
keep claiming that are only doing a disservice to the people that
make sure Java is a robust and reliable platform.


I didn't observe any maligning of java ThreadLocals in this thread--I
only noticed questions about lucene's use of them.  I can understand
that you are worried about FUD, but I don't think that is happening
here, nor are anyone's efforts being disprespected.

-Mike


On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote:

> On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>> I _think_ Robert is right and ThreadLocals are not the problem (I
>> tried getting rid of them, and replacing them with an instance var
>> last week, but run into problems with multi-threaded unit tests).
>
> If you want to try getting rid of the ThreadLocals, you can't  
replace
> with instance vars because the stuff put into the Threadlocals  
isn't

> thread safe.
> Replace the ThreadLocal stuff with the code that populated it in  
the

> first place.
>
> So for instance, in TermInfosReader, replace this
>  private SegmentTermEnum getEnum() {
>SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
>if (termEnum == null) {
>  termEnum = terms();
>  enumerators.set(termEnum);
>}
>return termEnum;
>  }
>
> With this:
>  private SegmentTermEnum getEnum() {
> return terms()
>  }
>
>
>> What I'm seeing while profiling (and in production) is the
>> accumulation of these:
>>
>> org.apache.lucene.search.FieldCacheImpl$Entry
>> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
>>
>> This is related to http://issues.apache.org/jira/browse/LUCENE-651
>> (the commit for that patch also happens to coincide with when I
>> started seeing the leak).
>
> I'll take a look into that.  Solr hasn't sync'd to that version of
> Lucene yet, so we haven't seen this problem of course :-)
>
> -Yonik
>
>  
-

> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Mike Klaas

On 12/19/06, robert engels <[EMAIL PROTECTED]> wrote:

I would suggest that in order to even bring up "thread local issues"
in the future that the submitter supplies a pure Java NON-LUCENE test
case that demonstrates the problem (just as you would if reporting a
bug to Sun).

All of the "guessing" about what is going on is counter productive.
You can review the JDK source code, and you can run test cases.

There is nothing inherently broken in ThreadLocals, and people that
keep claiming that are only doing a disservice to the people that
make sure Java is a robust and reliable platform.


I didn't observe any maligning of java ThreadLocals in this thread--I
only noticed questions about lucene's use of them.  I can understand
that you are worried about FUD, but I don't think that is happening
here, nor are anyone's efforts being disprespected.

-Mike


On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote:

> On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
>> I _think_ Robert is right and ThreadLocals are not the problem (I
>> tried getting rid of them, and replacing them with an instance var
>> last week, but run into problems with multi-threaded unit tests).
>
> If you want to try getting rid of the ThreadLocals, you can't replace
> with instance vars because the stuff put into the Threadlocals isn't
> thread safe.
> Replace the ThreadLocal stuff with the code that populated it in the
> first place.
>
> So for instance, in TermInfosReader, replace this
>  private SegmentTermEnum getEnum() {
>SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
>if (termEnum == null) {
>  termEnum = terms();
>  enumerators.set(termEnum);
>}
>return termEnum;
>  }
>
> With this:
>  private SegmentTermEnum getEnum() {
> return terms()
>  }
>
>
>> What I'm seeing while profiling (and in production) is the
>> accumulation of these:
>>
>> org.apache.lucene.search.FieldCacheImpl$Entry
>> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
>>
>> This is related to http://issues.apache.org/jira/browse/LUCENE-651
>> (the commit for that patch also happens to coincide with when I
>> started seeing the leak).
>
> I'll take a look into that.  Solr hasn't sync'd to that version of
> Lucene yet, so we haven't seen this problem of course :-)
>
> -Yonik
>
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
>


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread robert engels
I would suggest that in order to even bring up "thread local issues"  
in the future that the submitter supplies a pure Java NON-LUCENE test  
case that demonstrates the problem (just as you would if reporting a  
bug to Sun).


All of the "guessing" about what is going on is counter productive.  
You can review the JDK source code, and you can run test cases.


There is nothing inherently broken in ThreadLocals, and people that  
keep claiming that are only doing a disservice to the people that  
make sure Java is a robust and reliable platform.



On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote:


On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
I _think_ Robert is right and ThreadLocals are not the problem (I  
tried getting rid of them, and replacing them with an instance var  
last week, but run into problems with multi-threaded unit tests).


If you want to try getting rid of the ThreadLocals, you can't replace
with instance vars because the stuff put into the Threadlocals isn't
thread safe.
Replace the ThreadLocal stuff with the code that populated it in the
first place.

So for instance, in TermInfosReader, replace this
 private SegmentTermEnum getEnum() {
   SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
   if (termEnum == null) {
 termEnum = terms();
 enumerators.set(termEnum);
   }
   return termEnum;
 }

With this:
 private SegmentTermEnum getEnum() {
return terms()
 }


What I'm seeing while profiling (and in production) is the  
accumulation of these:


org.apache.lucene.search.FieldCacheImpl$Entry
org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder

This is related to http://issues.apache.org/jira/browse/LUCENE-651  
(the commit for that patch also happens to coincide with when I  
started seeing the leak).


I'll take a look into that.  Solr hasn't sync'd to that version of
Lucene yet, so we haven't seen this problem of course :-)

-Yonik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Yonik Seeley

On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

Bingo! :)  I'll try the patch shortly and report back in a bit.


Cool, since you can serve as a test-case, I'll hold of on committing
until I hear how it works.


What do you think about that alternative approach I mentioned?  Instead of 
having FieldCacheImpl be aware of all IndexReaders, have FieldCache be an inst 
var in IndexReader?


I wonder why it wasn't done that way to start with perhaps to
completely separate sorting from index reading.  Anyway, it's not
backward compatible, and it doesn't buy us much to change now does it?
We would get rid of a singe hash lookup on the reader, which is
insignificant compared to anything the FieldCache is used for anyway.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Otis Gospodnetic
Bingo! :)  I'll try the patch shortly and report back in a bit.

What do you think about that alternative approach I mentioned?  Instead of 
having FieldCacheImpl be aware of all IndexReaders, have FieldCache be an inst 
var in IndexReader?  That way we can clean up/lose reference to FieldCache when 
we close IndexReader instead of relying on GC to clean up IndexReaders from 
FieldCacheImpl's WeakHashMap.  Wouldn't that simplify the code?  I think we 
wouldn't have to have some of those caches in FieldCacheImpl then.

Thanks,
Otis

- Original Message 
From: Yonik Seeley <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Tuesday, December 19, 2006 2:50:43 PM
Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* 
objects)

On 12/19/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> > What I'm seeing while profiling (and in production) is the accumulation of 
> > these:
> >
> > org.apache.lucene.search.FieldCacheImpl$Entry
> > org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
> >
> > This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the 
> > commit for that patch also happens to coincide with when I started seeing 
> > the leak).
>
> I'll take a look into that.  Solr hasn't sync'd to that version of
> Lucene yet, so we haven't seen this problem of course :-)

OK, I see the problem... I opened
http://issues.apache.org/jira/browse/LUCENE-754
to track and it should hopefully be fixed soon


-- 
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Yonik Seeley

LOL! of course I stopped to open the bug,
grab a sandwitch, etc ;-)

On 12/19/06, Chris Hostetter <[EMAIL PROTECTED]> wrote:

Ha! ... I beat you by 16 whole seconds!

You have to be faster then that if you want to roll with the Big Dogs!

: Date: Tue, 19 Dec 2006 11:50:07 -0800 (PST)
: From: Chris Hostetter
: Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.*

: Date: Tue, 19 Dec 2006 11:50:23 -0800 (PST)
: From: "Yonik Seeley (JIRA)"
: Subject: [jira] Created: (LUCENE-754) FieldCache keeps hard references to

-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Chris Hostetter


Ha! ... I beat you by 16 whole seconds!

You have to be faster then that if you want to roll with the Big Dogs!

: Date: Tue, 19 Dec 2006 11:50:07 -0800 (PST)
: From: Chris Hostetter
: Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.*

: Date: Tue, 19 Dec 2006 11:50:23 -0800 (PST)
: From: "Yonik Seeley (JIRA)"
: Subject: [jira] Created: (LUCENE-754) FieldCache keeps hard references to




-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Yonik Seeley

On 12/19/06, Yonik Seeley <[EMAIL PROTECTED]> wrote:

On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> What I'm seeing while profiling (and in production) is the accumulation of 
these:
>
> org.apache.lucene.search.FieldCacheImpl$Entry
> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
>
> This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the 
commit for that patch also happens to coincide with when I started seeing the 
leak).

I'll take a look into that.  Solr hasn't sync'd to that version of
Lucene yet, so we haven't seen this problem of course :-)


OK, I see the problem... I opened
http://issues.apache.org/jira/browse/LUCENE-754
to track and it should hopefully be fixed soon


--
-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Chris Hostetter

: Can anyone else have a look at the patch for LUCENE-651 (or just at
: FieldCacheImpl) and see if there is a quick fix?

If i'm not mistaken, isn't line #65 wrong? ...

  synchronized (readerCache) {
innerCache = (Map) readerCache.get(reader);
if (innerCache == null) {
  innerCache = new HashMap();
  readerCache.put(reader, innerCache);
  value = null;
} else {
  value = innerCache.get(key);
}
if (value == null) {
  value = new CreationPlaceholder();
  innerCache.put(reader, value);  <--- L65
}
  }


...why is the reader being used as the key of the innerCache? Shouldn't
the "key" (param) be used here? ... nothing seems to ever remove the
reader from the innerCache, so i don't see how the WeekHashMap readerCache
would ever let go of it either.


: Also, wouldn't it be better if IndexReaders had a reference to their
: copy FieldCache, which they can kill when they are closed?  What we have
: now is a chunky FieldCacheImpl that has references to IndexReaders, and
: doesn't know when they are closed, and thus it can't remove them and
: their data from its internal caches?

as i recall, we've discussed that on the mailing list as being a "better
API" for dealing with FieldCaches ... but it would be an API changes, and
no one has (as of yet) attempted it and submitted a patch.




-Hoss


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Yonik Seeley

On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

I _think_ Robert is right and ThreadLocals are not the problem (I tried getting 
rid of them, and replacing them with an instance var last week, but run into 
problems with multi-threaded unit tests).


If you want to try getting rid of the ThreadLocals, you can't replace
with instance vars because the stuff put into the Threadlocals isn't
thread safe.
Replace the ThreadLocal stuff with the code that populated it in the
first place.

So for instance, in TermInfosReader, replace this
 private SegmentTermEnum getEnum() {
   SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get();
   if (termEnum == null) {
 termEnum = terms();
 enumerators.set(termEnum);
   }
   return termEnum;
 }

With this:
 private SegmentTermEnum getEnum() {
return terms()
 }



What I'm seeing while profiling (and in production) is the accumulation of 
these:

org.apache.lucene.search.FieldCacheImpl$Entry
org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder

This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the commit 
for that patch also happens to coincide with when I started seeing the leak).


I'll take a look into that.  Solr hasn't sync'd to that version of
Lucene yet, so we haven't seen this problem of course :-)

-Yonik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-19 Thread Otis Gospodnetic
Hi,

I _think_ Robert is right and ThreadLocals are not the problem (I tried getting 
rid of them, and replacing them with an instance var last week, but run into 
problems with multi-threaded unit tests).
What I'm seeing while profiling (and in production) is the accumulation of 
these:

org.apache.lucene.search.FieldCacheImpl$Entry
org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder

This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the commit 
for that patch also happens to coincide with when I started seeing the leak).

The number of those 2 FieldCache* instances seems to be about 2 x # of unique 
IndexReaders (indices) that have been searched so far.

I reversed that LUCENE-651 patch locally, and now I am *not* seeing the number 
of FieldCacheImpl$Entry instances grow any more! Hm, hm, hm, hm. :)  
FieldCacheImpl$CreationPlaceholder came with LUCENE-651, so I'm not seeing any 
of them, of course.

I _think_ the reason I see this leak and most other people don't, is
because I'm running an application with LOTS of frequently-changing
indices (look at http://simpy.com ).  That FieldCacheImpl$CreationPlaceholder 
uses IndexReaders as
keys in a WeakHashMap *and* another HashMap.  Even though I close my 
IndexSearchers and they close their underlying IndexReaders, I think 
FieldCacheImpl is holding onto them, thus not allowing them to be GCed.  Here 
is what the dump of my production heap is showing right now, for instance:

$ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep SegmentReader
 46:  6630  583440  org.apache.lucene.index.SegmentReader

$ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep MultiReader
110:   735   41160  org.apache.lucene.index.MultiReader

$ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep FieldCache
 74:  7444  178656  org.apache.lucene.search.FieldCacheImpl$Entry
 85:  7434  118944  
org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder

6630 + 735 = 7365 ~ 7434


But, like I said, I'm closing IndexSearchers (and they are closing their 
IndexReaders), so there are only 114 open of them at the moment:

$ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep IndexSearcher
337:   1142736  org.apache.lucene.search.IndexSearcher

So, I think that LUCENE-651 introduced a bug.
I think the reason why Robert and others are not seeing this is because 1) few 
people are using 2.1-dev AND 2) running apps with LOTs of indices.  If I had an 
app with a single or a handful of indices, I would probably never have noticed 
this leak.

Can anyone else have a look at the patch for LUCENE-651 (or just at 
FieldCacheImpl) and see if there is a quick fix?
Also, wouldn't it be better if IndexReaders had a reference to their copy 
FieldCache, which they can kill when they are closed?  What we have now is a 
chunky FieldCacheImpl that has references to IndexReaders, and doesn't know 
when they are closed, and thus it can't remove them and their data from its 
internal caches?

Thanks,
Otis


- Original Message 
From: Mark Miller <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org
Sent: Sunday, December 17, 2006 9:22:05 PM
Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* 
objects)

I think that Otis originally said that the problem came up when he 
started using the latest build off of the trunk. I don't believe he 
experienced the problem on 2.0. Anyone on the bleeding edge not noticing 
the leak? I am going to be deploying with 2.trunk soon and am very 
interested .

robert engels wrote:
> Our search server also used long-lived threads. No memory problems 
> whatsoever.
>
> On Dec 17, 2006, at 3:51 PM, Paul Smith wrote:
>
>>
>> On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote:
>>
>>> Moving to java-dev, I think this belongs here.
>>> I've been looking at this problem some more today and reading about 
>>> ThreadLocals.  It's easy to misuse them and end up with memory 
>>> leaks, apparently... and I think we may have this problem here.
>>>
>>> The problem here is that ThreadLocals are tied to Threads, and I 
>>> think the assumption in TermInfosReader and SegmentReader is that 
>>> (search) Threads are short-lived: they come in, scan the index, do 
>>> the search, return and die.  In this scenario, their ThreadLocals go 
>>> to heaven with them, too, and memory is freed up.
>>
>> Otis, we have an index server being served inside Tomcat, where an 
>> Application instance makes a search request via vanilla HTTP post, so 
>> our connector threads definitely do stay alive for quite a while.  
>> We're using Lucene 2.0, and our Index server is THE most stable of 
>> all our components, up

Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-18 Thread robert engels
There is no inherent problem with ThreadLocal. It is a viable  
solution to synchronization issues in most cases.


On Dec 18, 2006, at 11:25 AM, Bernhard Messer wrote:


Otis,

i figured out a similar problem when running a very heavy loaded  
search application in a servlet container. The reasone using  
ThreadLocals was to get rid of synchronized method calls e.g in  
TermVectorsReader which would break down the overall search  
performance. Currently i do not see an easy solution to fix both,  
the synchronization and ThreadLocal problem.


Bernhard

Otis Gospodnetic wrote:

Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading  
about ThreadLocals.  It's easy to misuse them and end up with  
memory leaks, apparently... and I think we may have this problem  
here.


The problem here is that ThreadLocals are tied to Threads, and I  
think the assumption in TermInfosReader and SegmentReader is that  
(search) Threads are short-lived: they come in, scan the index, do  
the search, return and die.  In this scenario, their ThreadLocals  
go to heaven with them, too, and memory is freed up.


But when Threads are long-lived, as they are in thread pools (e.g.  
those in servlet containers), those ThreadLocals stay alive even  
after a single search request is done.  Moreover, the Thread is  
reused, and the new TermInfosReader and SegmentReader put some new  
values in that ThreadLocal on top of the old values (I think) from  
the previous search request.  Because the Thread still has  
references to ThreadLocals and the values in them, the values  
never get GCed.


I tried making ThreadLocals in TIR and SR static, I tried wrapping  
values saved in TLs in WeakReference, I've tried using WeakHashMap  
like in Robert Engel's FixedThreadLocal class from LUCENE-436, but  
nothing helped.  I thought about adding a public static method to  
TIR and SR, so one could call it at the end of a search request  
(think servlet filter) and clear the TL for the current thread,  
but that would require making TIR and SR public and I'm not 100%  
sure if it would work, plus that exposes the implementation  
details too much.

I don't have a solution yet.
But do we *really* need ThreadLocal in TIR and SR?  The only thing  
that TL is doing there is acting as a per-thread storage of some  
cloned value (in TIR we clone SegmentTermEnum and in SR we clone  
TermVectorsReader).  Why can't we just store those cloned values  
in instance variables?  Isn't whoever is calling TIR and SR going  
to be calling the same instance of TIR and SR anyway, and thus get  
access to those cloned values?


I'm really amazed that we haven't heard any reports about this  
before.  I am not sure why my application started showing this  
leak only about 3 weeks ago.  It is getting more pounded on than  
before, so maybe that made the leak more obvious.  My guess is  
that more common Lucene usage is with a single index or a small  
number of them, and with short-lived threads, where this problem  
isn't easily visible.  In my case I deal with a few tens of  
thousands of indices and several parallel search threads that live  
forever in the thread pool.


Any thoughts about this or possible suggestions for a fix?
Thanks,
Otis



- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, December 15, 2006 12:28:29 PM
Subject: Leaking org.apache.lucene.index.* objects

Hi,

About 2-3 weeks ago I emailed about a memory leak in my  
application.  I then found some problems in my code (I wasn't  
closing IndexSearchers explicitly) and took care of those.  Now I  
see my app is still leaking memory - jconsole clearly shows the  
"Tenured Gen" memory pool getting filled up until I hit the OOM,  
but I can't seem to pin-point the source.


I found that a bunch or o.a.l.index.* objects are not getting  
GCed, even though they should.  For example:


$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 |  
sort -k2 -nr

num   #instances#bytes  class name
--
  4:   176484098831040   
org.apache.lucene.index.CompoundFileReader$CSIndexInput

  5:   211921567814880  org.apache.lucene.index.TermInfo
  7:   111245935598688  org.apache.lucene.index.SegmentReader 
$Norm

  9:   213231134116976  org.apache.lucene.index.Term
 12:   111789726829528  org.apache.lucene.index.FieldInfo
 13:22534018027200  org.apache.lucene.index.SegmentTermEnum
 15:58972714153448  org.apache.lucene.index.TermBuffer
 21: 86033 8718504  [Lorg.apache.lucene.index.TermInfo;
 20: 86033 8718504  [Lorg.apache.lucene.index.Term;
 23: 86120 7578560  org.apache.lucene.index.SegmentReader
 26: 90501 5068056  org.apache.lucene.store.FSIndexInput
 27: 86120 4822720  org.apache.lucene.index.TermInfosReader
 33: 86130 3445200  org.apache.lucene.index.SegmentInfo

Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-18 Thread Bernhard Messer

Otis,

i figured out a similar problem when running a very heavy loaded search 
application in a servlet container. The reasone using ThreadLocals was 
to get rid of synchronized method calls e.g in TermVectorsReader which 
would break down the overall search performance. Currently i do not see 
an easy solution to fix both, the synchronization and ThreadLocal problem.


Bernhard

Otis Gospodnetic wrote:

Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading about 
ThreadLocals.  It's easy to misuse them and end up with memory leaks, 
apparently... and I think we may have this problem here.

The problem here is that ThreadLocals are tied to Threads, and I think the 
assumption in TermInfosReader and SegmentReader is that (search) Threads are 
short-lived: they come in, scan the index, do the search, return and die.  In 
this scenario, their ThreadLocals go to heaven with them, too, and memory is 
freed up.

But when Threads are long-lived, as they are in thread pools (e.g. those in 
servlet containers), those ThreadLocals stay alive even after a single search 
request is done.  Moreover, the Thread is reused, and the new TermInfosReader 
and SegmentReader put some new values in that ThreadLocal on top of the old 
values (I think) from the previous search request.  Because the Thread still 
has references to ThreadLocals and the values in them, the values never get 
GCed.

I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved 
in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's 
FixedThreadLocal class from LUCENE-436, but nothing helped.  I thought about 
adding a public static method to TIR and SR, so one could call it at the end of 
a search request (think servlet filter) and clear the TL for the current 
thread, but that would require making TIR and SR public and I'm not 100% sure 
if it would work, plus that exposes the implementation details too much.
I don't have a solution yet.
But do we *really* need ThreadLocal in TIR and SR?  The only thing that TL is 
doing there is acting as a per-thread storage of some cloned value (in TIR we 
clone SegmentTermEnum and in SR we clone TermVectorsReader).  Why can't we just 
store those cloned values in instance variables?  Isn't whoever is calling TIR 
and SR going to be calling the same instance of TIR and SR anyway, and thus get 
access to those cloned values?

I'm really amazed that we haven't heard any reports about this before.  I am 
not sure why my application started showing this leak only about 3 weeks ago.  
It is getting more pounded on than before, so maybe that made the leak more 
obvious.  My guess is that more common Lucene usage is with a single index or a 
small number of them, and with short-lived threads, where this problem isn't 
easily visible.  In my case I deal with a few tens of thousands of indices and 
several parallel search threads that live forever in the thread pool.

Any thoughts about this or possible suggestions for a fix?
Thanks,
Otis



- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, December 15, 2006 12:28:29 PM
Subject: Leaking org.apache.lucene.index.* objects

Hi,

About 2-3 weeks ago I emailed about a memory leak in my application.  I then found some 
problems in my code (I wasn't closing IndexSearchers explicitly) and took care of those.  
Now I see my app is still leaking memory - jconsole clearly shows the "Tenured 
Gen" memory pool getting filled up until I hit the OOM, but I can't seem to 
pin-point the source.

I found that a bunch or o.a.l.index.* objects are not getting GCed, even though 
they should.  For example:

$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr
num   #instances#bytes  class name
--
  4:   176484098831040  
org.apache.lucene.index.CompoundFileReader$CSIndexInput
  5:   211921567814880  org.apache.lucene.index.TermInfo
  7:   111245935598688  org.apache.lucene.index.SegmentReader$Norm
  9:   213231134116976  org.apache.lucene.index.Term
 12:   111789726829528  org.apache.lucene.index.FieldInfo
 13:22534018027200  org.apache.lucene.index.SegmentTermEnum
 15:58972714153448  org.apache.lucene.index.TermBuffer
 21: 86033 8718504  [Lorg.apache.lucene.index.TermInfo;
 20: 86033 8718504  [Lorg.apache.lucene.index.Term;
 23: 86120 7578560  org.apache.lucene.index.SegmentReader
 26: 90501 5068056  org.apache.lucene.store.FSIndexInput
 27: 86120 4822720  org.apache.lucene.index.TermInfosReader
 33: 86130 3445200  org.apache.lucene.index.SegmentInfo
 36: 87355 2795360  org.apache.lucene.store.FSIndexInput$Descriptor
 38: 86120 2755840  org.apache.lucene.index.FieldsReader
 39: 86050 2753600  org.apache.lucene.index.CompoundFileReader
 42: 46903 2251344  o

Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-17 Thread Mark Miller
I think that Otis originally said that the problem came up when he 
started using the latest build off of the trunk. I don't believe he 
experienced the problem on 2.0. Anyone on the bleeding edge not noticing 
the leak? I am going to be deploying with 2.trunk soon and am very 
interested .


robert engels wrote:
Our search server also used long-lived threads. No memory problems 
whatsoever.


On Dec 17, 2006, at 3:51 PM, Paul Smith wrote:



On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote:


Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading about 
ThreadLocals.  It's easy to misuse them and end up with memory 
leaks, apparently... and I think we may have this problem here.


The problem here is that ThreadLocals are tied to Threads, and I 
think the assumption in TermInfosReader and SegmentReader is that 
(search) Threads are short-lived: they come in, scan the index, do 
the search, return and die.  In this scenario, their ThreadLocals go 
to heaven with them, too, and memory is freed up.


Otis, we have an index server being served inside Tomcat, where an 
Application instance makes a search request via vanilla HTTP post, so 
our connector threads definitely do stay alive for quite a while.  
We're using Lucene 2.0, and our Index server is THE most stable of 
all our components, up for over month (before being taken down for 
updates) searching hundreds of various sized indexes sized up to 7Gb 
in size, serving 1-2 requests/second during peak usage.


No memory leak spotted at our end, but I'm watching this thread with 
interest! :)


cheers,

Paul Smith



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-17 Thread robert engels
Our search server also used long-lived threads. No memory problems  
whatsoever.


On Dec 17, 2006, at 3:51 PM, Paul Smith wrote:



On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote:


Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading  
about ThreadLocals.  It's easy to misuse them and end up with  
memory leaks, apparently... and I think we may have this problem  
here.


The problem here is that ThreadLocals are tied to Threads, and I  
think the assumption in TermInfosReader and SegmentReader is that  
(search) Threads are short-lived: they come in, scan the index, do  
the search, return and die.  In this scenario, their ThreadLocals  
go to heaven with them, too, and memory is freed up.


Otis, we have an index server being served inside Tomcat, where an  
Application instance makes a search request via vanilla HTTP post,  
so our connector threads definitely do stay alive for quite a  
while.  We're using Lucene 2.0, and our Index server is THE most  
stable of all our components, up for over month (before being taken  
down for updates) searching hundreds of various sized indexes sized  
up to 7Gb in size, serving 1-2 requests/second during peak usage.


No memory leak spotted at our end, but I'm watching this thread  
with interest! :)


cheers,

Paul Smith



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-17 Thread Paul Smith


On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote:


Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading about  
ThreadLocals.  It's easy to misuse them and end up with memory  
leaks, apparently... and I think we may have this problem here.


The problem here is that ThreadLocals are tied to Threads, and I  
think the assumption in TermInfosReader and SegmentReader is that  
(search) Threads are short-lived: they come in, scan the index, do  
the search, return and die.  In this scenario, their ThreadLocals  
go to heaven with them, too, and memory is freed up.


Otis, we have an index server being served inside Tomcat, where an  
Application instance makes a search request via vanilla HTTP post, so  
our connector threads definitely do stay alive for quite a while.   
We're using Lucene 2.0, and our Index server is THE most stable of  
all our components, up for over month (before being taken down for  
updates) searching hundreds of various sized indexes sized up to 7Gb  
in size, serving 1-2 requests/second during peak usage.


No memory leak spotted at our end, but I'm watching this thread with  
interest! :)


cheers,

Paul Smith

smime.p7s
Description: S/MIME cryptographic signature


Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-16 Thread robert engels
I can basically GUARANTEE that unless you are using large indexes and  
loading into a RamDirectory (and use Java prior to 1.5) , that there  
is NO ISSUE in using a ThreadLocal,


There is something else wrong in your application.

PLEASE get a good profiler and perform the required testing. It is  
amazing that so many people have spent so much time on this issue  
already, when in the end it has ALWAYS been demonstrated to be a user  
code error.


The use of ThreadLocals in Lucene is correct.

If you honestly believe there is some issue, please provide a self  
contained test case that demonstrates the problem.


On Dec 16, 2006, at 1:15 AM, Otis Gospodnetic wrote:


Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading about  
ThreadLocals.  It's easy to misuse them and end up with memory  
leaks, apparently... and I think we may have this problem here.


The problem here is that ThreadLocals are tied to Threads, and I  
think the assumption in TermInfosReader and SegmentReader is that  
(search) Threads are short-lived: they come in, scan the index, do  
the search, return and die.  In this scenario, their ThreadLocals  
go to heaven with them, too, and memory is freed up.


But when Threads are long-lived, as they are in thread pools (e.g.  
those in servlet containers), those ThreadLocals stay alive even  
after a single search request is done.  Moreover, the Thread is  
reused, and the new TermInfosReader and SegmentReader put some new  
values in that ThreadLocal on top of the old values (I think) from  
the previous search request.  Because the Thread still has  
references to ThreadLocals and the values in them, the values never  
get GCed.


I tried making ThreadLocals in TIR and SR static, I tried wrapping  
values saved in TLs in WeakReference, I've tried using WeakHashMap  
like in Robert Engel's FixedThreadLocal class from LUCENE-436, but  
nothing helped.  I thought about adding a public static method to  
TIR and SR, so one could call it at the end of a search request  
(think servlet filter) and clear the TL for the current thread, but  
that would require making TIR and SR public and I'm not 100% sure  
if it would work, plus that exposes the implementation details too  
much.

I don't have a solution yet.
But do we *really* need ThreadLocal in TIR and SR?  The only thing  
that TL is doing there is acting as a per-thread storage of some  
cloned value (in TIR we clone SegmentTermEnum and in SR we clone  
TermVectorsReader).  Why can't we just store those cloned values in  
instance variables?  Isn't whoever is calling TIR and SR going to  
be calling the same instance of TIR and SR anyway, and thus get  
access to those cloned values?


I'm really amazed that we haven't heard any reports about this  
before.  I am not sure why my application started showing this leak  
only about 3 weeks ago.  It is getting more pounded on than before,  
so maybe that made the leak more obvious.  My guess is that more  
common Lucene usage is with a single index or a small number of  
them, and with short-lived threads, where this problem isn't easily  
visible.  In my case I deal with a few tens of thousands of indices  
and several parallel search threads that live forever in the thread  
pool.


Any thoughts about this or possible suggestions for a fix?
Thanks,
Otis



- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, December 15, 2006 12:28:29 PM
Subject: Leaking org.apache.lucene.index.* objects

Hi,

About 2-3 weeks ago I emailed about a memory leak in my  
application.  I then found some problems in my code (I wasn't  
closing IndexSearchers explicitly) and took care of those.  Now I  
see my app is still leaking memory - jconsole clearly shows the  
"Tenured Gen" memory pool getting filled up until I hit the OOM,  
but I can't seem to pin-point the source.


I found that a bunch or o.a.l.index.* objects are not getting GCed,  
even though they should.  For example:


$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 |  
sort -k2 -nr

num   #instances#bytes  class name
--
  4:   176484098831040   
org.apache.lucene.index.CompoundFileReader$CSIndexInput

  5:   211921567814880  org.apache.lucene.index.TermInfo
  7:   111245935598688  org.apache.lucene.index.SegmentReader$Norm
  9:   213231134116976  org.apache.lucene.index.Term
 12:   111789726829528  org.apache.lucene.index.FieldInfo
 13:22534018027200  org.apache.lucene.index.SegmentTermEnum
 15:58972714153448  org.apache.lucene.index.TermBuffer
 21: 86033 8718504  [Lorg.apache.lucene.index.TermInfo;
 20: 86033 8718504  [Lorg.apache.lucene.index.Term;
 23: 86120 7578560  org.apache.lucene.index.SegmentReader
 26: 90501 5068056  org.apache.lucene.store.FSIndexInput
 27: 86120 4822720  org.apache.luc

ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)

2006-12-15 Thread Otis Gospodnetic
Moving to java-dev, I think this belongs here.
I've been looking at this problem some more today and reading about 
ThreadLocals.  It's easy to misuse them and end up with memory leaks, 
apparently... and I think we may have this problem here.

The problem here is that ThreadLocals are tied to Threads, and I think the 
assumption in TermInfosReader and SegmentReader is that (search) Threads are 
short-lived: they come in, scan the index, do the search, return and die.  In 
this scenario, their ThreadLocals go to heaven with them, too, and memory is 
freed up.

But when Threads are long-lived, as they are in thread pools (e.g. those in 
servlet containers), those ThreadLocals stay alive even after a single search 
request is done.  Moreover, the Thread is reused, and the new TermInfosReader 
and SegmentReader put some new values in that ThreadLocal on top of the old 
values (I think) from the previous search request.  Because the Thread still 
has references to ThreadLocals and the values in them, the values never get 
GCed.

I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved 
in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's 
FixedThreadLocal class from LUCENE-436, but nothing helped.  I thought about 
adding a public static method to TIR and SR, so one could call it at the end of 
a search request (think servlet filter) and clear the TL for the current 
thread, but that would require making TIR and SR public and I'm not 100% sure 
if it would work, plus that exposes the implementation details too much.
I don't have a solution yet.
But do we *really* need ThreadLocal in TIR and SR?  The only thing that TL is 
doing there is acting as a per-thread storage of some cloned value (in TIR we 
clone SegmentTermEnum and in SR we clone TermVectorsReader).  Why can't we just 
store those cloned values in instance variables?  Isn't whoever is calling TIR 
and SR going to be calling the same instance of TIR and SR anyway, and thus get 
access to those cloned values?

I'm really amazed that we haven't heard any reports about this before.  I am 
not sure why my application started showing this leak only about 3 weeks ago.  
It is getting more pounded on than before, so maybe that made the leak more 
obvious.  My guess is that more common Lucene usage is with a single index or a 
small number of them, and with short-lived threads, where this problem isn't 
easily visible.  In my case I deal with a few tens of thousands of indices and 
several parallel search threads that live forever in the thread pool.

Any thoughts about this or possible suggestions for a fix?
Thanks,
Otis



- Original Message 
From: Otis Gospodnetic <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Friday, December 15, 2006 12:28:29 PM
Subject: Leaking org.apache.lucene.index.* objects

Hi,

About 2-3 weeks ago I emailed about a memory leak in my application.  I then 
found some problems in my code (I wasn't closing IndexSearchers explicitly) and 
took care of those.  Now I see my app is still leaking memory - jconsole 
clearly shows the "Tenured Gen" memory pool getting filled up until I hit the 
OOM, but I can't seem to pin-point the source.

I found that a bunch or o.a.l.index.* objects are not getting GCed, even though 
they should.  For example:

$ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr
num   #instances#bytes  class name
--
  4:   176484098831040  
org.apache.lucene.index.CompoundFileReader$CSIndexInput
  5:   211921567814880  org.apache.lucene.index.TermInfo
  7:   111245935598688  org.apache.lucene.index.SegmentReader$Norm
  9:   213231134116976  org.apache.lucene.index.Term
 12:   111789726829528  org.apache.lucene.index.FieldInfo
 13:22534018027200  org.apache.lucene.index.SegmentTermEnum
 15:58972714153448  org.apache.lucene.index.TermBuffer
 21: 86033 8718504  [Lorg.apache.lucene.index.TermInfo;
 20: 86033 8718504  [Lorg.apache.lucene.index.Term;
 23: 86120 7578560  org.apache.lucene.index.SegmentReader
 26: 90501 5068056  org.apache.lucene.store.FSIndexInput
 27: 86120 4822720  org.apache.lucene.index.TermInfosReader
 33: 86130 3445200  org.apache.lucene.index.SegmentInfo
 36: 87355 2795360  org.apache.lucene.store.FSIndexInput$Descriptor
 38: 86120 2755840  org.apache.lucene.index.FieldsReader
 39: 86050 2753600  org.apache.lucene.index.CompoundFileReader
 42: 46903 2251344  org.apache.lucene.index.SegmentInfos
 43: 93778 2250672  org.apache.lucene.search.FieldCacheImpl$Entry
 45: 93778 1500448  
org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder
 47: 86510 1384160  org.apache.lucene.index.FieldInfos

I'm running my app in search-only mode - no adds or deletes.
The counts of these objects just keeps going up, even though I am explicitly 
closing