Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
: > What do you think about that alternative approach I mentioned? Instead of having FieldCacheImpl be aware of all IndexReaders, have FieldCache be an inst var in IndexReader? : : I wonder why it wasn't done that way to start with perhaps to : completely separate sorting from index reading. Anyway, it's not : backward compatible, and it doesn't buy us much to change now does it? : We would get rid of a singe hash lookup on the reader, which is : insignificant compared to anything the FieldCache is used for anyway. the discussion on this i'm remembering was here... http://www.nabble.com/Poor-performance-%22race-condition%22-in-FieldSortedHitQueue-tf2070773.html#a5767432 Doron speculated that the reason FieldCache was seperate from IndexReader was to keep the "caching" at a higher level then the "reading", i suggested that if we're going to change the API, we should move the "refrence" to the cache up into the Searcher since that's the place Sorting is acctually done, but that it should still be based on weak ref to the reader since multiple searchers might share a reader. thinking about it now however, i would think it does make sense for the IndexReader to hang on to it's field cache ... it would not only simplify things a lot, and allow an IndexReader to explicilty "clear" it's cache when it's closed, but it would also allow us to intellegently populate the cache based on the individual segments in a MultiReader (so if some of hte segments haven't hcange,d we don't reread that part of hte cache) -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
I think Robert might have still had LUCENE-436 in the back of his mind. I think I'll take care of that one soon - I think those finalize() overrides are no longer missing, and that's the only "issue" in LUCENE-436. I think finalize()s there are not hurting us, but since they contain hacks for pre-1.4.2 JVMs, we could clean that up now. FixedThreadLocal.java that's attach there won't be needed. Otis - Original Message From: robert engels <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Tuesday, December 19, 2006 4:52:18 PM Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects) Sorry, it wasn't this particular thread. Sorry for the confusion. This thread is actually on the money I believe (The FieldCache issues). I got my wires crossed during a vent. It is just difficult when you spend numerous hours to diagnose "the problem" and then have to revisit the same issue every couple of months. On Dec 19, 2006, at 3:25 PM, Mike Klaas wrote: > On 12/19/06, robert engels <[EMAIL PROTECTED]> wrote: >> I would suggest that in order to even bring up "thread local issues" >> in the future that the submitter supplies a pure Java NON-LUCENE test >> case that demonstrates the problem (just as you would if reporting a >> bug to Sun). >> >> All of the "guessing" about what is going on is counter productive. >> You can review the JDK source code, and you can run test cases. >> >> There is nothing inherently broken in ThreadLocals, and people that >> keep claiming that are only doing a disservice to the people that >> make sure Java is a robust and reliable platform. > > I didn't observe any maligning of java ThreadLocals in this thread--I > only noticed questions about lucene's use of them. I can understand > that you are worried about FUD, but I don't think that is happening > here, nor are anyone's efforts being disprespected. > > -Mike > >> On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote: >> >> > On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: >> >> I _think_ Robert is right and ThreadLocals are not the problem (I >> >> tried getting rid of them, and replacing them with an instance var >> >> last week, but run into problems with multi-threaded unit tests). >> > >> > If you want to try getting rid of the ThreadLocals, you can't >> replace >> > with instance vars because the stuff put into the Threadlocals >> isn't >> > thread safe. >> > Replace the ThreadLocal stuff with the code that populated it in >> the >> > first place. >> > >> > So for instance, in TermInfosReader, replace this >> > private SegmentTermEnum getEnum() { >> >SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(); >> >if (termEnum == null) { >> > termEnum = terms(); >> > enumerators.set(termEnum); >> >} >> >return termEnum; >> > } >> > >> > With this: >> > private SegmentTermEnum getEnum() { >> > return terms() >> > } >> > >> > >> >> What I'm seeing while profiling (and in production) is the >> >> accumulation of these: >> >> >> >> org.apache.lucene.search.FieldCacheImpl$Entry >> >> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder >> >> >> >> This is related to http://issues.apache.org/jira/browse/LUCENE-651 >> >> (the commit for that patch also happens to coincide with when I >> >> started seeing the leak). >> > >> > I'll take a look into that. Solr hasn't sync'd to that version of >> > Lucene yet, so we haven't seen this problem of course :-) >> > >> > -Yonik >> > >> > >> - >> > To unsubscribe, e-mail: [EMAIL PROTECTED] >> > For additional commands, e-mail: [EMAIL PROTECTED] >> > >> >> >> - >> To unsubscribe, e-mail: [EMAIL PROTECTED] >> For additional commands, e-mail: [EMAIL PROTECTED] >> >> > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Sorry, it wasn't this particular thread. Sorry for the confusion. This thread is actually on the money I believe (The FieldCache issues). I got my wires crossed during a vent. It is just difficult when you spend numerous hours to diagnose "the problem" and then have to revisit the same issue every couple of months. On Dec 19, 2006, at 3:25 PM, Mike Klaas wrote: On 12/19/06, robert engels <[EMAIL PROTECTED]> wrote: I would suggest that in order to even bring up "thread local issues" in the future that the submitter supplies a pure Java NON-LUCENE test case that demonstrates the problem (just as you would if reporting a bug to Sun). All of the "guessing" about what is going on is counter productive. You can review the JDK source code, and you can run test cases. There is nothing inherently broken in ThreadLocals, and people that keep claiming that are only doing a disservice to the people that make sure Java is a robust and reliable platform. I didn't observe any maligning of java ThreadLocals in this thread--I only noticed questions about lucene's use of them. I can understand that you are worried about FUD, but I don't think that is happening here, nor are anyone's efforts being disprespected. -Mike On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote: > On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: >> I _think_ Robert is right and ThreadLocals are not the problem (I >> tried getting rid of them, and replacing them with an instance var >> last week, but run into problems with multi-threaded unit tests). > > If you want to try getting rid of the ThreadLocals, you can't replace > with instance vars because the stuff put into the Threadlocals isn't > thread safe. > Replace the ThreadLocal stuff with the code that populated it in the > first place. > > So for instance, in TermInfosReader, replace this > private SegmentTermEnum getEnum() { >SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(); >if (termEnum == null) { > termEnum = terms(); > enumerators.set(termEnum); >} >return termEnum; > } > > With this: > private SegmentTermEnum getEnum() { > return terms() > } > > >> What I'm seeing while profiling (and in production) is the >> accumulation of these: >> >> org.apache.lucene.search.FieldCacheImpl$Entry >> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder >> >> This is related to http://issues.apache.org/jira/browse/LUCENE-651 >> (the commit for that patch also happens to coincide with when I >> started seeing the leak). > > I'll take a look into that. Solr hasn't sync'd to that version of > Lucene yet, so we haven't seen this problem of course :-) > > -Yonik > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
On 12/19/06, robert engels <[EMAIL PROTECTED]> wrote: I would suggest that in order to even bring up "thread local issues" in the future that the submitter supplies a pure Java NON-LUCENE test case that demonstrates the problem (just as you would if reporting a bug to Sun). All of the "guessing" about what is going on is counter productive. You can review the JDK source code, and you can run test cases. There is nothing inherently broken in ThreadLocals, and people that keep claiming that are only doing a disservice to the people that make sure Java is a robust and reliable platform. I didn't observe any maligning of java ThreadLocals in this thread--I only noticed questions about lucene's use of them. I can understand that you are worried about FUD, but I don't think that is happening here, nor are anyone's efforts being disprespected. -Mike On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote: > On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: >> I _think_ Robert is right and ThreadLocals are not the problem (I >> tried getting rid of them, and replacing them with an instance var >> last week, but run into problems with multi-threaded unit tests). > > If you want to try getting rid of the ThreadLocals, you can't replace > with instance vars because the stuff put into the Threadlocals isn't > thread safe. > Replace the ThreadLocal stuff with the code that populated it in the > first place. > > So for instance, in TermInfosReader, replace this > private SegmentTermEnum getEnum() { >SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(); >if (termEnum == null) { > termEnum = terms(); > enumerators.set(termEnum); >} >return termEnum; > } > > With this: > private SegmentTermEnum getEnum() { > return terms() > } > > >> What I'm seeing while profiling (and in production) is the >> accumulation of these: >> >> org.apache.lucene.search.FieldCacheImpl$Entry >> org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder >> >> This is related to http://issues.apache.org/jira/browse/LUCENE-651 >> (the commit for that patch also happens to coincide with when I >> started seeing the leak). > > I'll take a look into that. Solr hasn't sync'd to that version of > Lucene yet, so we haven't seen this problem of course :-) > > -Yonik > > - > To unsubscribe, e-mail: [EMAIL PROTECTED] > For additional commands, e-mail: [EMAIL PROTECTED] > - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
I would suggest that in order to even bring up "thread local issues" in the future that the submitter supplies a pure Java NON-LUCENE test case that demonstrates the problem (just as you would if reporting a bug to Sun). All of the "guessing" about what is going on is counter productive. You can review the JDK source code, and you can run test cases. There is nothing inherently broken in ThreadLocals, and people that keep claiming that are only doing a disservice to the people that make sure Java is a robust and reliable platform. On Dec 19, 2006, at 1:41 PM, Yonik Seeley wrote: On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: I _think_ Robert is right and ThreadLocals are not the problem (I tried getting rid of them, and replacing them with an instance var last week, but run into problems with multi-threaded unit tests). If you want to try getting rid of the ThreadLocals, you can't replace with instance vars because the stuff put into the Threadlocals isn't thread safe. Replace the ThreadLocal stuff with the code that populated it in the first place. So for instance, in TermInfosReader, replace this private SegmentTermEnum getEnum() { SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(); if (termEnum == null) { termEnum = terms(); enumerators.set(termEnum); } return termEnum; } With this: private SegmentTermEnum getEnum() { return terms() } What I'm seeing while profiling (and in production) is the accumulation of these: org.apache.lucene.search.FieldCacheImpl$Entry org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the commit for that patch also happens to coincide with when I started seeing the leak). I'll take a look into that. Solr hasn't sync'd to that version of Lucene yet, so we haven't seen this problem of course :-) -Yonik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: Bingo! :) I'll try the patch shortly and report back in a bit. Cool, since you can serve as a test-case, I'll hold of on committing until I hear how it works. What do you think about that alternative approach I mentioned? Instead of having FieldCacheImpl be aware of all IndexReaders, have FieldCache be an inst var in IndexReader? I wonder why it wasn't done that way to start with perhaps to completely separate sorting from index reading. Anyway, it's not backward compatible, and it doesn't buy us much to change now does it? We would get rid of a singe hash lookup on the reader, which is insignificant compared to anything the FieldCache is used for anyway. -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Bingo! :) I'll try the patch shortly and report back in a bit. What do you think about that alternative approach I mentioned? Instead of having FieldCacheImpl be aware of all IndexReaders, have FieldCache be an inst var in IndexReader? That way we can clean up/lose reference to FieldCache when we close IndexReader instead of relying on GC to clean up IndexReaders from FieldCacheImpl's WeakHashMap. Wouldn't that simplify the code? I think we wouldn't have to have some of those caches in FieldCacheImpl then. Thanks, Otis - Original Message From: Yonik Seeley <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Tuesday, December 19, 2006 2:50:43 PM Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects) On 12/19/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: > On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > > What I'm seeing while profiling (and in production) is the accumulation of > > these: > > > > org.apache.lucene.search.FieldCacheImpl$Entry > > org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder > > > > This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the > > commit for that patch also happens to coincide with when I started seeing > > the leak). > > I'll take a look into that. Solr hasn't sync'd to that version of > Lucene yet, so we haven't seen this problem of course :-) OK, I see the problem... I opened http://issues.apache.org/jira/browse/LUCENE-754 to track and it should hopefully be fixed soon -- -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
LOL! of course I stopped to open the bug, grab a sandwitch, etc ;-) On 12/19/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: Ha! ... I beat you by 16 whole seconds! You have to be faster then that if you want to roll with the Big Dogs! : Date: Tue, 19 Dec 2006 11:50:07 -0800 (PST) : From: Chris Hostetter : Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* : Date: Tue, 19 Dec 2006 11:50:23 -0800 (PST) : From: "Yonik Seeley (JIRA)" : Subject: [jira] Created: (LUCENE-754) FieldCache keeps hard references to -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Ha! ... I beat you by 16 whole seconds! You have to be faster then that if you want to roll with the Big Dogs! : Date: Tue, 19 Dec 2006 11:50:07 -0800 (PST) : From: Chris Hostetter : Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* : Date: Tue, 19 Dec 2006 11:50:23 -0800 (PST) : From: "Yonik Seeley (JIRA)" : Subject: [jira] Created: (LUCENE-754) FieldCache keeps hard references to -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
On 12/19/06, Yonik Seeley <[EMAIL PROTECTED]> wrote: On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > What I'm seeing while profiling (and in production) is the accumulation of these: > > org.apache.lucene.search.FieldCacheImpl$Entry > org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder > > This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the commit for that patch also happens to coincide with when I started seeing the leak). I'll take a look into that. Solr hasn't sync'd to that version of Lucene yet, so we haven't seen this problem of course :-) OK, I see the problem... I opened http://issues.apache.org/jira/browse/LUCENE-754 to track and it should hopefully be fixed soon -- -Yonik http://incubator.apache.org/solr Solr, the open-source Lucene search server - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
: Can anyone else have a look at the patch for LUCENE-651 (or just at : FieldCacheImpl) and see if there is a quick fix? If i'm not mistaken, isn't line #65 wrong? ... synchronized (readerCache) { innerCache = (Map) readerCache.get(reader); if (innerCache == null) { innerCache = new HashMap(); readerCache.put(reader, innerCache); value = null; } else { value = innerCache.get(key); } if (value == null) { value = new CreationPlaceholder(); innerCache.put(reader, value); <--- L65 } } ...why is the reader being used as the key of the innerCache? Shouldn't the "key" (param) be used here? ... nothing seems to ever remove the reader from the innerCache, so i don't see how the WeekHashMap readerCache would ever let go of it either. : Also, wouldn't it be better if IndexReaders had a reference to their : copy FieldCache, which they can kill when they are closed? What we have : now is a chunky FieldCacheImpl that has references to IndexReaders, and : doesn't know when they are closed, and thus it can't remove them and : their data from its internal caches? as i recall, we've discussed that on the mailing list as being a "better API" for dealing with FieldCaches ... but it would be an API changes, and no one has (as of yet) attempted it and submitted a patch. -Hoss - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
On 12/19/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote: I _think_ Robert is right and ThreadLocals are not the problem (I tried getting rid of them, and replacing them with an instance var last week, but run into problems with multi-threaded unit tests). If you want to try getting rid of the ThreadLocals, you can't replace with instance vars because the stuff put into the Threadlocals isn't thread safe. Replace the ThreadLocal stuff with the code that populated it in the first place. So for instance, in TermInfosReader, replace this private SegmentTermEnum getEnum() { SegmentTermEnum termEnum = (SegmentTermEnum)enumerators.get(); if (termEnum == null) { termEnum = terms(); enumerators.set(termEnum); } return termEnum; } With this: private SegmentTermEnum getEnum() { return terms() } What I'm seeing while profiling (and in production) is the accumulation of these: org.apache.lucene.search.FieldCacheImpl$Entry org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the commit for that patch also happens to coincide with when I started seeing the leak). I'll take a look into that. Solr hasn't sync'd to that version of Lucene yet, so we haven't seen this problem of course :-) -Yonik - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Hi, I _think_ Robert is right and ThreadLocals are not the problem (I tried getting rid of them, and replacing them with an instance var last week, but run into problems with multi-threaded unit tests). What I'm seeing while profiling (and in production) is the accumulation of these: org.apache.lucene.search.FieldCacheImpl$Entry org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder This is related to http://issues.apache.org/jira/browse/LUCENE-651 (the commit for that patch also happens to coincide with when I started seeing the leak). The number of those 2 FieldCache* instances seems to be about 2 x # of unique IndexReaders (indices) that have been searched so far. I reversed that LUCENE-651 patch locally, and now I am *not* seeing the number of FieldCacheImpl$Entry instances grow any more! Hm, hm, hm, hm. :) FieldCacheImpl$CreationPlaceholder came with LUCENE-651, so I'm not seeing any of them, of course. I _think_ the reason I see this leak and most other people don't, is because I'm running an application with LOTS of frequently-changing indices (look at http://simpy.com ). That FieldCacheImpl$CreationPlaceholder uses IndexReaders as keys in a WeakHashMap *and* another HashMap. Even though I close my IndexSearchers and they close their underlying IndexReaders, I think FieldCacheImpl is holding onto them, thus not allowing them to be GCed. Here is what the dump of my production heap is showing right now, for instance: $ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep SegmentReader 46: 6630 583440 org.apache.lucene.index.SegmentReader $ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep MultiReader 110: 735 41160 org.apache.lucene.index.MultiReader $ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep FieldCache 74: 7444 178656 org.apache.lucene.search.FieldCacheImpl$Entry 85: 7434 118944 org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder 6630 + 735 = 7365 ~ 7434 But, like I said, I'm closing IndexSearchers (and they are closing their IndexReaders), so there are only 114 open of them at the moment: $ jmap -histo:live `jps | grep Server | awk '{print $1}'` | grep IndexSearcher 337: 1142736 org.apache.lucene.search.IndexSearcher So, I think that LUCENE-651 introduced a bug. I think the reason why Robert and others are not seeing this is because 1) few people are using 2.1-dev AND 2) running apps with LOTs of indices. If I had an app with a single or a handful of indices, I would probably never have noticed this leak. Can anyone else have a look at the patch for LUCENE-651 (or just at FieldCacheImpl) and see if there is a quick fix? Also, wouldn't it be better if IndexReaders had a reference to their copy FieldCache, which they can kill when they are closed? What we have now is a chunky FieldCacheImpl that has references to IndexReaders, and doesn't know when they are closed, and thus it can't remove them and their data from its internal caches? Thanks, Otis - Original Message From: Mark Miller <[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Sunday, December 17, 2006 9:22:05 PM Subject: Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects) I think that Otis originally said that the problem came up when he started using the latest build off of the trunk. I don't believe he experienced the problem on 2.0. Anyone on the bleeding edge not noticing the leak? I am going to be deploying with 2.trunk soon and am very interested . robert engels wrote: > Our search server also used long-lived threads. No memory problems > whatsoever. > > On Dec 17, 2006, at 3:51 PM, Paul Smith wrote: > >> >> On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote: >> >>> Moving to java-dev, I think this belongs here. >>> I've been looking at this problem some more today and reading about >>> ThreadLocals. It's easy to misuse them and end up with memory >>> leaks, apparently... and I think we may have this problem here. >>> >>> The problem here is that ThreadLocals are tied to Threads, and I >>> think the assumption in TermInfosReader and SegmentReader is that >>> (search) Threads are short-lived: they come in, scan the index, do >>> the search, return and die. In this scenario, their ThreadLocals go >>> to heaven with them, too, and memory is freed up. >> >> Otis, we have an index server being served inside Tomcat, where an >> Application instance makes a search request via vanilla HTTP post, so >> our connector threads definitely do stay alive for quite a while. >> We're using Lucene 2.0, and our Index server is THE most stable of >> all our components, up
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
There is no inherent problem with ThreadLocal. It is a viable solution to synchronization issues in most cases. On Dec 18, 2006, at 11:25 AM, Bernhard Messer wrote: Otis, i figured out a similar problem when running a very heavy loaded search application in a servlet container. The reasone using ThreadLocals was to get rid of synchronized method calls e.g in TermVectorsReader which would break down the overall search performance. Currently i do not see an easy solution to fix both, the synchronization and ThreadLocal problem. Bernhard Otis Gospodnetic wrote: Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. But when Threads are long-lived, as they are in thread pools (e.g. those in servlet containers), those ThreadLocals stay alive even after a single search request is done. Moreover, the Thread is reused, and the new TermInfosReader and SegmentReader put some new values in that ThreadLocal on top of the old values (I think) from the previous search request. Because the Thread still has references to ThreadLocals and the values in them, the values never get GCed. I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's FixedThreadLocal class from LUCENE-436, but nothing helped. I thought about adding a public static method to TIR and SR, so one could call it at the end of a search request (think servlet filter) and clear the TL for the current thread, but that would require making TIR and SR public and I'm not 100% sure if it would work, plus that exposes the implementation details too much. I don't have a solution yet. But do we *really* need ThreadLocal in TIR and SR? The only thing that TL is doing there is acting as a per-thread storage of some cloned value (in TIR we clone SegmentTermEnum and in SR we clone TermVectorsReader). Why can't we just store those cloned values in instance variables? Isn't whoever is calling TIR and SR going to be calling the same instance of TIR and SR anyway, and thus get access to those cloned values? I'm really amazed that we haven't heard any reports about this before. I am not sure why my application started showing this leak only about 3 weeks ago. It is getting more pounded on than before, so maybe that made the leak more obvious. My guess is that more common Lucene usage is with a single index or a small number of them, and with short-lived threads, where this problem isn't easily visible. In my case I deal with a few tens of thousands of indices and several parallel search threads that live forever in the thread pool. Any thoughts about this or possible suggestions for a fix? Thanks, Otis - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, December 15, 2006 12:28:29 PM Subject: Leaking org.apache.lucene.index.* objects Hi, About 2-3 weeks ago I emailed about a memory leak in my application. I then found some problems in my code (I wasn't closing IndexSearchers explicitly) and took care of those. Now I see my app is still leaking memory - jconsole clearly shows the "Tenured Gen" memory pool getting filled up until I hit the OOM, but I can't seem to pin-point the source. I found that a bunch or o.a.l.index.* objects are not getting GCed, even though they should. For example: $ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr num #instances#bytes class name -- 4: 176484098831040 org.apache.lucene.index.CompoundFileReader$CSIndexInput 5: 211921567814880 org.apache.lucene.index.TermInfo 7: 111245935598688 org.apache.lucene.index.SegmentReader $Norm 9: 213231134116976 org.apache.lucene.index.Term 12: 111789726829528 org.apache.lucene.index.FieldInfo 13:22534018027200 org.apache.lucene.index.SegmentTermEnum 15:58972714153448 org.apache.lucene.index.TermBuffer 21: 86033 8718504 [Lorg.apache.lucene.index.TermInfo; 20: 86033 8718504 [Lorg.apache.lucene.index.Term; 23: 86120 7578560 org.apache.lucene.index.SegmentReader 26: 90501 5068056 org.apache.lucene.store.FSIndexInput 27: 86120 4822720 org.apache.lucene.index.TermInfosReader 33: 86130 3445200 org.apache.lucene.index.SegmentInfo
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Otis, i figured out a similar problem when running a very heavy loaded search application in a servlet container. The reasone using ThreadLocals was to get rid of synchronized method calls e.g in TermVectorsReader which would break down the overall search performance. Currently i do not see an easy solution to fix both, the synchronization and ThreadLocal problem. Bernhard Otis Gospodnetic wrote: Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. But when Threads are long-lived, as they are in thread pools (e.g. those in servlet containers), those ThreadLocals stay alive even after a single search request is done. Moreover, the Thread is reused, and the new TermInfosReader and SegmentReader put some new values in that ThreadLocal on top of the old values (I think) from the previous search request. Because the Thread still has references to ThreadLocals and the values in them, the values never get GCed. I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's FixedThreadLocal class from LUCENE-436, but nothing helped. I thought about adding a public static method to TIR and SR, so one could call it at the end of a search request (think servlet filter) and clear the TL for the current thread, but that would require making TIR and SR public and I'm not 100% sure if it would work, plus that exposes the implementation details too much. I don't have a solution yet. But do we *really* need ThreadLocal in TIR and SR? The only thing that TL is doing there is acting as a per-thread storage of some cloned value (in TIR we clone SegmentTermEnum and in SR we clone TermVectorsReader). Why can't we just store those cloned values in instance variables? Isn't whoever is calling TIR and SR going to be calling the same instance of TIR and SR anyway, and thus get access to those cloned values? I'm really amazed that we haven't heard any reports about this before. I am not sure why my application started showing this leak only about 3 weeks ago. It is getting more pounded on than before, so maybe that made the leak more obvious. My guess is that more common Lucene usage is with a single index or a small number of them, and with short-lived threads, where this problem isn't easily visible. In my case I deal with a few tens of thousands of indices and several parallel search threads that live forever in the thread pool. Any thoughts about this or possible suggestions for a fix? Thanks, Otis - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, December 15, 2006 12:28:29 PM Subject: Leaking org.apache.lucene.index.* objects Hi, About 2-3 weeks ago I emailed about a memory leak in my application. I then found some problems in my code (I wasn't closing IndexSearchers explicitly) and took care of those. Now I see my app is still leaking memory - jconsole clearly shows the "Tenured Gen" memory pool getting filled up until I hit the OOM, but I can't seem to pin-point the source. I found that a bunch or o.a.l.index.* objects are not getting GCed, even though they should. For example: $ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr num #instances#bytes class name -- 4: 176484098831040 org.apache.lucene.index.CompoundFileReader$CSIndexInput 5: 211921567814880 org.apache.lucene.index.TermInfo 7: 111245935598688 org.apache.lucene.index.SegmentReader$Norm 9: 213231134116976 org.apache.lucene.index.Term 12: 111789726829528 org.apache.lucene.index.FieldInfo 13:22534018027200 org.apache.lucene.index.SegmentTermEnum 15:58972714153448 org.apache.lucene.index.TermBuffer 21: 86033 8718504 [Lorg.apache.lucene.index.TermInfo; 20: 86033 8718504 [Lorg.apache.lucene.index.Term; 23: 86120 7578560 org.apache.lucene.index.SegmentReader 26: 90501 5068056 org.apache.lucene.store.FSIndexInput 27: 86120 4822720 org.apache.lucene.index.TermInfosReader 33: 86130 3445200 org.apache.lucene.index.SegmentInfo 36: 87355 2795360 org.apache.lucene.store.FSIndexInput$Descriptor 38: 86120 2755840 org.apache.lucene.index.FieldsReader 39: 86050 2753600 org.apache.lucene.index.CompoundFileReader 42: 46903 2251344 o
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
I think that Otis originally said that the problem came up when he started using the latest build off of the trunk. I don't believe he experienced the problem on 2.0. Anyone on the bleeding edge not noticing the leak? I am going to be deploying with 2.trunk soon and am very interested . robert engels wrote: Our search server also used long-lived threads. No memory problems whatsoever. On Dec 17, 2006, at 3:51 PM, Paul Smith wrote: On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote: Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. Otis, we have an index server being served inside Tomcat, where an Application instance makes a search request via vanilla HTTP post, so our connector threads definitely do stay alive for quite a while. We're using Lucene 2.0, and our Index server is THE most stable of all our components, up for over month (before being taken down for updates) searching hundreds of various sized indexes sized up to 7Gb in size, serving 1-2 requests/second during peak usage. No memory leak spotted at our end, but I'm watching this thread with interest! :) cheers, Paul Smith - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Our search server also used long-lived threads. No memory problems whatsoever. On Dec 17, 2006, at 3:51 PM, Paul Smith wrote: On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote: Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. Otis, we have an index server being served inside Tomcat, where an Application instance makes a search request via vanilla HTTP post, so our connector threads definitely do stay alive for quite a while. We're using Lucene 2.0, and our Index server is THE most stable of all our components, up for over month (before being taken down for updates) searching hundreds of various sized indexes sized up to 7Gb in size, serving 1-2 requests/second during peak usage. No memory leak spotted at our end, but I'm watching this thread with interest! :) cheers, Paul Smith - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
On 16/12/2006, at 6:15 PM, Otis Gospodnetic wrote: Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. Otis, we have an index server being served inside Tomcat, where an Application instance makes a search request via vanilla HTTP post, so our connector threads definitely do stay alive for quite a while. We're using Lucene 2.0, and our Index server is THE most stable of all our components, up for over month (before being taken down for updates) searching hundreds of various sized indexes sized up to 7Gb in size, serving 1-2 requests/second during peak usage. No memory leak spotted at our end, but I'm watching this thread with interest! :) cheers, Paul Smith smime.p7s Description: S/MIME cryptographic signature
Re: ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
I can basically GUARANTEE that unless you are using large indexes and loading into a RamDirectory (and use Java prior to 1.5) , that there is NO ISSUE in using a ThreadLocal, There is something else wrong in your application. PLEASE get a good profiler and perform the required testing. It is amazing that so many people have spent so much time on this issue already, when in the end it has ALWAYS been demonstrated to be a user code error. The use of ThreadLocals in Lucene is correct. If you honestly believe there is some issue, please provide a self contained test case that demonstrates the problem. On Dec 16, 2006, at 1:15 AM, Otis Gospodnetic wrote: Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. But when Threads are long-lived, as they are in thread pools (e.g. those in servlet containers), those ThreadLocals stay alive even after a single search request is done. Moreover, the Thread is reused, and the new TermInfosReader and SegmentReader put some new values in that ThreadLocal on top of the old values (I think) from the previous search request. Because the Thread still has references to ThreadLocals and the values in them, the values never get GCed. I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's FixedThreadLocal class from LUCENE-436, but nothing helped. I thought about adding a public static method to TIR and SR, so one could call it at the end of a search request (think servlet filter) and clear the TL for the current thread, but that would require making TIR and SR public and I'm not 100% sure if it would work, plus that exposes the implementation details too much. I don't have a solution yet. But do we *really* need ThreadLocal in TIR and SR? The only thing that TL is doing there is acting as a per-thread storage of some cloned value (in TIR we clone SegmentTermEnum and in SR we clone TermVectorsReader). Why can't we just store those cloned values in instance variables? Isn't whoever is calling TIR and SR going to be calling the same instance of TIR and SR anyway, and thus get access to those cloned values? I'm really amazed that we haven't heard any reports about this before. I am not sure why my application started showing this leak only about 3 weeks ago. It is getting more pounded on than before, so maybe that made the leak more obvious. My guess is that more common Lucene usage is with a single index or a small number of them, and with short-lived threads, where this problem isn't easily visible. In my case I deal with a few tens of thousands of indices and several parallel search threads that live forever in the thread pool. Any thoughts about this or possible suggestions for a fix? Thanks, Otis - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, December 15, 2006 12:28:29 PM Subject: Leaking org.apache.lucene.index.* objects Hi, About 2-3 weeks ago I emailed about a memory leak in my application. I then found some problems in my code (I wasn't closing IndexSearchers explicitly) and took care of those. Now I see my app is still leaking memory - jconsole clearly shows the "Tenured Gen" memory pool getting filled up until I hit the OOM, but I can't seem to pin-point the source. I found that a bunch or o.a.l.index.* objects are not getting GCed, even though they should. For example: $ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr num #instances#bytes class name -- 4: 176484098831040 org.apache.lucene.index.CompoundFileReader$CSIndexInput 5: 211921567814880 org.apache.lucene.index.TermInfo 7: 111245935598688 org.apache.lucene.index.SegmentReader$Norm 9: 213231134116976 org.apache.lucene.index.Term 12: 111789726829528 org.apache.lucene.index.FieldInfo 13:22534018027200 org.apache.lucene.index.SegmentTermEnum 15:58972714153448 org.apache.lucene.index.TermBuffer 21: 86033 8718504 [Lorg.apache.lucene.index.TermInfo; 20: 86033 8718504 [Lorg.apache.lucene.index.Term; 23: 86120 7578560 org.apache.lucene.index.SegmentReader 26: 90501 5068056 org.apache.lucene.store.FSIndexInput 27: 86120 4822720 org.apache.luc
ThreadLocal leak (was Re: Leaking org.apache.lucene.index.* objects)
Moving to java-dev, I think this belongs here. I've been looking at this problem some more today and reading about ThreadLocals. It's easy to misuse them and end up with memory leaks, apparently... and I think we may have this problem here. The problem here is that ThreadLocals are tied to Threads, and I think the assumption in TermInfosReader and SegmentReader is that (search) Threads are short-lived: they come in, scan the index, do the search, return and die. In this scenario, their ThreadLocals go to heaven with them, too, and memory is freed up. But when Threads are long-lived, as they are in thread pools (e.g. those in servlet containers), those ThreadLocals stay alive even after a single search request is done. Moreover, the Thread is reused, and the new TermInfosReader and SegmentReader put some new values in that ThreadLocal on top of the old values (I think) from the previous search request. Because the Thread still has references to ThreadLocals and the values in them, the values never get GCed. I tried making ThreadLocals in TIR and SR static, I tried wrapping values saved in TLs in WeakReference, I've tried using WeakHashMap like in Robert Engel's FixedThreadLocal class from LUCENE-436, but nothing helped. I thought about adding a public static method to TIR and SR, so one could call it at the end of a search request (think servlet filter) and clear the TL for the current thread, but that would require making TIR and SR public and I'm not 100% sure if it would work, plus that exposes the implementation details too much. I don't have a solution yet. But do we *really* need ThreadLocal in TIR and SR? The only thing that TL is doing there is acting as a per-thread storage of some cloned value (in TIR we clone SegmentTermEnum and in SR we clone TermVectorsReader). Why can't we just store those cloned values in instance variables? Isn't whoever is calling TIR and SR going to be calling the same instance of TIR and SR anyway, and thus get access to those cloned values? I'm really amazed that we haven't heard any reports about this before. I am not sure why my application started showing this leak only about 3 weeks ago. It is getting more pounded on than before, so maybe that made the leak more obvious. My guess is that more common Lucene usage is with a single index or a small number of them, and with short-lived threads, where this problem isn't easily visible. In my case I deal with a few tens of thousands of indices and several parallel search threads that live forever in the thread pool. Any thoughts about this or possible suggestions for a fix? Thanks, Otis - Original Message From: Otis Gospodnetic <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Friday, December 15, 2006 12:28:29 PM Subject: Leaking org.apache.lucene.index.* objects Hi, About 2-3 weeks ago I emailed about a memory leak in my application. I then found some problems in my code (I wasn't closing IndexSearchers explicitly) and took care of those. Now I see my app is still leaking memory - jconsole clearly shows the "Tenured Gen" memory pool getting filled up until I hit the OOM, but I can't seem to pin-point the source. I found that a bunch or o.a.l.index.* objects are not getting GCed, even though they should. For example: $ jmap -histo:live 7825 | grep apache.lucene.index | head -20 | sort -k2 -nr num #instances#bytes class name -- 4: 176484098831040 org.apache.lucene.index.CompoundFileReader$CSIndexInput 5: 211921567814880 org.apache.lucene.index.TermInfo 7: 111245935598688 org.apache.lucene.index.SegmentReader$Norm 9: 213231134116976 org.apache.lucene.index.Term 12: 111789726829528 org.apache.lucene.index.FieldInfo 13:22534018027200 org.apache.lucene.index.SegmentTermEnum 15:58972714153448 org.apache.lucene.index.TermBuffer 21: 86033 8718504 [Lorg.apache.lucene.index.TermInfo; 20: 86033 8718504 [Lorg.apache.lucene.index.Term; 23: 86120 7578560 org.apache.lucene.index.SegmentReader 26: 90501 5068056 org.apache.lucene.store.FSIndexInput 27: 86120 4822720 org.apache.lucene.index.TermInfosReader 33: 86130 3445200 org.apache.lucene.index.SegmentInfo 36: 87355 2795360 org.apache.lucene.store.FSIndexInput$Descriptor 38: 86120 2755840 org.apache.lucene.index.FieldsReader 39: 86050 2753600 org.apache.lucene.index.CompoundFileReader 42: 46903 2251344 org.apache.lucene.index.SegmentInfos 43: 93778 2250672 org.apache.lucene.search.FieldCacheImpl$Entry 45: 93778 1500448 org.apache.lucene.search.FieldCacheImpl$CreationPlaceholder 47: 86510 1384160 org.apache.lucene.index.FieldInfos I'm running my app in search-only mode - no adds or deletes. The counts of these objects just keeps going up, even though I am explicitly closing