search performance
Greetings

Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of our installations search performance has reached the point where it is unacceptably slow. For instance, in one environment the total index size is 200GB, with 150 million documents indexed. With NRT enabled, search speed is roughly 5 minutes on average. The server resources are: 2x6-core Intel CPUs, 128GB RAM, two SSDs in RAID 0 for the index, running Linux.

The only thing we haven't yet done is upgrade Lucene from 4.7.x to 4.8.x. Is this likely to make any noticeable difference in performance?

Clearly, longer term, we need to move to a distributed search model. We thought to take advantage of the distributed search features offered in Solr; however, our solution is very tightly integrated with Lucene directly (since Solr didn't exist when we started out). Moving to Solr now seems like a daunting prospect. We've also been following the Katta project with interest, but it doesn't appear to support distributed indexing, and development on it seems to have stalled. It would be nice if there were a distributed search project at the Lucene level that we could use.

I realize this is a rather vague question, but are there any further suggestions on ways to improve search performance? We need cheap and dirty ideas, as well as longer-term advice on a possible path forward.

Much appreciated

Jamie

- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: remapping docIds in a read only offline built index
Hello,

I'm still interested in having the answer to the following question: in a 1-segment read-only index (that is built offline once and then frozen), is it possible to remap the docIds?

I may have a (working but not optimal) answer to my original problem: I could use a MultiReader over three indexes to get the following composite-index layout:

docId : document
1 : bookA
2 : bookB
...
M : linkA
M+1 : linkB
...
N+1 : sentenceA
N+2 : sentenceB
...
30 : sentenceZZZ

This solution should be slower than if I built a single index with the docIds equal to the order in which I added the documents.

On 05/12/2014 06:01 PM, Olivier Binda wrote:
> In a 1-segment (parallel) read-only index, that is built offline once (and then frozen), is it possible to remap the docIds as the last step (i.e. to have the exact same index, except that the docIds are all equal to the order the docs were added to the index)?
>
> Say I have the read-only index
>
> docId : document
> 1 : bookB
> 2 : sentenceB
> 3 : linkA
> 4 : linkC
> 5 : sentenceC
> 6 : sentenceA
> 7 : bookA
> ...
> 30 : linkD
>
> I would like to have instead the read-only index
>
> docId : document
> 1 : bookA
> 2 : bookB
> ...
> M : linkA
> M+1 : linkB
> ...
> N+1 : sentenceA
> N+2 : sentenceB
> ...
> 30 : sentenceZZZ
>
> This would allow me to reduce the amount of RAM needed to cache the type of each document:
> - without remapping, I need at least log2(types) * documents bits, here 2 * 30 bits
> - with remapping, I only need to remember the ints M and N
>
> Also, if I need to cache 1 byte of metadata for each book:
> - without remapping, I would need 1 byte * documents, here 30 bytes
> - with remapping, I would only need 1 byte * books, here M - 1 bytes
>
> I tried building such an index with LogMergePolicy/NoMergePolicy/extending the RAM buffer, but (maybe I did something wrong) the docIds were always reshuffled (maybe because my index was big and I was over a threshold).
>
> Best regards,
> Olivier
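The space argument in the message above can be checked with quick arithmetic. A self-contained sketch (the counts are the toy numbers from the example, not real index statistics):

```java
public class RemapSavings {
    public static void main(String[] args) {
        int numDocs = 30;   // documents in the example index
        int numTypes = 3;   // book, link, sentence
        // Without remapping: a per-document type cache needs
        // ceil(log2(numTypes)) bits per document.
        int bitsPerDoc = 32 - Integer.numberOfLeadingZeros(numTypes - 1); // = 2 for 3 types
        int withoutRemap = bitsPerDoc * numDocs;  // grows linearly with the index
        // With remapping: each type occupies a contiguous docId range,
        // so two boundary ints (M and N) are enough, regardless of size.
        int withRemap = 2 * Integer.SIZE;         // constant 64 bits
        System.out.println(withoutRemap + " bits vs " + withRemap + " bits (constant)");
    }
}
```

For 30 documents the two are close, but at 150 million documents the per-document cache costs ~36MB while the boundary ints stay at 8 bytes.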
Re: remapping docIds in a read only offline built index
The index sorting APIs (in lucene/misc) can do this. E.g. you could make a SortingAtomicReader with your sort criteria, then use addIndexes(IR[]) to add it to a new index. The resulting index would have 1 segment and the docIDs would be in your order.

Mike McCandless
http://blog.mikemccandless.com

On Mon, May 12, 2014 at 12:01 PM, Olivier Binda olivier.bi...@wanadoo.fr wrote:
> In a 1-segment (parallel) read-only index, that is built offline once (and then frozen), is it possible to remap the docIds as the last step (i.e. to have the exact same index, except that the docIds are all equal to the order the docs were added to the index)?
> ...
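Lucene specifics aside (SortingAtomicReader in lucene/misc wraps a reader with a Sort, and IndexWriter.addIndexes writes it out in that order), the effect on docIDs can be illustrated without Lucene: sort the documents by a type key and the new docIDs become contiguous per type. A self-contained sketch using the example documents from the thread:

```java
import java.util.Arrays;
import java.util.Comparator;

public class DocIdRemapSketch {
    public static void main(String[] args) {
        // Shuffled order, as produced by normal indexing: {type, name}.
        String[][] docs = {
            {"book", "bookB"}, {"sentence", "sentenceB"}, {"link", "linkA"},
            {"link", "linkC"}, {"sentence", "sentenceC"}, {"sentence", "sentenceA"},
            {"book", "bookA"},
        };
        // The "sort criteria" handed to SortingAtomicReader: by type, then name.
        Arrays.sort(docs, Comparator.<String[], String>comparing(d -> d[0])
                                    .thenComparing(d -> d[1]));
        // After addIndexes of the sorted reader, docID == position here,
        // so each type occupies one contiguous range [start, end).
        for (int docId = 0; docId < docs.length; docId++) {
            System.out.println(docId + " : " + docs[docId][1]);
        }
    }
}
```

Because books sort first and sentences last, only the two range boundaries (M and N in the original question) need to be remembered.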
Re: remapping docIds in a read only offline built index
Very nice! That is exactly what I needed. Thank you very much!

On 06/02/2014 09:26 AM, Michael McCandless wrote:
> The index sorting APIs (in lucene/misc) can do this. E.g. you could make a SortingAtomicReader, with your sort criteria, then use addIndexes(IR[]) to add it to a new index. That resulting index would have 1 segment and the docIDs would be in your order.
>
> Mike McCandless
> http://blog.mikemccandless.com
> ...
Re: search performance
What kind of queries are you pushing into the index? Do they match a lot of documents? Do you do any sorting on the result set? What is the average document size? Do you have a lot of update traffic? What kind of schema does your index use?

On Mon, Jun 2, 2014 at 6:51 AM, Jamie ja...@mailarchiva.com wrote:
> Greetings Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of our installations, search performance has reached the point where it is unacceptably slow.
> ...
Re: search performance
Tom

Thanks for the offer of assistance.

On 2014/06/02, 12:02 PM, Tincu Gabriel wrote:
> What kind of queries are you pushing into the index?

We are indexing regular emails + attachments. A typical query is something like: filter: to:mbox08 from:mbox08 cc:mbox08 bcc:mbox08 deliveredto:mbox08 sender:mbox08 recipient:mbox08, combined with filter query cat:email. We also use range queries based on date.

> Do they match a lot of documents?

Yes, although we are using a collector:

TopFieldCollector fieldCollector = TopFieldCollector.create(sort, max, true, false, false, true);

We use pagination, so we only return 1000 documents or so at a time.

> Do you do any sorting on the result set?

Yes.

> What is the average document size?

Approx. 100KB. We are indexing email body + attachment content.

> Do you have a lot of update traffic?

Yes, we have a lot of update traffic, particularly in the environment I referred to. Is there a way to prioritize searching as opposed to updating? I suppose we could block all indexing while searching is on the go? Is there such an option in Lucene, or should we implement this?

> What kind of schema does your index use?

Not sure exactly what you are referring to here. We do have a lot of stored fields (to, from, bcc, cc, etc.). The body and attachments are analyzed.

Regards

Jamie
Re: search performance
Do you have enough system memory to fit the entire index in OS system memory, so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O, or are the queries compute-bound?

You said you have a 128GB machine, so that sounds small for your index. Have you tried a 256GB machine?

How frequent are your commits for updates while doing queries?

-- Jack Krupansky

-----Original Message----- From: Jamie Sent: Monday, June 2, 2014 2:51 AM To: java-user@lucene.apache.org Subject: search performance

> Greetings Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of our installations, search performance has reached the point where it is unacceptably slow.
> ...
Re: search performance
Jack

First off, thanks for applying your mind to our performance problem.

On 2014/06/02, 1:34 PM, Jack Krupansky wrote:
> Do you have enough system memory to fit the entire index in OS system memory so that the OS can fully cache it instead of thrashing with I/O? Do you see a lot of I/O or are the queries compute-bound?

Nice idea. The index is 200GB; the machine currently has 128GB RAM. We are using SSDs, but disappointingly, installing them didn't reduce search times to acceptable levels. I'll have to check your last question regarding I/O... I assume it is I/O bound, though I will double-check.

Currently, we are using:

fsDirectory = new NRTCachingDirectory(fsDir, 5.0, 60.0);

Are you proposing we increase maxCachedMB, or use a RAMDirectory? With the latter, we would still need to persist the index data to disk, as it is undergoing constant updates.

> You said you have a 128GB machine, so that sounds small for your index. Have you tried a 256GB machine?

Nope.. didn't think it would make much of a difference. I suppose, assuming we could store the entire index in RAM, it would be helpful. How does one do this with Lucene, while still persisting the data?

> How frequent are your commits for updates while doing queries?

Around ten to fifteen documents are being constantly added per second.

Thanks again

Jamie
Re: MultiReader docid reliability
Hi Erick,

the good reason, for now, is caching: we use them to store results in a cache, and I wanted a better explanation of "ephemeral" to understand the possible life of the cache. From the answers, "ephemeral" can be related to the opening of the IndexReader (in general, as a precaution), and any kind of modification to the index can be another interpretation. Then it's not necessary; it was just a matter of better understanding the javadoc. I see the javadoc is the same for all IndexReaders, so I presume there are no differences between the various implementations.

nicola.

On Fri, 2014-05-30 at 12:50 -0700, Erick Erickson wrote:
> If you do an optimize, btw, the internal doc IDs may change. But _why_ do you want to keep them? You may have very good reasons, but it's not clear that this is necessary/desirable from what you've said so far...
>
> Best, Erick

On Fri, May 30, 2014 at 7:49 AM, Nicola Buso nb...@ebi.ac.uk wrote:
> Hi, thanks Michael and Alan. It's enough to know that, re-opening the index, there is no guarantee that the docids are maintained, even if the index does not change. And I will try the question also on the Solr mailing list.
> nicola.

On Fri, 2014-05-30 at 10:41 -0400, Michael Sokolov wrote:
> There is a Solr document cache that holds field values too, see: http://wiki.apache.org/solr/SolrCaching Maybe take this question over to the solr mailing list? -Mike

On 5/30/2014 10:32 AM, Alan Woodward wrote:
> Solr caches hold lucene docids, which are invalidated every time a new searcher is opened. The various fields for a response aren't cached as far as I know; they're reloaded on each request. But loading the fields for 10 documents is typically very fast, compared to searching over a very large collection.
> Alan Woodward www.flax.co.uk

On 30 May 2014, at 11:20, Nicola Buso wrote:
> Hi Alan, just to make it more typical (yes, there are no IndexWriters open on those indexes): how is Solr caching results? The first thing I would like to do is to store the doc ids and return to the reader for the real content. Is Solr storing the whole results with all values?
> nicola.

On Fri, 2014-05-30 at 11:05 +0100, Alan Woodward wrote:
> If the index is truly unchanging (i.e. there's no IndexWriter open on it) then I guess the document numbers will be stable across reopens. But this is a pretty specialized situation, and the docs are really there to warn you off trying to rely on this for more typical uses.
> Alan Woodward www.flax.co.uk

On 30 May 2014, at 10:39, Nicola Buso wrote:
> Hi Alan, thanks a lot for the reply. From what I understood of your reply, if the index is not changing (no adds, deletes, or even updates), the doc ids viewed by the MultiReader will not change if you open that unchanged index multiple times, even in different environments. If this is true (my understanding), the word "ephemeral" in the API could be elaborated a bit more.
> nicola

On Fri, 2014-05-30 at 09:26 +0100, Alan Woodward wrote:
> Hi Nicola, 1) A "session" here means as long as you have that MultiReader open. IndexReaders see a snapshot of the index, so document ids shouldn't change over the lifetime of an IndexReader, even if the index is being updated. 2) MultiReader just takes an array of subindexes, so as long as the subindexes are passed to the MultiReader constructor in the same order on both machines, the docBase assigned to each reader context should be the same.
> Alan Woodward www.flax.co.uk

On 29 May 2014, at 14:29, Nicola Buso wrote:
> Hi, from the javadocs: For efficiency, in this API documents are often referred to via document numbers, non-negative integers which each name a
Re: search performance
MMapDirectory will do the job for you. RAMDirectory has a big warning in the class description stating that performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RAMDirectory, suitable for low update rates. MMap will use system RAM to cache as much of the index as it can, and only hit disk when the portion of the index you're trying to access isn't cached. I'd put my money on switching directory implementations and seeing what kind of performance gains that brings to the table.

On Mon, Jun 2, 2014 at 11:50 AM, Jamie ja...@mailarchiva.com wrote:
> Jack First off, thanks for applying your mind to our performance problem.
> ...
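For reference, keeping NRTCachingDirectory for the NRT use case while delegating disk access to MMapDirectory looks roughly like this against the Lucene 4.x API (an untested sketch; the index path is a placeholder, and it requires lucene-core on the classpath):

```java
import java.io.File;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.MMapDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

// The OS page cache does the heavy lifting for the mmap'd files;
// NRTCachingDirectory keeps small, freshly flushed NRT segments in RAM.
Directory mmap = new MMapDirectory(new File("/path/to/index"));
Directory dir = new NRTCachingDirectory(mmap, 5.0 /* maxMergeSizeMB */, 60.0 /* maxCachedMB */);
```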
Re: search performance
I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64-bit platform is detected? Is this not the case?

On 2014/06/02, 2:09 PM, Tincu Gabriel wrote:
> MMapDirectory will do the job for you. RamDirectory has a big warning in the class description stating that the performance will get killed by an index larger than a few hundred MB, and NRTCachingDirectory is a wrapper for RamDirectory and suitable for low update rates.
> ...
Re: search performance
My bad, it's using the RAMDirectory as a cache, plus a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RAMDirectory to files under a certain size. So I guess the underlying Directory implementation will be whatever you choose it to be. I'd still try using an MMapDirectory and see if that improves performance.

Also, regarding the pagination: you said you're retrieving 1000 documents at a time. Does that mean that if a query matches 1 documents you want all of them retrieved?

On Mon, Jun 2, 2014 at 12:51 PM, Jamie ja...@mailarchiva.com wrote:
> I was under the impression that NRTCachingDirectory will instantiate an MMapDirectory if a 64 bit platform is detected? Is this not the case?
> ...
Re: search performance
I assume you meant 1000 documents. Yes, the page size is in fact configurable. However, it only fetches the page size * 3: it preloads the following and previous pages too. The point is, it only fetches the documents that are needed.

On 2014/06/02, 3:03 PM, Tincu Gabriel wrote:
> My bad, It's using the RamDirectory as a cache and a delegate directory that you pass in the constructor to do the disk operations, limiting the use of the RamDirectory to files that fit a certain size.
> ...
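The preload scheme described above (current page plus its neighbours) amounts to a simple window calculation. A self-contained sketch (the class and method names are made up for illustration):

```java
public class PageWindow {
    // Returns {startDoc, endDocExclusive} for 0-based page `page`,
    // preloading the previous and next pages as described above,
    // so at most pageSize * 3 documents are fetched per request.
    static int[] window(int page, int pageSize, int totalHits) {
        int start = Math.max(0, (page - 1) * pageSize);
        int end = Math.min(totalHits, (page + 2) * pageSize);
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int[] w = window(5, 1000, 150_000_000);
        System.out.println(w[0] + ".." + w[1]); // pages 4, 5 and 6
    }
}
```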
Re: search performance
This is an interesting performance problem, and I think there is probably not a single answer here, so I'll just lay out the steps I would take to tackle it:

1. What is the variance of the query latency? You said the average is 5 minutes, but is that due to some really bad queries, or do most queries have the same perf?

2. We kind of assume that index size and number of docs is the issue here. Can you validate that assumption by indexing 10M, 50M, ... docs and seeing how much worse the performance gets as a function of size?

3. What is the average hit count for the bad queries? If your queries match a lot of hits, scoring will be very expensive. While you only ask for the 1000 top-scored docs, Lucene still needs to score all the hits to get those 1000 docs. If this is the case, there could be some workarounds, but let's make sure that it's indeed the situation we are dealing with here.

Hope this helps,
Tri

On Jun 01, 2014, at 11:50 PM, Jamie ja...@mailarchiva.com wrote:
> Greetings Despite following all the recommended optimizations (as described at http://wiki.apache.org/lucene-java/ImproveSearchingSpeed), in some of our installations, search performance has reached the point where it is unacceptably slow.
> ...
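Step 1 above is cheap to answer if per-query latencies are logged: comparing the mean against the median quickly shows whether a few pathological queries are dragging the average up. A self-contained sketch (the sample latencies are made up):

```java
import java.util.Arrays;

public class LatencyCheck {
    public static void main(String[] args) {
        // Hypothetical per-query latencies in ms: mostly ~1s, two ~5min outliers.
        long[] ms = {900, 1100, 1000, 950, 1050, 290_000, 310_000};
        Arrays.sort(ms);
        double mean = Arrays.stream(ms).average().orElse(0);
        long median = ms[ms.length / 2];
        // A mean far above the median means a few pathological queries,
        // not uniformly slow search; similar mean and median means the
        // whole workload is slow.
        System.out.printf("mean=%.0fms median=%dms%n", mean, median);
    }
}
```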
Possible order violation in lucene library version 2.4.1
Hi,

I am working on a research project on data race detection, and am using the DaCapo benchmarks for evaluation. I am using the benchmark lusearch from the 2009 suite, which uses lucene library 2.4.1. For one test case, I am monitoring a pair of accesses, say Lorg/apache/lucene/store/Directory;.init ()V:40(6) and Lorg/apache/lucene/store/FSDirectory;.close ()V:524(1). The format is class name.method name method desc:line(bytecode index).

During my work, I am getting AlreadyClosedExceptions on the FSDirectory from the ensureOpen() method for some threads, which I am guessing is probably due to an order violation. I have introduced delays in my instrumentation which delay threads that execute the code in Lorg/apache/lucene/store/FSDirectory;.close ()V. This causes the other query threads to throw an exception. Here is the exception trace:

org.apache.lucene.store.AlreadyClosedException: this Directory is closed
at org.apache.lucene.store.Directory.ensureOpen(Directory.java:220)
at org.apache.lucene.store.FSDirectory.list(FSDirectory.java:320)
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:533)
at org.apache.lucene.index.DirectoryIndexReader.open(DirectoryIndexReader.java:115)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:316)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:206)
at org.dacapo.lusearch.Search$QueryProcessor.init(Search.java:207)
at org.dacapo.lusearch.Search$QueryThread.run(Search.java:179)

java.lang.NullPointerException
at org.dacapo.lusearch.Search$QueryProcessor.run(Search.java:226)
at org.dacapo.lusearch.Search$QueryThread.run(Search.java:179)

(The same AlreadyClosedException and NullPointerException traces repeat for several other query threads.)

It would help if someone could give suggestions as to what might be going wrong here. I have verified that the issue is probably not in my instrumentation: simply instrumenting the lucene source locations with sleep()s also reproduces the error.

--Regards, Swarnendu Biswas.
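The pattern in the traces (close() winning the race against threads still opening readers on the same Directory) is a classic order violation. One common guard against it is reference counting, where close() only drops the owner's reference and the resource is really released when the last user lets go. A minimal self-contained illustration of that idiom (this is not the actual Lucene 2.4.1 code; the class and method names are made up, and the check-then-act in incRef is kept simple for illustration):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Minimal ref-counted resource: starts with one reference held by the
// owner; users incRef() before use and decRef() when done.
class RefCountedDir {
    private final AtomicInteger refCount = new AtomicInteger(1);
    private volatile boolean closed = false;

    void incRef() {
        if (closed) throw new IllegalStateException("this Directory is closed");
        refCount.incrementAndGet();
    }

    void decRef() {
        if (refCount.decrementAndGet() == 0) {
            closed = true; // really release the underlying files here
        }
    }

    void close() { decRef(); } // owner gives up its reference only

    boolean isClosed() { return closed; }
}
```

With this scheme, a delayed close() can no longer invalidate a Directory that a query thread has already taken a reference to; the AlreadyClosedException is only thrown to threads that arrive after the last reference is gone.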