Re: DefaultIndexAccessor
Mark,

We deployed our indexer (using DefaultIndexAccessor) on one of our production sites and are getting this error:

Caused by: java.util.concurrent.RejectedExecutionException
    at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.reject(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:514)

This happens every time the indexer runs. We are running your latest IndexAccessor-021508 code. Any ideas? (It's kind of urgent for us.)

Thanks,
-vivek

On Fri, Feb 15, 2008 at 6:50 PM, vivek sar [EMAIL PROTECTED] wrote:
> Mark,
>
> Thanks for the quick fix. Actually, it is possible that there were simultaneous queries using the MultiSearcher. I assumed it was thread-safe and so was re-using the same instance. I'll update my application code as well.
>
> Thanks,
> -vivek
>
> On Feb 15, 2008 5:56 PM, Mark Miller [EMAIL PROTECTED] wrote:
> > Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DefaultIndexAccessor
Mark,

Some more information:

1) I run the index writer every 5 minutes.
2) After every cycle I check whether I need to partition (based on the index size).
3) In the partition interface:
   a) I first call close on the index accessor (so all the searchers can close before I move that index):
        accessor = IndexAccessorFactory.getInstance().getAccessor(dir.getFile());
        accessor.close();
   b) Then I re-open the index accessor:
        accessor = indexFactory.getAccessor(dir.getFile());
        accessor.open();
   c) I optimize my indexes using the IndexWriter that I get from the accessor:
        masterWriter = this.indexAccessor.getWriter(false);
        masterWriter.optimize(optimizeSegment);
   d) Once the optimization is done I release the masterWriter:
        this.indexAccessor.release(masterWriter);

This is where I get the RejectedExecutionException. Reading up a little more on this exception (http://pveentjer.wordpress.com/2008/02/06/are-you-dealing-with-the-rejectedexecutionexception/), I see this might be happening because something got stuck during the close cycle, so the ExecutorService is not accepting any new tasks. I'm not sure how this would happen.

The critical problem is that once I get this exception, every release call throws the same exception (it looks like the shutdown never completes). Because of this my readers are never refreshed and I cannot read any new indexes.

Maybe I have to check whether the accessor is completely closed before re-opening? Could you check in your release whether the pool (ExecutorService) is in the shutdown state? Anything else I can check?

Thanks,
-vivek
Re: DefaultIndexAccessor
Hey vivek,

Sorry you ran into this. I believe the problem is that I had simply not foreseen the use case of closing and then reopening the Accessor. The only time I ever close the Accessors is when I am shutting down the JVM. What do you do about all of the IndexAccessor requests while it is in a closed state? Could there be a better way of accomplishing this without closing the Accessor? Would a new method that just stalled everything be better? Then you possibly wouldn't have to recreate any resources.

In any case, the problem is that after the Executor gets shut down it is not reopened in the open method. I can certainly change this, but I need to look for any other issues as well. I will add an open-after-shutdown test to investigate.

I am going to think about the issue further and I will get back to you soon. Thanks for all of the details.

- Mark
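For readers following the diagnosis above: a java.util.concurrent ExecutorService that has been shut down rejects every later task, and with the default AbortPolicy that rejection surfaces as RejectedExecutionException. The sketch below is not the IndexAccessor code. `ReopenableAccessor` and its methods are hypothetical names, used only to show why creating the pool in open() (rather than once in a constructor) lets a close() followed by open() work again.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for an accessor that owns a thread pool.
// Illustrative only; this is not the DefaultIndexAccessor implementation.
class ReopenableAccessor {
    private ExecutorService pool;

    // Recreate the pool whenever it is missing or has been shut down,
    // so open() after close() yields a usable accessor again.
    synchronized void open() {
        if (pool == null || pool.isShutdown()) {
            pool = Executors.newCachedThreadPool();
        }
    }

    // Shut the pool down and wait briefly for running tasks to finish.
    synchronized void close() {
        if (pool == null) {
            return;
        }
        pool.shutdown();
        try {
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Hands a task to the pool. Returns false instead of propagating the
    // RejectedExecutionException that a shut-down pool throws.
    synchronized boolean release(Runnable task) {
        if (pool == null) {
            return false;
        }
        try {
            pool.execute(task);
            return true;
        } catch (RejectedExecutionException e) {
            return false;
        }
    }
}
```

If open() did not recreate the pool, every release() after the first close() would be rejected, which matches the repeated exception reported in this thread.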
Re: DefaultIndexAccessor
Mark,

Just for my clarification:

1) Would you have indexStop and indexStart methods? If that's the case then I don't have to call close() at all. These new methods would just clean up the caches and not close the thread pool.

I would prefer not to call close() and init() again if possible. The reason we have to partition is that our index size grows over 50G a week, and then optimization takes hours. I had a thread going on this topic on the mailing list: http://www.gossamer-threads.com/lists/lucene/java-user/57366?search_string=partition;#57366.

Thanks,
-vivek

On Thu, Feb 28, 2008 at 5:01 PM, Mark Miller [EMAIL PROTECTED] wrote:
> I added the Thread Pool recently, so things probably did work before that. I am certainly willing to put the Thread Pool init in the open call instead of the constructor.
>
> As for the best method to use, I was thinking of something along the same lines as what you suggest. One of the decisions will be how to handle method calls on the Accessor while it is shut down: throw an Exception or block?
>
> In any case, I will put up code that makes the above change and your code should work as it did. I'll be sure to add this to the test cases.
>
> Just as a personal interest question, what has led you to set up your index this way? Adding partitions as it grows, that is.
>
> - Mark
>
> vivek sar wrote:
> > Mark,
> >
> > Yes, I think that's precisely what is happening. I call accessor.close, which shuts down the ExecutorService. I was assuming that accessor.open would re-open it (I think that's how it worked in the older version of your IndexAccessor).
> >
> > Basically, I need a way to stop (or close) all the IndexSearchers for a specific IndexAccessor and not allow them to re-open until I flag the IndexAccessor that it's safe to give out new index searchers. That way I can optimize the index, rename it and move it somewhere else during partitioning. Right now, without closing the searchers I cannot rename the index, as the rename is not allowed while some other thread has a file handle to that index. I don't know if there is a way to get an exclusive writer thread to an index using IndexAccessor.
> >
> > I would think a better way for me would be to:
> >
> > 1) Call a method on IndexAccessor, let's say stopIndex(). This would clear all the caches (stop all the open searchers, readers and writers) and flag the index accessor so that no other reader or writer can be taken from it.
> > 2) Use my own IndexWriter (not obtained from IndexAccessor) to optimize the index that needs to be partitioned, and release it.
> > 3) Once done with the partition, call another method on IndexAccessor, let's say startIndex(). This would simply set the flag so that the IndexAccessor again hands out searchers, readers and writers. The start would have to reopen all the searchers and readers.
> >
> > Not sure if this is a good design for what I am trying to do. This would require two new methods on IndexAccessor: stopIndex() and startIndex(). Any thoughts?
> >
> > Thanks,
> > -vivek
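The stopIndex()/startIndex() idea discussed in this message can be sketched as a small gate in front of the accessor. This is only an illustration under stated assumptions: `IndexGate`, `tryAcquire()` and the counters are hypothetical names, nothing here is part of IndexAccessor, and the sketch picks the "refuse while stopped" option (returning false) rather than blocking the caller.

```java
// Hypothetical gate illustrating the proposed stopIndex()/startIndex() calls.
// Names are illustrative and not part of the IndexAccessor API.
class IndexGate {
    private boolean stopped = false;
    private int inUse = 0;

    // Returns false while the index is stopped; otherwise counts the caller
    // as an active user that must call release() later.
    synchronized boolean tryAcquire() {
        if (stopped) {
            return false;
        }
        inUse++;
        return true;
    }

    // Marks one user as done and wakes a stopIndex() call waiting for zero users.
    synchronized void release() {
        inUse--;
        notifyAll();
    }

    // Refuses new users, then waits until all outstanding users have released.
    // After this returns, the caller may optimize, rename or move the index.
    synchronized void stopIndex() {
        stopped = true;
        try {
            while (inUse > 0) {
                wait();
            }
        } catch (InterruptedException e) {
            stopped = false; // give up the stop request if interrupted
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while stopping", e);
        }
    }

    // Allows searchers, readers and writers to be handed out again.
    synchronized void startIndex() {
        stopped = false;
    }
}
```

A partitioning job would call stopIndex(), do its work with its own IndexWriter, and then call startIndex(). Whether refused callers should get an exception, a false return as here, or simply block is the open design question raised in the thread.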
Re: DefaultIndexAccessor
vivek sar wrote:
> Mark,
>
> Just for my clarification,
>
> 1) Would you have indexStop and indexStart methods? If that's the case then I don't have to call close() at all. These new methods would serve as just cleaning up the caches and not closing the thread pool.

Yes. This is the approach I agree with. I am putting up new code that allows the close(), open() calls anyway, though. There is nothing keeping it from working, and it used to work, so it's a good idea to make it work again. It is also a quick fix for you.

https://issues.apache.org/jira/browse/LUCENE-1026

I will be adding the new stop/start calls quickly, but I don't want to rush them out.

> I would prefer not to call close() and init() again if possible. The reason we have to do partition is because our index size grows over 50G a week and then optimization takes hours. I'd a thread going on this topic in the mailing list, http://www.gossamer-threads.com/lists/lucene/java-user/57366?search_string=partition;#57366.

Gotchya. A comment I have on that is that you might try keeping the mergeFactor really low as well. This will keep searches faster, make optimization much faster (it's amortized), and not slow down writes that much in my experience. Since IndexAccessor drops the writes off (spawns a new thread) anyway, slightly longer writes shouldn't be a big deal at all. I'd try going even as low as 2 or 3. I run some fairly large interactive indexes and the writes, even when blocking until the write is done, are pretty darn responsive.

- Mark
Re: DefaultIndexAccessor
Thanks Mark. I'll wait for your enhancements to IndexAccessor with the new methods.

I use mergeFactor = 100. I've read about the merge factor, and it's hard to balance read and write optimization. What number do you use?

Thanks again.
-vivek
Re: DefaultIndexAccessor
Mark,

There seems to be some issue with DefaultMultiIndexAccessor.java. I got the following NullPointerException:

2008-02-13 07:10:28,021 ERROR [http-7501-Processor6] ReportServiceImpl - java.lang.NullPointerException
    at org.apache.lucene.indexaccessor.DefaultMultiIndexAccessor.release(DefaultMultiIndexAccessor.java:89)

It looks like the IndexAccessor for one of the Searchers in the MultiSearcher returned null. Any ideas how that is possible?

In my case it caused a critical error, as the writer thread was stuck forever because of this (we found out after a couple of days):

PS thread 9 prio=1 tid=0x2aac70eb95d0 nid=0x6ba in Object.wait() [0x47533000..0x47533b80]
    at java.lang.Object.wait(Native Method)
    - waiting on 0x2aab3e5c7700 (a org.apache.lucene.indexaccessor.DefaultIndexAccessor)
    at java.lang.Object.wait(Unknown Source)
    at org.apache.lucene.indexaccessor.DefaultIndexAccessor.waitForReadersAndCloseCached(DefaultIndexAccessor.java:593)
    at org.apache.lucene.indexaccessor.DefaultIndexAccessor.release(DefaultIndexAccessor.java:510)
    - locked 0x2aab3e5c7700 (a org.apache.lucene.indexaccessor.DefaultIndexAccessor)

The only way to recover was to restart the application. I use both MultiSearcher and IndexSearcher in my application. I've looked at your code but am not able to pinpoint how it can go wrong. Of course, you do have to check for null in MultiIndexAccessor.release, but how could you get a null index accessor in the first place? I do call IndexAccessor.close during partitioning of indexes, but the close should wait for all Searchers to close before doing anything.

Do you have any updates to your code since 02/04/2008?

Thanks,
-vivek

On Feb 6, 2008 8:37 AM, Jay [EMAIL PROTECTED] wrote:
> Thanks for your clarifications, Mark!
>
> Jay
>
> Mark Miller wrote:
> > > 5. Although currently IndexSearcher.close() does almost nothing except to close the internal index reader, it might be safer to close the searcher itself as well in closeCachedSearcher(), just in case the searcher has other resources to release in a future version of Lucene.
> >
> > Didn't catch that as well. You are right, great idea Jay, thanks.
Re: DefaultIndexAccessor
Hey vivek,

Sorry to hear you are having problems. I am trying to figure out how you may be seeing this problem. The IndexAccessor cannot return null, because you would get an IllegalStateException, not a NullPointerException. Also, the released MultiSearcher cannot be null, because the Exception would have been thrown sooner. Releasing a null Searcher throws no Exception. So a possibility is that you are returning a foreign MultiSearcher? Unlikely, but I don't see anything else at the moment.

The MultiSearcher code is really pretty simple and actually recreates a MultiSearcher on every request. It did not appear to be worth it to coordinate closed sub-Accessors with a cache for the MultiSearcher (I wrote the code at one point, and later got rid of it). So really the MultiSearcher is just a simple class that gets cached sub-Searchers for each index and creates a one-time-use MultiSearcher. A simple cache is kept around that identifies which Accessor needs to release which sub-Searcher.

It's all rather simple, and I am struggling to see another possibility beyond returning a foreign MultiSearcher somehow. I will keep looking and keep you posted. In the meantime, do you have any other data or code snippets to share?
- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: DefaultIndexAccessor
Okay, sorry about this one vivek. I have added to the unit tests to expose this.

When I took out the MultiSearcher caching, I kept the concept of sharing a single MultiIndexAccessor. Unfortunately, this meant that multiple threads were sharing the same Searcher-to-Accessor Map that was used to track which Accessor needs to release which Searcher. Because of this, a thread might come along and pull a Searcher out of that Map right before another thread tries to release that same cached Searcher instance. The result is that the entry is not there, and hence the NullPointerException. Nice to have this added to the unit tests.

The fix is to create a new MultiIndexAccessor on every request, and I recommend you get one for each thread, as the class is now not thread safe. Construction of a MultiIndexAccessor costs pretty much nothing in terms of time. This way each thread has its own Map of Searchers to Accessors.

A good tip is to make a simple page that prints out the Searcher/Writer use counts. You can then check this page occasionally, and if it appears there are Writers or Searchers stuck out, you know you have a problem. Under normal circumstances that should not be possible, and it indicates a bug somewhere.

I will post the fix shortly.

- Mark
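The failure mode described above can be reproduced in miniature. The following is only a sketch: `Accessor` and `MultiAccessor` are hypothetical stand-ins written for this illustration, not the classes attached to LUCENE-1026, and the interleaving is forced by hand on a single thread rather than produced by real concurrency. It shows why two requests sharing one Searcher-to-Accessor map can leave one of them with nothing to release, and why giving each request its own map avoids that.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedMapRace {
    /** Hypothetical stand-in for an IndexAccessor: it only counts releases. */
    static final class Accessor {
        int released = 0;

        void release(Object searcher) {
            released++;
        }
    }

    /** Hypothetical stand-in for the class that owns the Searcher-to-Accessor map. */
    static final class MultiAccessor {
        private final Map<Object, Accessor> owners = new HashMap<>();

        void track(Object searcher, Accessor accessor) {
            owners.put(searcher, accessor);
        }

        /** Returns false when the entry is already gone (an unchecked version would throw an NPE). */
        boolean release(Object searcher) {
            Accessor accessor = owners.remove(searcher);
            if (accessor == null) {
                return false;
            }
            accessor.release(searcher);
            return true;
        }
    }

    public static void main(String[] args) {
        Object cachedSearcher = new Object(); // the same cached sub-Searcher goes to both requests
        Accessor accessor = new Accessor();

        // One shared instance: request B removes the entry that request A still needs.
        MultiAccessor shared = new MultiAccessor();
        shared.track(cachedSearcher, accessor); // request A
        shared.track(cachedSearcher, accessor); // request B overwrites the same key
        System.out.println("shared, B released: " + shared.release(cachedSearcher));
        System.out.println("shared, A released: " + shared.release(cachedSearcher));

        // One instance per request: each release finds its own entry.
        MultiAccessor a = new MultiAccessor();
        MultiAccessor b = new MultiAccessor();
        a.track(cachedSearcher, accessor);
        b.track(cachedSearcher, accessor);
        System.out.println("per-request, B released: " + b.release(cachedSearcher));
        System.out.println("per-request, A released: " + a.release(cachedSearcher));
    }
}
```

In the shared case only one of the two references is ever handed back, which is consistent with a Writer waiting indefinitely for Searchers to return.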
Re: DefaultIndexAccessor
Mark,

Here is the scenario in which I saw this exception:

1) A search was run which uses MultiSearcher. This search took more than 3 minutes to complete (due to index size and multiple indices).
2) Just a minute after the search was started, we started writing (in a separate thread) to one of the indexes that was being searched.
3) The Writer finished in a few seconds and called writer.release.
4) Since the MultiSearcher was still running, writer.release waited for all MultiSearchers to complete.
5) The MultiSearcher finally completed and called MultiIndexAccessor.release for the MultiSearcher.
6) At this point, for some reason, the MultiSearcher release threw an NPE. I'm not sure whether the NPE was on the index that was being updated by the Writer or on some other index.
7) Because of the NPE, the index that the Writer was writing to never got released and the Writer got stuck.

I haven't been able to reproduce this again. Is IndexAccessor-02.07.2008.zip (https://issues.apache.org/jira/browse/LUCENE-1026) the most up-to-date code you have? You mentioned that with the new jar, "Releasing a Writer never blocks for a reopen now - so after adding a doc it may be a second or two before its visible to new Searchers". Do you think this would help the case I ran into, where the Writer was stuck because of the Searcher release?

I think you may still want to check for null in MultiIndexAccessor.release(), i.e.,

    public synchronized void release(Searcher multiSearcher) {
        Searchable[] searchers = ((MultiSearcher) multiSearcher).getSearchables();
        IndexAccessor accessor = null;
        for (Searchable searchable : searchers) {
            if (searchable != null) {
                accessor = multiSearcherAccessors.remove(searchable);
                if (accessor != null) {
                    accessor.release((Searcher) searchable);
                }
            }
        }
    }

Thanks,
-vivek
Re: DefaultIndexAccessor
Here is the fix: https://issues.apache.org/jira/browse/LUCENE-1026
Re: DefaultIndexAccessor
Mark,

Thanks for the quick fix. Actually, it is possible that there might have been simultaneous queries using the MultiSearcher. I assumed it was thread-safe, and thus was re-using the same instance. I'll update my application code as well.

Thanks,
-vivek
Re: DefaultIndexAccessor
Thanks for the feedback Jay. One at a time:

Jay wrote:
> Great effort for much improved indexaccessor, Mark! A couple of questions and observations:
>
> 1. In release(Searcher), you removed a check from an earlier version for whether the given searcher is the cached one. This could potentially cause problems for some people.

This is something that I meant to come back to. The problem is that the Searcher you are returning may have already been replaced in the Searcher cache, so retired Searchers must be checked too. Will consider again.

> 2. The createdSearchers variable is not really used: you just populate it and print it out. What's the purpose of it?

This was a debug check that I had been using. Will clean up.

> 3. The variable numSearchersForRetirment is used in WarmingIndexAccessor, not DefaultIndexAccessor.

Thanks. Will move.

> 4. I wish that in the next release of Lucene they will add a searcher reopen API so that we do not have to work around it.

Not sure how this would play out, but an interesting thought...

> 5. Although currently IndexSearcher.close() does almost nothing except close the internal index reader, it might be safer to close the searcher itself as well in closeCachedSearcher(), just in case the searcher has other resources to release in a future version of Lucene.

The problem is that because I am supplying the Reader, calling Searcher.close() won't close it. The Searcher has to be created without a supplied Reader for it to be able to close the Reader itself. I got the same itch though...

> Thanks!
> Jay

I appreciate the feedback! I'll be working more on this. I think more can be done.

- Mark
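The check raised in point 1, extended to cover retired Searchers as Mark describes, might look roughly like the sketch below. It is an assumption-laden illustration, not code from the IndexAccessor zip: plain `Object` instances stand in for Searchers, and `acquire`, `reopen`, and `outstandingSearchers` are hypothetical names. Reference counts are kept per Searcher instance, so a Searcher that has since been replaced in the cache can still be released, while one that was never handed out is rejected.

```java
import java.util.IdentityHashMap;
import java.util.Map;

public class RetiredSearcherCheck {
    /** The currently cached "Searcher"; a plain Object stands in for the real class. */
    private Object current = new Object();

    /** Reference counts for every Searcher handed out and not yet fully returned. */
    private final Map<Object, Integer> outstanding = new IdentityHashMap<>();

    public synchronized Object acquire() {
        outstanding.merge(current, 1, Integer::sum);
        return current;
    }

    /** Replaces the cached Searcher; the old one stays tracked until its last release. */
    public synchronized void reopen() {
        current = new Object();
    }

    public synchronized void release(Object searcher) {
        Integer count = outstanding.get(searcher);
        if (count == null) {
            // Neither the cached Searcher nor a retired one: it did not come from here.
            throw new IllegalStateException("Searcher was not handed out by this accessor");
        }
        if (count == 1) {
            outstanding.remove(searcher); // last reference; a retired Searcher could be closed here
        } else {
            outstanding.put(searcher, count - 1);
        }
    }

    public synchronized int outstandingSearchers() {
        return outstanding.size();
    }

    public static void main(String[] args) {
        RetiredSearcherCheck accessor = new RetiredSearcherCheck();
        Object old = accessor.acquire();
        accessor.reopen();                 // "old" is now retired
        Object fresh = accessor.acquire();
        accessor.release(old);             // still accepted, because it is tracked
        accessor.release(fresh);
        System.out.println("outstanding: " + accessor.outstandingSearchers());
        try {
            accessor.release(new Object()); // a foreign Searcher
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```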
Re: DefaultIndexAccessor
> 5. Although currently IndexSearcher.close() does almost nothing except close the internal index reader, it might be safer to close the searcher itself as well in closeCachedSearcher(), just in case the searcher has other resources to release in a future version of Lucene.

Didn't catch that as well. You are right, great idea Jay, thanks.
Re: DefaultIndexAccessor
Thanks for your clarifications, Mark!

Jay
Re: DefaultIndexAccessor
Great effort for much improved indexaccessor, Mark! A couple of questions and observations:

1. In release(Searcher), you removed a check from an earlier version for whether the given searcher is the cached one. This could potentially cause problems for some people.
2. The createdSearchers variable is not really used: you just populate it and print it out. What's the purpose of it?
3. The variable numSearchersForRetirment is used in WarmingIndexAccessor, not DefaultIndexAccessor.
4. I wish that in the next release of Lucene they will add a searcher reopen API so that we do not have to work around it.
5. Although currently IndexSearcher.close() does almost nothing except close the internal index reader, it might be safer to close the searcher itself as well in closeCachedSearcher(), just in case the searcher has other resources to release in a future version of Lucene.

Thanks!
Jay

Mark Miller wrote:
> For anyone following this thread who would like to check this out, I put up the new code with the warming capability:
>
> https://issues.apache.org/jira/browse/LUCENE-1026
>
> IndexAccessor-02.04.2008.zip (32 kb): https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip
>
> See the comment at the bottom.
Re: DefaultIndexAccessor
IndexAccessor-1.26.2008.zip is the latest one. I will be dating the zips from now on.

I hope to post new code with the warming either tonight or tomorrow night. I would be ecstatic to have some help vetting that.

Also, I am thinking of making a change so that when you release the Writer, the thread that releases does not block until the reopen completes. I think the original author did this so that if you add a doc with a thread and then immediately search from the same thread, you are guaranteed to find the doc. However, this guarantee did not hold: if another thread had a reference to the Writer, and a new thread grabbed a Writer and then quickly released it before the first thread, you would have added a doc that is not visible until the first thread releases its reference to the Writer. Since the concept is not enforced anyway, you might as well not block for the final thread that releases the Writer either. Instead, I will grab a thread from a thread pool to do the reopening, and return right after closing the Writer. The result is that you cannot add a doc, search, and expect to find it without waiting a second or two. But this way things will be consistent, and an app that adds docs will be a bit more responsive, e.g. it won't hang while Readers are being reopened.

I also have to bring the AccessProvider classes back. There is no easy way to use your own custom Readers without them...I shouldn't have stripped them out.

- Mark

Cam Bazz wrote:
> Hello,
>
> Regarding https://issues.apache.org/jira/browse/LUCENE-1026, this seems very interesting. I have read the discussion on the page, but I could not figure out which set of files is the latest. Is it the IndexAccessor-1.26.2008.zip file?
>
> I will read through the code, make my own tests, and send some feedback.
>
> Best.
> -C.B.
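The non-blocking release proposed above can be sketched with a standard `java.util.concurrent` executor. This is a minimal illustration under stated assumptions, not the code that was later posted: the class and method names are hypothetical, closing the Writer is reduced to a comment, and "reopening the Readers" is modelled as bumping a generation counter. The point is only that the last thread to release hands the reopen to a pool thread and returns immediately.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncReopenSketch {
    private final ExecutorService reopenPool = Executors.newSingleThreadExecutor();
    private final AtomicInteger readerGeneration = new AtomicInteger();
    private int writerRefs = 0;

    public synchronized void getWriter() {
        writerRefs++;
    }

    /** Returns right away; only the last release schedules a reopen on the pool. */
    public synchronized void releaseWriter() {
        if (writerRefs == 0) {
            throw new IllegalStateException("no Writer is checked out");
        }
        writerRefs--;
        if (writerRefs == 0) {
            // A real accessor would close the shared Writer here, before scheduling the reopen.
            reopenPool.execute(readerGeneration::incrementAndGet);
        }
    }

    public int readerGeneration() {
        return readerGeneration.get();
    }

    /** Stops accepting reopen tasks and waits for queued ones; returns false on timeout. */
    public boolean shutdownAndWait() {
        reopenPool.shutdown();
        try {
            return reopenPool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        AsyncReopenSketch accessor = new AsyncReopenSketch();
        accessor.getWriter();
        accessor.getWriter();
        accessor.releaseWriter(); // another reference is still out: nothing is scheduled
        accessor.releaseWriter(); // last reference: reopen is handed to the pool
        accessor.shutdownAndWait();
        System.out.println("reader generation: " + accessor.readerGeneration());
    }
}
```

The trade-off is the one described in the message: a document added just before the release may not be visible to new Searchers until the pooled reopen has run.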
Re: DefaultIndexAccessor
Hello Mark,

I have been reading the code, and honestly I have not understood how it works. I was hoping that this was a solution for the case where you are adding documents in a multithreaded way: that it would allow other non-writer threads to see the added documents without refreshing the IndexSearcher, by using some caching mechanism.

Could you elaborate a little more on what IndexAccessor does and how it does it?

Best Regards,
-C.B.
Re: DefaultIndexAccessor
The purpose of IndexAccessor is to coordinate Readers/Writers for a Lucene index. Readers and Writers in Lucene are multi-threaded in that multiple threads may use them at the same time, but they must/should be shared, and there are special rules (you cannot delete with a Reader while a Writer is working on the index). Also, you need to refresh Reader views every so often; this is expensive (though usually much less so with the new reopen method).

IndexAccessor enforces the rules and controls Reader refreshing. Instead of worrying about caching or index interaction rules, you just ask for your Reader/Writer, use it to search or add a doc, and then return it. The rest is taken care of for you. This is done by keeping a cached Writer and Searcher(s) that all threads share. References to the Searchers are counted so that after a Writer is returned (and no other thread has a reference to the Writer), IndexAccessor waits for all of the current Searchers to come back and then reopens their Readers.

In this regard, you get a setup similar to what Solr might give: from any thread you just add docs and run searches -- you don't have to worry about refreshing Readers, or sharing Writers/Readers, or one thread deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warming Searchers before putting them into action. That's what the code I am posting soon will be capable of: when the Readers are reopened, search requests will still be handled by the old Readers while the new Searchers run a sample query with optional sort fields. This makes sure the Reader is open and its sort caches are loaded before the first thread tries to use it, which gives applications a much faster response.

You must open a new Reader or reopen a Reader to see recently added docs...IndexAccessor provides no real way around that. But it does make the reopening much easier -- and your application that just wants to add docs and search at will from multiple threads won't have to worry about it.

You can bail out here, or if you want further clarification, I include an alternate attempt at explaining what IndexAccessor is below.

- Mark

When accessing a Lucene index from multiple threads, there are a variety of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new instances or by using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer, e.g. they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently retrieve and use a Writer. Once all of these threads release their reference to the Writer, it is closed, and upon the next request a new one is created.

A single Searcher for each Similarity is also shared across threads. Upon first request, a new Searcher is created. This Searcher is then returned upon every request. A count of every Searcher reference retrieved is maintained.

When all references to a Writer are released, the Writer is closed and, after waiting for all of the Searchers to be returned, the Searchers are reopened. Without warming enabled, new requests for Searchers/Readers must wait for this reopen to complete. If warming is enabled, the old Searchers/Readers continue handling Searcher requests until the Readers have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it until a Writer is released, and vice versa.

The result is that you can freely use Writers/Readers/Searchers from any thread without considering thread interactions.

*** If you want to add docs, just ask for a Writer, add the docs, and release the Writer. If you want to search, get a Searcher, search, and release the Searcher. You don't have to worry about reopening Readers or coordinating access. ***

You still do have to consider things like hogging the Writer/Readers: if you don't occasionally release them, things will not stay very interactive. The best method is to just get the object, use it, and then return it in a finally block. Batch load multiple docs, but if you're just randomly adding a doc, get the Writer, add it, and then release the Writer in a finally block. If you are batch loading a million docs and you want to be able to see them as they are added: get the Writer and add 10,000 docs (or something), release the Writer, get the Writer and add 10,000 docs, etc.
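The get/use/release-in-finally pattern and the chunked batch load described above can be sketched as follows. Everything here is a hypothetical stand-in written for illustration: `Accessor` and `Writer` are not the IndexAccessor classes, documents are plain strings, and `reopens` merely counts how often the last Writer reference was returned (the point at which the real accessor would reopen Readers). The batch size is a parameter; the message suggests something like 10,000.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchReleaseSketch {
    /** Hypothetical stand-in for an accessor that hands out a shared, reference-counted Writer. */
    static final class Accessor {
        final List<String> index = new ArrayList<>();
        int writerRefs = 0;
        int reopens = 0;

        Writer getWriter() {
            writerRefs++;
            return new Writer(this);
        }

        void release(Writer writer) {
            writerRefs--;
            if (writerRefs == 0) {
                reopens++; // the real accessor would close the Writer and reopen Readers here
            }
        }
    }

    /** Hypothetical stand-in for an IndexWriter. */
    static final class Writer {
        private final Accessor owner;

        Writer(Accessor owner) {
            this.owner = owner;
        }

        void addDocument(String doc) {
            owner.index.add(doc);
        }
    }

    /** Adds docs in chunks, releasing the Writer after each chunk so Searchers can catch up. */
    static void addAll(Accessor accessor, List<String> docs, int batchSize) {
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            Writer writer = accessor.getWriter();
            try {
                for (String doc : docs.subList(start, end)) {
                    writer.addDocument(doc);
                }
            } finally {
                accessor.release(writer); // always returned, even if addDocument throws
            }
        }
    }

    public static void main(String[] args) {
        List<String> docs = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            docs.add("doc-" + i);
        }
        Accessor accessor = new Accessor();
        addAll(accessor, docs, 10);
        System.out.println(accessor.index.size() + " docs added, " + accessor.reopens + " reopens");
    }
}
```

A smaller batch size makes new documents visible sooner at the cost of more reopens; a larger one does the opposite.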
Re: DefaultIndexAccessor
I replied to the wrong thread -- sorry about that: You still have to be careful if you want to alternate a search and write. If you are loading a lot of docs this way, you would want to hold the Writer to batch the docs, but while you are holding it, you will not have a fresh view of the index - so you could add the same doc twice if it came twice in a batch. The only way to be sure you avoid this is to reopen readers after you add every doc. This is just not going to be a fast way of doing things...but if you have a high mergefactor, the new reopen method will prob make it *much* faster. Or if you are sure that the batch won't contain duplicates, you can batch load. Cam Bazz wrote: Hello Mark, Thank you for your lengthy and valuable clarification. I have the case - before adding to the index, i must check if a document exist with the same key (actually, double key) - or before deleting a document - I must ensure it exists in the index. Currently I am doing it with my custom caching routine. It works quite well upto 32M documents. but after that something happens and it really slows down. I will experiment with your implementation, as soon as I can. It is very cool by the way. Will it be included in the next release? Best, -C.B. On Feb 4, 2008 7:15 PM, Mark Miller [EMAIL PROTECTED] wrote: The purpose of IndexAccessor is to coordinate Readers/Writers for a Lucene index. Readers and Writers in Lucene are multi-threaded in that multiple threads may use them at the same time, but they must/should be shared and there are special rules (You cannot delete with a Reader while a Writer is working on the index). Also, you need to refresh Reader views every so often; this is expensive (though usually much less so with the new reopen method). IndexAccessor enforces the rules and controls Reader refreshing. Instead of worrying about caching or index interaction rules, you just ask for your Reader/Writer, use it to search or add a doc, and then return it. 
The rest is taken care of for you. This is done by keeping a cached Writer and Searcher(s) that all threads share. References to the Searchers are counted so that after a Writer is returned (and no other thread has a reference to the Writer), IndexAccessor waits for all of the current Searchers to come back and then reopens their Readers. In this regard, you get a similar setup to what Solr might give: from any thread you just add docs and run searches -- you don't have to worry about refreshing Readers, sharing Writers/Readers, or one thread deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers before putting them into action. That's what the code I am posting soon will be capable of: when the Readers are reopened, search requests will still be handled by the old Readers while the new Searchers run a sample query with optional sort fields. This makes sure the Reader is open and its sort caches are loaded before the first thread tries to use it, which means much faster responses for applications.

You must open a new Reader or reopen a Reader to see recently added docs...IndexAccessor provides no real way around that. But it does make the reopening much easier -- and your application that just wants to add docs and search at will from multiple threads won't have to worry about it.

You can bail out here, or if you want further clarification I will include an alternate attempt at explaining what IndexAccessor is below.

- Mark

When accessing a Lucene index from multiple threads, there are a variety of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new instances or by using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer, e.g. they cannot be used at the same time.

IndexAccessor addresses each of these issues.
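The reference counting described above can be sketched roughly as follows. This is only an illustration of the idea, not the IndexAccessor code attached to LUCENE-1026: the class name AccessorSketch, its method names, and the integer "generation" that stands in for a reopened Reader view are all made up here, and no Lucene classes are involved. It also assumes warming is disabled, so new Searcher requests block while the reopen is in progress.

```java
// Hypothetical sketch: counts Writer and Searcher references and "reopens"
// (bumps a generation counter) once the last Writer reference is returned
// and every outstanding Searcher has come back.
class AccessorSketch {
    private int writerRefs;     // threads currently sharing the Writer
    private int searcherRefs;   // Searchers handed out and not yet returned
    private int generation;     // stands in for the current Reader view
    private boolean reopening;  // true while waiting to refresh the view

    public synchronized void getWriter() {
        while (reopening) await();
        writerRefs++;
    }

    public synchronized void releaseWriter() {
        if (--writerRefs > 0) return;       // other threads still hold the Writer
        reopening = true;                   // block new Searcher/Writer requests
        try {
            while (searcherRefs > 0) await(); // wait for Searchers to come back
            generation++;                   // a real accessor would reopen Readers here
        } finally {
            reopening = false;
            notifyAll();                    // wake threads blocked on the reopen
        }
    }

    /** Returns the view generation the caller will search against. */
    public synchronized int getSearcher() {
        while (reopening) await();
        searcherRefs++;
        return generation;
    }

    public synchronized void releaseSearcher() {
        if (--searcherRefs == 0) notifyAll(); // last Searcher back: wake the releaser
    }

    public synchronized int generation() {
        return generation;
    }

    // Must be called while holding this object's monitor.
    private void await() {
        try {
            wait();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }
}
```

In this sketch a caller pairs every getSearcher() with a releaseSearcher() (ideally in a finally block) and every getWriter() with a releaseWriter(). A Searcher that is never returned leaves the thread calling releaseWriter() waiting indefinitely, which is the kind of hang a missed release can cause.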
How it works:

A single Writer is shared among threads that try to concurrently retrieve and use a Writer. Once all of these threads release their reference to the Writer, it is closed and upon the next request a new one is created.

A single Searcher for each Similarity is also shared across threads. Upon first request, a new Searcher is created. This Searcher is then returned upon every request. A count of every Searcher reference retrieved is maintained.

When all references to a Writer are released, the Writer is closed and after waiting for all of the Searchers to be returned, the Searchers are reopened. Without warming enabled, new requests for Searchers/Readers must wait for this reopen to complete. If warming is enabled, the old Searchers/Readers continue handling Searcher requests until the Readers have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it
Re: DefaultIndexAccessor
For anyone following this thread who would like to check this out, I put up the new code with the warming capability:

https://issues.apache.org/jira/browse/LUCENE-1026

IndexAccessor-02.04.2008.zip (32 kb): https://issues.apache.org/jira/secure/attachment/12374729/IndexAccessor-02.04.2008.zip

See the comment at the bottom.

Cam Bazz wrote:

Hello Mark,

Thank you for your lengthy and valuable clarification. I have this case: before adding to the index, I must check whether a document exists with the same key (actually, a double key), and before deleting a document, I must ensure it exists in the index. Currently I am doing it with my custom caching routine. It works quite well up to 32M documents, but after that something happens and it really slows down. I will experiment with your implementation as soon as I can. It is very cool, by the way. Will it be included in the next release?

Best, -C.B.

On Feb 4, 2008 7:15 PM, Mark Miller [EMAIL PROTECTED] wrote:

The purpose of IndexAccessor is to coordinate Readers/Writers for a Lucene index. Readers and Writers in Lucene are multi-threaded in that multiple threads may use them at the same time, but they must/should be shared and there are special rules (you cannot delete with a Reader while a Writer is working on the index). Also, you need to refresh Reader views every so often; this is expensive (though usually much less so with the new reopen method). IndexAccessor enforces the rules and controls Reader refreshing. Instead of worrying about caching or index interaction rules, you just ask for your Reader/Writer, use it to search or add a doc, and then return it. The rest is taken care of for you. This is done by keeping a cached Writer and Searcher(s) that all threads share.
References to the Searchers are counted so that after a Writer is returned (and no other thread has a reference to the Writer), IndexAccessor waits for all of the current Searchers to come back and then reopens their Readers. In this regard, you get a similar setup to what Solr might give: from any thread you just add docs and run searches -- you don't have to worry about refreshing Readers, sharing Writers/Readers, or one thread deleting with a Reader while another thread tries to write with a Writer.

This setup allows you to do other cool things, like warm Searchers before putting them into action. That's what the code I am posting soon will be capable of: when the Readers are reopened, search requests will still be handled by the old Readers while the new Searchers run a sample query with optional sort fields. This makes sure the Reader is open and its sort caches are loaded before the first thread tries to use it, which means much faster responses for applications.

You must open a new Reader or reopen a Reader to see recently added docs...IndexAccessor provides no real way around that. But it does make the reopening much easier -- and your application that just wants to add docs and search at will from multiple threads won't have to worry about it.

You can bail out here, or if you want further clarification I will include an alternate attempt at explaining what IndexAccessor is below.

- Mark

When accessing a Lucene index from multiple threads, there are a variety of issues that you must address.

1. The Readers/Writer should be shared across threads.
2. Readers must periodically be refreshed, either by creating new instances or by using the new reopen method.
3. A Reader that writes needs to be properly coordinated with a Writer, e.g. they cannot be used at the same time.

IndexAccessor addresses each of these issues.

How it works:

A single Writer is shared among threads that try to concurrently retrieve and use a Writer.
Once all of these threads release their reference to the Writer, it is closed and upon the next request a new one is created.

A single Searcher for each Similarity is also shared across threads. Upon first request, a new Searcher is created. This Searcher is then returned upon every request. A count of every Searcher reference retrieved is maintained.

When all references to a Writer are released, the Writer is closed and after waiting for all of the Searchers to be returned, the Searchers are reopened. Without warming enabled, new requests for Searchers/Readers must wait for this reopen to complete. If warming is enabled, the old Searchers/Readers continue handling Searcher requests until the Readers have been reopened and any requested sort caches have been loaded.

If you ask for a writing Reader, you will not get it until a Writer is released and vice versa. The result is that you can freely use Writers/Readers/Searchers from any thread without considering thread interactions.

***

If you want to add docs, just ask for a Writer, add the docs, and release the Writer. If you want to