[jira] Updated: (LUCENE-1238) intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests
[ https://issues.apache.org/jira/browse/LUCENE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1238:

Summary: intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests (was: intermittent faiures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests)

Fixed typo in summary.

Key: LUCENE-1238
URL: https://issues.apache.org/jira/browse/LUCENE-1238
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 2.4
Reporter: Doron Cohen
Assignee: Doron Cohen
Priority: Minor

Occasionally TestTimeLimitedCollector.testTimeoutMultiThreaded fails, e.g. with this output:

{noformat}
[junit] - Standard Error -
[junit] Exception in thread Thread-97 junit.framework.AssertionFailedError: no hits found!
[junit]     at junit.framework.Assert.fail(Assert.java:47)
[junit]     at junit.framework.Assert.assertTrue(Assert.java:20)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.doTestTimeout(TestTimeLimitedCollector.java:152)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.access$100(TestTimeLimitedCollector.java:38)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector$1.run(TestTimeLimitedCollector.java:231)
[junit] Exception in thread Thread-85 junit.framework.AssertionFailedError: no hits found!
[junit]     at junit.framework.Assert.fail(Assert.java:47)
[junit]     at junit.framework.Assert.assertTrue(Assert.java:20)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.doTestTimeout(TestTimeLimitedCollector.java:152)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.access$100(TestTimeLimitedCollector.java:38)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector$1.run(TestTimeLimitedCollector.java:231)
[junit] -----
[junit] Testcase: testTimeoutMultiThreaded(org.apache.lucene.search.TestTimeLimitedCollector): FAILED
[junit] some threads failed! expected:50 but was:48
[junit] junit.framework.AssertionFailedError: some threads failed! expected:50 but was:48
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.doTestMultiThreads(TestTimeLimitedCollector.java:255)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.testTimeoutMultiThreaded(TestTimeLimitedCollector.java:220)
{noformat}

The problem is either in the test or in TimeLimitedCollector.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1238) intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests
[ https://issues.apache.org/jira/browse/LUCENE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1238:

Lucene Fields: [New, Patch Available] (was: [New])
Attachments: LUCENE-1238.patch
[jira] Updated: (LUCENE-1238) intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests
[ https://issues.apache.org/jira/browse/LUCENE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-1238:

Attachment: LUCENE-1238.patch

The problem was in the test. However, the fix adds a greediness option to the time-limited collector (TLC):

* A greedy TLC, upon timeout, allows the wrapped collector to collect the current doc and only then throws the timeout exception.
* A non-greedy TLC (the default, as before) throws the exception immediately.

For the test, setting the collector to greedy makes it possible to require that at least one doc was collected.

In addition, this patch:

* Adds missing javadocs for the TLC constructor.
* Increases the slack in the test's timeout requirements, to prevent further noise: the TLC is required to time out not too soon and not too late, but on a busy machine the "not too late" part is problematic to test. I considered removing this part (not too late), but decided to leave it in for now.
* Adds a test for the setGreedy() option.

All TLC tests pass.
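The greedy vs. non-greedy timeout semantics described above can be sketched as follows. This is an illustrative, self-contained model, not Lucene's actual 2.4 code; the `Collector` interface and `TimeExceededException` here are stand-ins for the real classes.

```java
import java.util.ArrayList;
import java.util.List;

public class GreedyTimeoutSketch {
    static class TimeExceededException extends RuntimeException {}

    interface Collector { void collect(int doc); }

    static class TimeLimitedCollector implements Collector {
        private final Collector wrapped;
        private final long deadlineMillis;
        private boolean greedy = false;

        TimeLimitedCollector(Collector wrapped, long timeAllowedMillis) {
            this.wrapped = wrapped;
            this.deadlineMillis = System.currentTimeMillis() + timeAllowedMillis;
        }

        void setGreedy(boolean greedy) { this.greedy = greedy; }

        public void collect(int doc) {
            if (System.currentTimeMillis() > deadlineMillis) {
                // A greedy collector lets the wrapped collector keep the
                // in-flight doc before signalling the timeout; a non-greedy
                // one (the default) throws immediately.
                if (greedy) {
                    wrapped.collect(doc);
                }
                throw new TimeExceededException();
            }
            wrapped.collect(doc);
        }
    }

    public static void main(String[] args) {
        List<Integer> hits = new ArrayList<>();
        // Deadline already in the past, so the first collect() times out.
        TimeLimitedCollector tlc = new TimeLimitedCollector(hits::add, -1);
        tlc.setGreedy(true);
        try {
            tlc.collect(42);
        } catch (TimeExceededException expected) {}
        // Greedy mode guarantees the in-flight doc was still collected,
        // which is what lets the test assert "at least one hit".
        System.out.println(hits.size());
    }
}
```

This shows why greediness helps the multi-threaded test: even when the timeout fires before any doc is collected, a greedy collector still delivers one hit.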
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579792#action_12579792 ]

Michael Lossos commented on LUCENE-1239:

Here's the related Compass bug, in case this ends up being a problem in Compass's ExecutorMergeScheduler (though that doesn't look to be the case at the moment): http://issues.compass-project.org/browse/CMP-581

Key: LUCENE-1239
URL: https://issues.apache.org/jira/browse/LUCENE-1239
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.1
Environment: Compass 2.0.0M3 (nightly build #57), Lucene 2.3.1, Spring Framework 2.0.7.0
Reporter: Michael Lossos

I'm trying to update our application from Compass 2.0.0M1 with Lucene 2.2 to Compass 2.0.0M3 (latest build) with Lucene 2.3.1. I'm holding all other things constant and only changing the Compass and Lucene jars. While recreating the search index for our data I'm seeing a deadlock in Lucene's IndexWriter: it appears to be waiting on a signal from the merge thread. I've tried to create a simple reproduction case, but to no avail. Doing the exact same steps with Compass 2.0.0M1 and Lucene 2.2 recreates our search index with no problems. That is to say, it's not our code.

In particular, the main thread performing the commit (Lucene document save) from Compass is calling Lucene's IndexWriter.optimize(). We're using Compass's ExecutorMergeScheduler to handle the merging, and it is calling IndexWriter.merge(). The main thread in IndexWriter.optimize() enters the wait() at the bottom of that method and is never notified. I can't tell whether this is because optimizeMergesPending() is returning true incorrectly, or because IndexWriter.merge()'s notifyAll() is being called prematurely.

Looking at the code, it doesn't seem possible for IndexWriter.optimize() to be waiting and miss a notifyAll(), and Lucene's IndexWriter.merge() was recently fixed to always call notifyAll() even on exceptions -- that is, all the relevant IndexWriter code looks properly synchronized. Nevertheless, I'm seeing the deadlock behavior described, and it's reproducible using our app and our test data set. Could someone familiar with IndexWriter's synchronization code take another look at it? I'm sorry that I can't give you a simple reproduction test case.
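The optimize()/merge() handshake discussed above is the standard Java monitor pattern: wait in a loop on a condition, and notifyAll under the same lock whenever the condition may have changed. A minimal self-contained sketch of that pattern (field and method names here are illustrative, not IndexWriter's actual internals):

```java
public class MergeMonitorSketch {
    private int pendingMerges = 2;

    // Caller side (think optimize()): block until no merges are pending.
    synchronized void waitForMerges() throws InterruptedException {
        // Waiting in a loop re-checks the condition after every wakeup,
        // so a notifyAll cannot be "missed" while the lock is held.
        while (pendingMerges > 0) {
            wait();
        }
    }

    // Merge-thread side (think merge()): always notify, even on failure,
    // otherwise the waiter above blocks forever -- the deadlock symptom
    // described in this issue.
    synchronized void mergeFinished() {
        try {
            pendingMerges--;
        } finally {
            notifyAll();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        MergeMonitorSketch m = new MergeMonitorSketch();
        Thread merger = new Thread(() -> {
            m.mergeFinished();
            m.mergeFinished();
        });
        merger.start();
        m.waitForMerges();  // returns once both merges have completed
        merger.join();
        System.out.println("done");
    }
}
```

With this structure, the only way the waiter hangs forever is if some path decrements the pending count without notifying, or if a pending merge is never run at all; that second case is what the later comments on this issue zero in on.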
[jira] Created: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
IndexWriter deadlock when using ConcurrentMergeScheduler

Key: LUCENE-1239
URL: https://issues.apache.org/jira/browse/LUCENE-1239
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.3.1
Environment: Compass 2.0.0M3 (nightly build #57), Lucene 2.3.1, Spring Framework 2.0.7.0
Reporter: Michael Lossos
[jira] Assigned: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned LUCENE-1239:

Assignee: Michael McCandless
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579803#action_12579803 ]

Michael McCandless commented on LUCENE-1239:

Is it possible to call IndexWriter.setInfoStream(..), get the hang to happen, and post the resulting output?
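For anyone following along: routing the diagnostic stream to a file makes it easy to attach the trace to the issue afterwards. A small self-contained sketch of that setup; the commented-out `writer.setInfoStream(info)` call is the Lucene 2.3-era API being referred to, and the printed line is a stand-in for what IndexWriter would emit, not real output.

```java
import java.io.File;
import java.io.PrintStream;
import java.nio.file.Files;

public class InfoStreamSketch {
    public static void main(String[] args) throws Exception {
        // Capture diagnostics in a temp file so the trace survives the hang.
        File log = File.createTempFile("index-info", ".log");
        try (PrintStream info = new PrintStream(log)) {
            // With a real writer this would be: writer.setInfoStream(info);
            // Here we just emit a stand-in line to show the plumbing works.
            info.println("IW: now merge");
        }
        System.out.println(Files.readAllLines(log.toPath()).get(0));
    }
}
```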
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579819#action_12579819 ]

Michael McCandless commented on LUCENE-1239:

If you replace Compass's ExecutorMergeScheduler with Lucene's ConcurrentMergeScheduler, does the deadlock still happen?

One thing that makes me nervous about ExecutorMergeScheduler is this comment:

// Compass: No need to execute continous merges, we simply reschedule another merge, if there is any, using executor manager

and the corresponding change, which is to schedule a new job instead of using the while loop to run new merges. If I understand that code correctly, the executorManager will re-call the run() method on MergeThread when there is a cascaded merge. But that won't do the right thing, because it will run startMerge rather than the newly returned (cascaded) merge. That would then cause the deadlock, because the cascaded merge is never issued.
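The cascading behavior at issue can be illustrated with a toy model. This sketch is not Compass or Lucene code; `pending`, `getNextMerge`, and `doMerge` are illustrative stand-ins for the writer's pending-merge queue. The point is that a merge thread must loop, pulling the *next* pending merge after each one finishes, because completing a merge can register a new (cascaded) merge; re-running the original merge instead would leave the cascade unissued, and optimize() would wait forever.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CascadedMergeSketch {
    // Toy stand-in for the writer's pending-merge queue.
    private final Deque<String> pending = new ArrayDeque<>();
    private int completed = 0;

    String getNextMerge() { return pending.poll(); }

    void doMerge(String merge) {
        completed++;
        // A finished merge can cascade: completing "m1" registers "m2".
        if (merge.equals("m1")) pending.add("m2");
    }

    public static void main(String[] args) {
        CascadedMergeSketch w = new CascadedMergeSketch();
        w.pending.add("m1");

        // ConcurrentMergeScheduler-style loop: keep pulling merges until
        // none remain, so the cascaded "m2" is also executed.
        for (String m = w.getNextMerge(); m != null; m = w.getNextMerge()) {
            w.doMerge(m);
        }
        System.out.println(w.completed);  // both m1 and its cascade ran
    }
}
```

Dropping the loop and only ever executing "m1" would leave "m2" stuck in the queue, which is the shape of the bug being hypothesized above.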
[jira] Resolved: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Lossos resolved LUCENE-1239.

Resolution: Invalid

You're right: when I use Lucene's ConcurrentMergeScheduler, I don't see the deadlock. I'll bounce this back to Compass for fixing. Thank you Michael for looking into this!
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579829#action_12579829 ]

Michael McCandless commented on LUCENE-1239:

Phew :)
WordNet synonyms overhead
Hi,

I am especially interested in the WordNet synonym expansion that was discussed in the Lucene in Action book. Is there anyone here on the list who has experience with this approach? I'm curious about how much the synonym expansion will increase the size of an index. Are there any reliable figures from real-life applications?

Kind regards,
Harald
Re: WordNet synonyms overhead
Harald Näger wrote:

> I am especially interested in the WordNet synonym expansion that was discussed in the Lucene in Action book. Is there anyone here on the list who has experience with this approach? I'm curious about how much the synonym expansion will increase the size of an index. Are there any reliable figures from real-life applications?

Query expansion is better than index expansion: faster to use, smaller index, and less noise when you search.

M.
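The query-expansion approach suggested above keeps the index untouched and rewrites each query term as an OR over its synonyms at search time. A minimal self-contained sketch; the tiny hard-coded synonym map and the textual query syntax are illustrative only (a real system would consult WordNet and build query objects):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryExpansionSketch {
    // Illustrative synonym map; a real system would look these up in WordNet.
    static final Map<String, List<String>> SYNONYMS = Map.of(
        "fast", List.of("quick", "rapid"),
        "car", List.of("auto", "automobile"));

    // Expand each term of a whitespace-separated query into an OR group
    // of the term and its synonyms, AND-ing the groups together.
    static String expand(String query) {
        return Arrays.stream(query.split("\\s+"))
            .map(term -> {
                List<String> syns = SYNONYMS.getOrDefault(term, List.of());
                if (syns.isEmpty()) return term;
                return "(" + term + " OR " + String.join(" OR ", syns) + ")";
            })
            .collect(Collectors.joining(" AND "));
    }

    public static void main(String[] args) {
        System.out.println(expand("fast car"));
    }
}
```

The index stores only the original terms, so its size is unchanged; the cost moves to query time, where the expanded query touches a few extra posting lists.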
SearchTimeout tests was Re: Build failed in Hudson: Lucene-trunk #408
Seems like we will need to explore some timeout values. This is always tricky once you get on Hudson; Solr has similar problems with some of its time-based tests.

On Mar 17, 2008, at 10:55 PM, Apache Hudson Server wrote:

See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/408/changes

Changes:
[doronc] fix formatting in CHANGES.txt to prevent perl errors in creating changes.html.
[mikemccand] LUCENE-1233: correct javadocs

[junit] Testsuite: org.apache.lucene.search.TestTimeLimitedCollector
[junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 13.849 sec
[junit] - Standard Error -
[junit] Exception in thread Thread-97 junit.framework.AssertionFailedError: no hits found!
[junit]     at junit.framework.Assert.fail(Assert.java:47)
[junit]     at junit.framework.Assert.assertTrue(Assert.java:20)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.doTestTimeout(TestTimeLimitedCollector.java:152)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.access$100(TestTimeLimitedCollector.java:38)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector$1.run(TestTimeLimitedCollector.java:231)
[junit] Exception in thread Thread-85 junit.framework.AssertionFailedError: no hits found!
[junit]     at junit.framework.Assert.fail(Assert.java:47)
[junit]     at junit.framework.Assert.assertTrue(Assert.java:20)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.doTestTimeout(TestTimeLimitedCollector.java:152)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.access$100(TestTimeLimitedCollector.java:38)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector$1.run(TestTimeLimitedCollector.java:231)
[junit] -----
[junit] Testcase: testTimeoutMultiThreaded(org.apache.lucene.search.TestTimeLimitedCollector): FAILED
[junit] some threads failed! expected:50 but was:48
[junit] junit.framework.AssertionFailedError: some threads failed! expected:50 but was:48
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.doTestMultiThreads(TestTimeLimitedCollector.java:255)
[junit]     at org.apache.lucene.search.TestTimeLimitedCollector.testTimeoutMultiThreaded(TestTimeLimitedCollector.java:220)
[junit]
[junit] Test org.apache.lucene.search.TestTimeLimitedCollector FAILED

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
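One common way to make time-based assertions survive a loaded CI box, in the spirit of the "explore some timeout values" remark above, is asymmetric slack: check the lower bound ("not too soon") strictly, but allow a generous multiplier on the upper bound ("not too late"), since a busy machine can delay the timeout arbitrarily. An illustrative helper, not the actual test code:

```java
public class TimeoutSlackSketch {
    // Accept a measured duration if it is at least the target timeout and
    // at most timeout * slack; the upper bound is the part that gets noisy
    // on a busy Hudson machine, hence the multiplier.
    static boolean withinSlack(long elapsedMs, long timeoutMs, double slack) {
        return elapsedMs >= timeoutMs && elapsedMs <= (long) (timeoutMs * slack);
    }

    public static void main(String[] args) {
        System.out.println(withinSlack(1200, 1000, 2.0));  // 20% late: tolerated
        System.out.println(withinSlack(900, 1000, 2.0));   // fired too soon: fail
        System.out.println(withinSlack(2500, 1000, 2.0));  // too late even with slack
    }
}
```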
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12579874#action_12579874 ] Shay Banon commented on LUCENE-1239: Yea, it looks like it is my bad, great catch!. While trying to create a better scheduler (at least in terms of reusing threads instead of creating them), I wondered if there is a chance that the current scheduler can be enhanced to support an extension point for that... . I can give such a refactoring a go if you think it make sense. IndexWriter deadlock when using ConcurrentMergeScheduler Key: LUCENE-1239 URL: https://issues.apache.org/jira/browse/LUCENE-1239 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3.1 Environment: Compass 2.0.0M3 (nightly build #57), Lucene 2.3.1, Spring Framework 2.0.7.0 Reporter: Michael Lossos Assignee: Michael McCandless I'm trying to update our application from Compass 2.0.0M1 with Lucene 2.2 to Compass 2.0.0M3 (latest build) with Lucene 2.3.1. I'm holding all other things constant and only changing the Compass and Lucene jars. I'm recreating the search index for our data and seeing deadlock in Lucene's IndexWriter. It appears to be waiting on a signal from the merge thread. I've tried creating a simple reproduction case for this but to no avail. Doing the exact same steps with Compass 2.0.0M1 and Lucene 2.2 has no problems and recreates our search index. That is to say, it's not our code. In particular, the main thread performing the commit (Lucene document save) from Compass is calling Lucene's IndexWriter.optimize(). We're using Compass's ExecutorMergeScheduler to handle the merging, and it is calling IndexWriter.merge(). The main thread in IndexWriter.optimize() enters the wait() at the bottom of that method and is never notified. I can't tell if this is because optimizeMergesPending() is returning true incorrectly, or if IndexWriter.merge()'s notifyAll() is being called prematurely. 
Looking at the code, it doesn't seem possible for IndexWriter.optimize() to be waiting and miss a notifyAll(), and Lucene's IndexWriter.merge() was recently fixed to always call notifyAll() even on exceptions -- that is, all the relevant IndexWriter code looks properly synchronized. Nevertheless, I'm seeing the deadlock behavior described, and it's reproducible using our app and our test data set. Could someone familiar with IndexWriter's synchronization code take another look at it? I'm sorry that I can't give you a simple reproduction test case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
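For readers not steeped in IndexWriter's internals, the wait()/notifyAll() contract under discussion can be sketched in isolation. This is a minimal stand-in, not Lucene's actual code; the names pendingMerges and awaitNoPendingMerges are only analogues of optimizeMergesPending() and the wait() at the bottom of optimize(). It shows why, as long as the waiter rechecks its predicate inside the same monitor the notifier updates it under, a notifyAll() cannot simply be "missed":

```java
// Sketch of the guarded-wait idiom (hypothetical names, not Lucene's code).
public class GuardedWait {
    private final Object lock = new Object();
    private int pendingMerges = 1;   // stand-in for optimizeMergesPending()

    // Waiter side, analogous to the wait() at the bottom of optimize().
    public void awaitNoPendingMerges() throws InterruptedException {
        synchronized (lock) {
            while (pendingMerges > 0) {  // predicate rechecked after every wakeup
                lock.wait();
            }
        }
    }

    // Notifier side, analogous to merge() finishing: the state change and the
    // notifyAll() happen under the same monitor, even on exceptions.
    public void mergeFinished() {
        synchronized (lock) {
            pendingMerges--;
            lock.notifyAll();
        }
    }

    public int pending() {
        synchronized (lock) { return pendingMerges; }
    }

    public static void main(String[] args) throws Exception {
        GuardedWait g = new GuardedWait();
        Thread merger = new Thread(g::mergeFinished);
        merger.start();
        g.awaitNoPendingMerges();   // returns once pendingMerges reaches 0
        merger.join();
    }
}
```

If the notify fires before the waiter enters wait(), the predicate is already false and the waiter never blocks; a deadlock would therefore need the predicate itself (the analogue of optimizeMergesPending()) to stay true incorrectly, which matches the reporter's first hypothesis.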
[jira] Commented: (LUCENE-1239) IndexWriter deadlock when using ConcurrentMergeScheduler
[ https://issues.apache.org/jira/browse/LUCENE-1239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579909#action_12579909 ] Michael McCandless commented on LUCENE-1239: {quote} I wondered if there is a chance that the current scheduler can be enhanced to support an extension point for that. I can give such a refactoring a go if you think it makes sense. {quote} That would be much appreciated! You should start from the trunk version of CMS: it has already been somewhat factored to allow subclasses to override things, though I think maybe not quite enough for this case.
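The extension point being discussed could look something like the following sketch (hypothetical names, not the actual ConcurrentMergeScheduler API or the eventual patch): factor thread creation behind an overridable hook, so a subclass can supply pooled or otherwise reused threads instead of creating a fresh one per merge.

```java
// Sketch only: a scheduler that exposes thread creation as a protected hook.
public class PluggableScheduler {

    // Subclasses override this to reuse pooled threads, rename them, etc.
    protected Thread newMergeThread(Runnable merge) {
        Thread t = new Thread(merge, "Lucene Merge Thread");
        t.setDaemon(true);
        return t;
    }

    // The scheduler body only calls the hook, never new Thread() directly.
    public Thread launch(Runnable merge) {
        Thread t = newMergeThread(merge);
        t.start();
        return t;
    }
}
```

A subclass backed by an ExecutorService would override newMergeThread (or a similar submit-style hook) and leave the merge-selection logic in the base class untouched.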
[jira] Commented: (LUCENE-400) NGramFilter -- construct n-grams from a TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579933#action_12579933 ] Steven Rowe commented on LUCENE-400: Re-ping: Otis, do you still plan to commit? NGramFilter -- construct n-grams from a TokenStream --- Key: LUCENE-400 URL: https://issues.apache.org/jira/browse/LUCENE-400 Project: Lucene - Java Issue Type: Improvement Components: Analysis Affects Versions: unspecified Environment: Operating System: All Platform: All Reporter: Sebastian Kirsch Assignee: Otis Gospodnetic Priority: Minor Fix For: 2.4 Attachments: LUCENE-400.patch, NGramAnalyzerWrapper.java, NGramAnalyzerWrapperTest.java, NGramFilter.java, NGramFilterTest.java This filter constructs n-grams (token combinations up to a fixed size, sometimes called shingles) from a token stream. The filter sets start offsets, end offsets and position increments, so highlighting and phrase queries should work. Position increments > 1 in the input stream are replaced by filler tokens (tokens with termText "_" and endOffset - startOffset = 0) in the output n-grams. (Position increments > 1 in the input stream are usually caused by removing some tokens, e.g. stopwords, from a stream.) The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache Commons-Collections. Filter, test case and an analyzer are attached.
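The token combinations the filter produces can be illustrated with a simplified stand-alone sketch. The real NGramFilter operates on a Lucene TokenStream and also tracks offsets, position increments, and "_" filler tokens; this just shows the word n-grams (shingles) up to a maximum size:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of shingle construction, not the attached filter.
public class ShingleSketch {

    // Emit every token combination of length 1..maxSize starting at each position.
    public static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<String>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                if (len > 1) sb.append(' ');
                sb.append(tokens.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }
}
```

For example, shingles(["please", "divide"], 2) yields ["please", "please divide", "divide"]; phrase queries work because each shingle also carries its source offsets in the real filter.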
Re: SearchTimeout tests was Re: Build failed in Hudson: Lucene-trunk #408
Hi Grant, I looked at this already - see patch in LUCENE-1238 - I thought I'd give it a few more hours for comments and then commit. I believe this should solve the problem; more details in JIRA. BR, Doron On Tue, Mar 18, 2008 at 3:39 PM, Grant Ingersoll [EMAIL PROTECTED] wrote: Seems like we will need to explore some timeout values. This is always tricky when you get on Hudson, as Solr has similar problems with some of its time-based tests. On Mar 17, 2008, at 10:55 PM, Apache Hudson Server wrote: See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/408/changes Changes: [doronc] fix formatting in CHANGES.txt to prevent perl errors in creating changes.html. [mikemccand] LUCENE-1233: correct javadocs [junit] Testsuite: org.apache.lucene.search.TestTimeLimitedCollector [junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 13.849 sec [junit] - Standard Error - [junit] Exception in thread Thread-97 junit.framework.AssertionFailedError: no hits found! [junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at junit.framework.Assert.assertTrue(Assert.java:20) [junit] at org.apache.lucene.search.TestTimeLimitedCollector.doTestTimeout(TestTimeLimitedCollector.java:152) [junit] at org.apache.lucene.search.TestTimeLimitedCollector.access$100(TestTimeLimitedCollector.java:38) [junit] at org.apache.lucene.search.TestTimeLimitedCollector$1.run(TestTimeLimitedCollector.java:231) [junit] Exception in thread Thread-85 junit.framework.AssertionFailedError: no hits found! 
[junit] at junit.framework.Assert.fail(Assert.java:47) [junit] at junit.framework.Assert.assertTrue(Assert.java:20) [junit] at org.apache.lucene.search.TestTimeLimitedCollector.doTestTimeout(TestTimeLimitedCollector.java:152) [junit] at org.apache.lucene.search.TestTimeLimitedCollector.access$100(TestTimeLimitedCollector.java:38) [junit] at org.apache.lucene.search.TestTimeLimitedCollector$1.run(TestTimeLimitedCollector.java:231) [junit] - --- [junit] Testcase: testTimeoutMultiThreaded(org.apache.lucene.search.TestTimeLimitedCollector): FAILED [junit] some threads failed! expected:<50> but was:<48> [junit] junit.framework.AssertionFailedError: some threads failed! expected:<50> but was:<48> [junit] at org.apache.lucene.search.TestTimeLimitedCollector.doTestMultiThreads(TestTimeLimitedCollector.java:255) [junit] at org.apache.lucene.search.TestTimeLimitedCollector.testTimeoutMultiThreaded(TestTimeLimitedCollector.java:220) [junit] Test org.apache.lucene.search.TestTimeLimitedCollector FAILED
[jira] Commented: (LUCENE-1238) intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests
[ https://issues.apache.org/jira/browse/LUCENE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579960#action_12579960 ] Doron Cohen commented on LUCENE-1238: I intend to commit this later today. (Attachments: LUCENE-1238.patch)
[jira] Resolved: (LUCENE-1238) intermittent failures of TestTimeLimitedCollector.testTimeoutMultiThreaded in nightly tests
[ https://issues.apache.org/jira/browse/LUCENE-1238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doron Cohen resolved LUCENE-1238. - Resolution: Fixed Lucene Fields: [Patch Available] (was: [Patch Available, New]) Committed.
[jira] Created: (LUCENE-1240) TermsFilter: reuse TermDocs
TermsFilter: reuse TermDocs --- Key: LUCENE-1240 URL: https://issues.apache.org/jira/browse/LUCENE-1240 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.3.1 Reporter: Trejkaz TermsFilter currently calls termDocs(Term) once per term in the TermsFilter. If we sort the terms it's filtering on, this can be optimised to call termDocs() once and then skip(Term) once per term, which should significantly speed up this filter.
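The shape of the proposed optimisation can be sketched without the Lucene API (hypothetical names, not the actual patch): instead of restarting a dictionary lookup for every filter term, sort the filter terms once and advance two cursors forward in a single pass, the way a reused TermDocs can seek forward from its current position.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a single forward pass over two sorted term sequences.
public class SortedTermScan {

    // Both inputs must be sorted; returns the filter terms present in the index.
    // e.g. matchSorted([a, b, d], [b, c, d]) -> [b, d]
    public static List<String> matchSorted(List<String> indexTerms, List<String> filterTerms) {
        List<String> found = new ArrayList<String>();
        int i = 0, j = 0;
        while (i < indexTerms.size() && j < filterTerms.size()) {
            int cmp = indexTerms.get(i).compareTo(filterTerms.get(j));
            if (cmp < 0) {
                i++;                            // index term before next filter term: skip ahead
            } else if (cmp > 0) {
                j++;                            // filter term absent from the index
            } else {
                found.add(filterTerms.get(j));  // hit: in the real filter, set doc bits here
                i++;
                j++;
            }
        }
        return found;
    }
}
```

Each sequence is traversed at most once, so the cost drops from one lookup per filter term to a single merge-style scan.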
[jira] Updated: (LUCENE-1240) TermsFilter: reuse TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Trejkaz updated LUCENE-1240: Lucene Fields: [New, Patch Available] (was: [New]) Attachments: terms-filter.patch
[jira] Resolved: (LUCENE-1240) TermsFilter: reuse TermDocs
[ https://issues.apache.org/jira/browse/LUCENE-1240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood resolved LUCENE-1240. -- Resolution: Fixed Fix Version/s: 2.3.2 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Committed this fix and added a new JUnit test as part of r638631.
Re: Fieldable, AbstractField, Field
: Really, I think we could go just back to a single Field class instead : of the three classes Fieldable, AbstractField and Field. If we had : this then LUCENE-1219 would be easier to cleanly implement. It's probably worth reviewing the original reasons why Fieldable and AbstractField were added... http://issues.apache.org/jira/browse/LUCENE-545 I'm not intimately familiar with most of this, but at its core, the purpose seems primarily related to Fields in *returned* documents after a search has been performed, particularly relating to lazy loading -- so that alternate impls could be returned based on FieldSelector options. I'm not sure how much consideration was given to the impacts on future changes to the API of Documents/Fields being *indexed*. Somewhere I wrote a nice long diatribe on how, in my opinion, the biggest flaw in Lucene's general API was the reuse of Document and Field for two radically different purposes, such that half the methods in each class are meaningless in the other half of the contexts they are used for... I can't find it, but here's a less angry version of the same sentiment, plus some followup discussion... http://www.nabble.com/-jira--Created%3A-%28LUCENE-778%29-Allow-overriding-a-Document-to8406796.html https://issues.apache.org/jira/browse/LUCENE-778 (note that parallel discussion occurred both in email replies and in Jira comments; they are both worth reading) All of this is fixable in Lucene 3.0, where we will be free to change the API; but in the meantime, the fact that 2.3 uses an interface means we are stuck with supporting it without changing it in 2.4, since right now clients can implement their own Fieldable impl and then pass it to Document.add(Fieldable) before indexing the doc. (Things would be a lot easier if the old Document.add(Field) had been left alone and documented as being explicitly for *indexing* docs, while a new method was used for Documents being returned by searches ... 
but that's water under the bridge.) The best short-term approach I can think of for addressing LUCENE-1219 in 2.4: 1) list the new methods in a new interface that extends Fieldable (ByteArrayReuseFieldable or something) 2) add the new methods to AbstractField so that it implements ByteArrayReuseFieldable 3) put an instanceof check for ByteArrayReuseFieldable in DocumentsWriter. It's not pretty, but it's backwards compatible. This reminds me of a slightly off-topic idea that's been floating around in the back of my head for a while, relating to our backwards compatibility commitments and the issues of interfaces and abstract classes (which I haven't thought through all the way, but I'm throwing it out there as long as we're talking about it) ... Committers tend to prefer abstract classes for extension points because it makes it easier to support backwards compatibility in the cases where we want to add methods to extendable APIs and the default behavior for these new methods is simple (or obvious delegation to existing methods), so that people who have written custom impls can upgrade easily without needing to invest time in making changes. But abstract classes can be harder to mock when doing mock testing, and some developers would prefer interfaces that they can implement with their existing classes -- I suspect these people who would prefer interfaces are willing to invest the time to make changes to their impls when upgrading Lucene if the interfaces were to change. Perhaps the solution is a middle ground: altering our APIs such that all extension points we advertise have both an abstract base class as well as an interface, and all methods that take them as arguments use the interface name. Then we relax our backcompat commitments such that we guarantee between minor releases that the interfaces won't change unless the corresponding abstract base class changes to account for it ... 
so if customers subclass the base class their code will continue to work, but if they implement the interface directly, ignoring the base class, they are on their own to ensure their code compiles against future minor versions. Like I said, I haven't thought it through completely, but at first glance it seems like it would give both committers and Lucene users a lot of extra flexibility, without sacrificing much in the way of compatibility commitments. The key would be in adopting it rigorously and religiously. -Hoss
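The three-step approach Hoss outlines can be sketched in miniature (all names here, including ByteArrayReuseFieldable's method, are hypothetical stand-ins, not the real Lucene API): new methods go on a sub-interface, the abstract base class implements them, and the hot path does an instanceof check before using them, so pre-existing Fieldable implementations keep working unchanged.

```java
// Sketch of the capability-interface + instanceof pattern with made-up names.
interface Fieldable {                                    // existing, frozen interface
    String stringValue();
}

interface ByteArrayReuseFieldable extends Fieldable {    // new extension point (step 1)
    byte[] binaryValue(byte[] reuse);
}

public class CapabilityCheck {

    // Analogous to the instanceof check proposed for DocumentsWriter (step 3).
    public static byte[] valueOf(Fieldable f, byte[] reuse) {
        if (f instanceof ByteArrayReuseFieldable) {
            return ((ByteArrayReuseFieldable) f).binaryValue(reuse);  // new fast path
        }
        return f.stringValue().getBytes();                            // legacy path
    }
}
```

Step 2 (having AbstractField implement the new interface) means every subclass of the base class gets the fast path for free, while direct Fieldable implementors silently fall back to the legacy path.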
Re: TokenFilter question
: I was trying to apply both : org.apache.solr.analysis.WordDelimiterFilter and : org.apache.lucene.analysis.ngram.NGramTokenFilter. : : Can I achieve this with Lucene's TokenStream? Sure ... you just have to pick an ordering and wrap one around the other. Solr does this anytime you define an analyzer using a tokenizer and a list of filters. : While thinking about TokenFilters, I came to an idea that : the TokenStream should have a structured representation. I've thought about that once or twice over the years as well... it would make things like multiword synonyms a lot easier to deal with if, instead of a TokenStream, we could have a directed TokenGraph with a single start and a single end (i.e. only one node with no incoming links and only one node with no outgoing links). But even if you had a graph-based API for Analyzers to express the set of tokens found, what would the end product look like? What would the format be of an index that stored Term position information as graph connections (essentially 3-dimensional info) instead of simple numeric position (1-dimensional)? Could it be searched as quickly? Most of the time, things that I think would be easier with a TokenGraph are still feasible using judicious use of positionIncrement, slop, and artificial marker tokens ... with Payloads, even more complex things should move into the realm of practical (but it's likely I'm putting Payloads on too much of a pedestal ... I've never actually tried using them for anything) -Hoss
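"Pick an ordering and wrap one around the other" can be shown with a filter chain in miniature. These are plain-Java stand-ins, not the real WordDelimiterFilter or NGramTokenFilter: each stage consumes the previous stage's output, which is why the order you choose matters.

```java
import java.util.ArrayList;
import java.util.List;

// Toy filter chain illustrating how wrapped token filters compose.
public class FilterChain {

    interface TokenFilter {
        List<String> filter(List<String> tokens);
    }

    // Stand-in for a word delimiter filter: splits on '-'.
    static final TokenFilter SPLIT = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (String part : t.split("-")) out.add(part);
        }
        return out;
    };

    // Stand-in for an n-gram filter: emits character bigrams of each token.
    static final TokenFilter BIGRAMS = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            for (int i = 0; i + 2 <= t.length(); i++) out.add(t.substring(i, i + 2));
        }
        return out;
    };

    // Apply the filters in order, each wrapping the previous output.
    public static List<String> analyze(List<String> tokens, TokenFilter... chain) {
        for (TokenFilter f : chain) tokens = f.filter(tokens);
        return tokens;
    }
}
```

Running SPLIT then BIGRAMS n-grams the delimited parts; reversing the order would instead split the n-grams, which is rarely what you want, hence "you just have to pick an ordering."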