Solr-trunk - Build # 1350 - Still Failing
Build: https://hudson.apache.org/hudson/job/Solr-trunk/1350/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9643 lines...]

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
[ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974114#action_12974114 ]

Tommaso Teofili commented on SOLR-2129:
---------------------------------------

Hi Kamil, can you please take a look at your trunk/solr/contrib/uima: does the lib folder exist? Can you find the jars in there? Let me know, and thanks for your feedback.

Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
-------------------------------------------------------------------------------

Key: SOLR-2129
URL: https://issues.apache.org/jira/browse/SOLR-2129
Project: Solr
Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Robert Muir
Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129-version2.patch, SOLR-2129-version3.patch, SOLR-2129.patch

Provide components to enable Apache UIMA automatic metadata extraction to be exploited when indexing documents. The purpose is to take the unstructured information inside a document and create structured metadata (as fields) to enrich each document. Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents. The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can easily be extended by adding or selecting different UIMA analysis engines, either from UIMA repositories on the web or created from scratch.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Created: (LUCENE-2830) Use StringBuilder instead of StringBuffer in benchmark
Use StringBuilder instead of StringBuffer in benchmark
------------------------------------------------------

Key: LUCENE-2830
URL: https://issues.apache.org/jira/browse/LUCENE-2830
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.1, 4.0

Minor change - use StringBuilder instead of StringBuffer in benchmark's code. We don't need the synchronization of StringBuffer in any of the places I've checked. The only place where it _could_ be a problem is in HtmlParser's API - one method accepts a StringBuffer and it's an interface. But I think it's ok to change benchmark's API, back-compat-wise, so I'd like to either change it to accept a String, or remove the method altogether -- no code in benchmark uses it, and anyone who needs it can pass a StringReader to the other method.
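For readers following along, the mechanical shape of the proposed change is just swapping the buffer class. A minimal illustration (the class and method below are hypothetical, not benchmark's actual code):

```java
public class BodyBuilder {
    // Before the patch, code like this would use StringBuffer, which
    // synchronizes every append() even when only one thread touches it.
    // StringBuilder offers the same API without the locking.
    public static String join(String[] paragraphs) {
        StringBuilder body = new StringBuilder();
        for (String p : paragraphs) {
            body.append(p).append('\n');
        }
        return body.toString();
    }
}
```

Since a StringBuilder is confined to one method here, dropping the synchronization is safe; that is the same reasoning the issue applies to benchmark's code paths.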
Lucene-Solr-tests-only-3.x - Build # 2822 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2822/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10663 lines...]
[jira] Updated: (LUCENE-2830) Use StringBuilder instead of StringBuffer in benchmark
[ https://issues.apache.org/jira/browse/LUCENE-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-2830:
-------------------------------

Attachment: LUCENE-2830.patch

Patch replaces StringBuffer with StringBuilder. I did not yet remove the parse() method from HtmlParser - if people are ok with it, I'll remove it. For now, I changed the parameter to String. All tests pass.

Use StringBuilder instead of StringBuffer in benchmark
------------------------------------------------------

Key: LUCENE-2830
URL: https://issues.apache.org/jira/browse/LUCENE-2830
Project: Lucene - Java
Issue Type: Improvement
Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
Fix For: 3.1, 4.0
Attachments: LUCENE-2830.patch
Lucene-Solr-tests-only-trunk - Build # 2849 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2849/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9739 lines...]
[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974121#action_12974121 ]

Simon Willnauer commented on LUCENE-2694:
-----------------------------------------

{quote}
I think instead of ReaderView we could change the Weight.scorer API so that instead of receiving IndexReader reader, it receives a struct that has the parent reader, sub reader, and ord of that sub? It's easy to be back-compat because we could just forward to the prior scorer method with only the sub?
{quote}

Mike, I am not sure that helps us here. If you use this method you cannot disambiguate between the set of readers that were used to create the PerReaderTermState and the ones that have a certain ord assigned to them. Disambiguation would be more difficult if we do that. IMO sharing a ReaderView seems to be the best solution so far. I don't think we should bind it to an IR directly, since users can easily build a ReaderView from a composite reader. Yet, for searching it would be nice to have a ReaderView on Searcher / IndexSearcher which can be triggered upon weight creation. That way we can also disambiguate between PerReaderTermState instances given to the TermQuery ctor when we create the weight, so that if the view doesn't match we either create a new PerReaderTermState or just don't use it for this weight.

I thought about TermsEnum#ord() again. I don't think we should really add it back, though. It's really an implementation detail, and folks that want to use it should be aware of that and cast correctly. On the other hand, I don't like having seek(ord) in TermsEnum either if we remove #ord(). I think we should remove it from the interface entirely.

simon

MTQ rewrite + weight/scorer init should be single pass
------------------------------------------------------

Key: LUCENE-2694
URL: https://issues.apache.org/jira/browse/LUCENE-2694
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Michael McCandless
Assignee: Simon Willnauer
Fix For: 4.0
Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch

Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite.
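The "struct" idea quoted in the comment above can be sketched as follows. All names here are illustrative, not Lucene's committed API: the point is just that bundling the parent composite reader, one sub-reader, and that sub-reader's ord lets per-segment state be resolved unambiguously.

```java
// Hypothetical sketch of a reader "view" struct: instead of passing a bare
// sub-reader to Weight.scorer, pass the sub-reader together with its parent
// and its position (ord) among the parent's sub-readers.
public final class ReaderViewSketch<R> {
    public final R parent; // the top-level composite reader
    public final R sub;    // one per-segment sub-reader
    public final int ord;  // index of `sub` among `parent`'s sub-readers

    public ReaderViewSketch(R parent, R sub, int ord) {
        this.parent = parent;
        this.sub = sub;
        this.ord = ord;
    }
}
```

With such a struct, cached per-reader state (like the PerReaderTermState discussed here) can be validated against the parent before being reused, rather than guessed from the sub-reader alone.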
Lucene-Solr-tests-only-3.x - Build # 2823 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2823/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10602 lines...]
Lucene-Solr-tests-only-trunk - Build # 2850 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2850/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9665 lines...]
Lucene-Solr-tests-only-3.x - Build # 2824 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2824/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10666 lines...]
Lucene-Solr-tests-only-trunk - Build # 2851 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2851/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9870 lines...]
Lucene-Solr-tests-only-3.x - Build # 2825 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2825/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10804 lines...]
Lucene-Solr-tests-only-trunk - Build # 2852 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2852/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9787 lines...]
Lucene-Solr-tests-only-3.x - Build # 2826 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2826/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10609 lines...]
Lucene-Solr-tests-only-trunk - Build # 2853 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2853/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9781 lines...]
Lucene-Solr-tests-only-3.x - Build # 2827 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2827/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10742 lines...]
Lucene-Solr-tests-only-3.x - Build # 2828 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/2828/

1 tests failed.
FAILED: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130)

Build Log (for compile errors):
[...truncated 10697 lines...]
Lucene-Solr-tests-only-trunk - Build # 2855 - Failure
Build: https://hudson.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/2855/

1 tests failed.
REGRESSION: org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch

Error Message: Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1104)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1042)
  at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:499)
  at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:92)
  at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)

Build Log (for compile errors):
[...truncated 9786 lines...]
Re: Lucene-Solr-tests-only-trunk - Build # 2839 - Still Failing
On Wed, Dec 22, 2010 at 2:10 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: NOTE: reproduce with: ant test -Dtestcase=DistributedClusteringComponentTest
:   -Dtestmethod=testDistribSearch
:   -Dtests.seed=4959909076277587079:-8952133138041211916 -Dtests.multiplier=3
:
: But I couldn't reproduce it on my mac.

It's failing consistently on both the trunk and 3.x hudson jobs, for the past ~10 builds (as of right now) since you added the test, with a consistent SEVERE error in the logs -- I don't think it has anything to do with the random seed. I personally can't reproduce the failure on either trunk or 3.x, regardless of whether I try to run just a single test or all tests in parallel.

This test always fails on my computer too... it's not just hudson. I added an @Ignore until it can be resolved.
Re: svn commit: r1051872 - /lucene/dev/trunk/solr/contrib/clustering/src/test/java/org/apache/solr/handler/clustering/DistributedClusteringComponentTest.java
Thank you!

Koji Sekiguchi from mobile

On 2010/12/22, at 21:27, rm...@apache.org wrote:

> Author: rmuir
> Date: Wed Dec 22 12:27:06 2010
> New Revision: 1051872
>
> URL: http://svn.apache.org/viewvc?rev=1051872&view=rev
> Log:
> SOLR-2282: disable failing test
>
> Modified:
>     lucene/dev/trunk/solr/contrib/clustering/src/test/java/org/apache/solr/handler/clustering/DistributedClusteringComponentTest.java
>
> Modified: lucene/dev/trunk/solr/contrib/clustering/src/test/java/org/apache/solr/handler/clustering/DistributedClusteringComponentTest.java
> URL: http://svn.apache.org/viewvc/lucene/dev/trunk/solr/contrib/clustering/src/test/java/org/apache/solr/handler/clustering/DistributedClusteringComponentTest.java?rev=1051872&r1=1051871&r2=1051872&view=diff
> ==============================================================================
> --- lucene/dev/trunk/solr/contrib/clustering/src/test/java/org/apache/solr/handler/clustering/DistributedClusteringComponentTest.java (original)
> +++ lucene/dev/trunk/solr/contrib/clustering/src/test/java/org/apache/solr/handler/clustering/DistributedClusteringComponentTest.java Wed Dec 22 12:27:06 2010
> @@ -20,6 +20,9 @@ package org.apache.solr.handler.clusteri
>  import org.apache.solr.BaseDistributedSearchTestCase;
>  import org.apache.solr.common.params.CommonParams;
> +import org.junit.Ignore;
> +
> +@Ignore("FIXME: test fails on hudson")
>  public class DistributedClusteringComponentTest extends BaseDistributedSearchTestCase {
[jira] Commented: (SOLR-1526) Client Side Tika integration
[ https://issues.apache.org/jira/browse/SOLR-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974176#action_12974176 ]

Jan Høydahl commented on SOLR-1526:
-----------------------------------

I linked this issue to SOLR-1763, as they attempt to solve the same thing on the client vs. the server side. Instead of creating two solutions, we should base the two on the same code base and config, so that it is easy to switch between them. Perhaps someone starts with server-side extraction but then wants to optimize performance by going client-side. The switch should be intuitive. Thus, should we consider porting the whole UpdateProcessorChain to SolrJ? How cool would it be to choose whether to execute an UpdateProcessor on the client or server side simply by a configuration change? I realize that some UPs may depend on SolrCore or have other difficult dependencies, but it should be possible to work around that, no?

Client Side Tika integration
----------------------------

Key: SOLR-1526
URL: https://issues.apache.org/jira/browse/SOLR-1526
Project: Solr
Issue Type: New Feature
Components: clients - java
Reporter: Grant Ingersoll
Priority: Minor
Fix For: Next

Often it is cost-prohibitive to send full, rich documents over the wire. The contrib/extraction library has server-side integration with Tika, but it would be nice to have a client-side implementation as well. It should support both metadata and content, or just metadata.
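The "run the chain on the client" idea floated above can be pictured with a dependency-free sketch. All names here are hypothetical; Solr's real UpdateRequestProcessor API takes SolrCore-bound commands, which is exactly the dependency the comment says would need working around.

```java
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// A processor chain that could run on either client or server: each
// processor is just a document -> document transform, and the chain is
// plain data, so where it executes becomes a configuration choice.
public class ClientChainSketch {
    // A "document" here is simply field name -> value.
    public static Map<String, Object> run(
            List<UnaryOperator<Map<String, Object>>> chain,
            Map<String, Object> doc) {
        for (UnaryOperator<Map<String, Object>> p : chain) {
            doc = p.apply(doc); // each processor transforms the doc in turn
        }
        return doc;
    }
}
```

A client-side Tika extraction step would then be just one more transform in the list, with the server receiving only the already-enriched fields.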
[jira] Created: (SOLR-2293) SolrCloud distributed indexing
SolrCloud distributed indexing
------------------------------

Key: SOLR-2293
URL: https://issues.apache.org/jira/browse/SOLR-2293
Project: Solr
Issue Type: New Feature
Components: SolrCloud
Reporter: Jan Høydahl

Add SolrCloud support for distributed indexing, as described in http://wiki.apache.org/solr/DistributedSearch#Distributed_Indexing and the "Support user specified partitioning" paragraph of http://wiki.apache.org/solr/SolrCloud#High_level_design_goals

Currently, the client needs to decide which shard indexer to talk to for each document. Common partitioning strategies include hash-based, date-based and custom. Solr should have the capability of accepting a document update on any of the nodes in a cluster, and perform partitioning and distribution of updates to the correct shard, based on the current ZK config. The ShardDistributionPolicy should be pluggable, with the most common policies provided out of the box.
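A minimal sketch of what a pluggable policy could look like. Only the name ShardDistributionPolicy comes from the issue text; the method signature and the hash-based implementation below are illustrative assumptions, not a committed design.

```java
// Hypothetical pluggable partitioning interface for SOLR-2293.
interface ShardDistributionPolicy {
    /** Pick the shard (0..numShards-1) that should index this document. */
    int shardFor(String docId, int numShards);
}

// Hash-based partitioning: the same id always routes to the same shard,
// so updates and deletes for a given document land in one place.
class HashShardDistributionPolicy implements ShardDistributionPolicy {
    @Override
    public int shardFor(String docId, int numShards) {
        // Mask the sign bit rather than using Math.abs, which
        // returns a negative value for Integer.MIN_VALUE.
        return (docId.hashCode() & 0x7fffffff) % numShards;
    }
}
```

Date-based or custom policies would implement the same interface, which is what makes "accept an update on any node, then forward" possible: every node can compute the target shard from the document alone.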
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974212#action_12974212 ]

Yonik Seeley commented on LUCENE-2829:
--------------------------------------

Why not keep the TermState cache and use it for all queries except MTQ, while using a different mechanism for MTQ to avoid trashing the cache? The cache has a number of advantages that may never be duplicated in a different type of API, including:
- actually caching frequently used terms across different requests
- caching terms reused in the same request; term proximity boosting is an example: +united +states "united states"^10

improve termquery pk lookup performance
---------------------------------------

Key: LUCENE-2829
URL: https://issues.apache.org/jira/browse/LUCENE-2829
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Robert Muir
Attachments: LUCENE-2829.patch

For things that are like primary keys and don't exist in some segments (the worst case is a primary/unique key that exists in only 1), we do wasted seeks. While LUCENE-2694 tries to solve some of this issue with TermState, I'm not sure we could ever backport that to 3.1, for example. This is a simpler solution here just to solve this one problem in termquery... we could just revert it in trunk when we resolve LUCENE-2694, but I don't think we should leave things as they are in 3.x.
[jira] Issue Comment Edited: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974212#action_12974212 ]

Yonik Seeley edited comment on LUCENE-2829 at 12/22/10 9:24 AM:
----------------------------------------------------------------

Why not keep the TermState cache and use it for all queries except MTQ, while using a different mechanism for MTQ to avoid trashing the cache? The cache has a number of advantages that may never be duplicated in a different type of API, including:
- actually caching frequently used terms across different requests
- caching terms reused in the same request; term proximity boosting is an example: +united +states "united states"^10

edit: and as Robert previously pointed out, if we cached misses as well, then we could avoid needless seeks on segments that don't contain the term.

improve termquery pk lookup performance
---------------------------------------

Key: LUCENE-2829
URL: https://issues.apache.org/jira/browse/LUCENE-2829
Project: Lucene - Java
Issue Type: Improvement
Components: Search
Reporter: Robert Muir
Attachments: LUCENE-2829.patch
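The miss-caching idea in the edit above can be sketched with a tiny per-segment map. This is a dependency-free illustration, not Lucene's actual TermState cache: the key property is that a known miss is recorded explicitly, so a later primary-key lookup can skip segments that provably don't contain the term.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a per-segment term-lookup cache that also records misses.
// A Long stands in for the cached term state; null marks a known miss.
public class PkLookupCache {
    // segment name -> (term -> state; a null value means "known miss")
    private final Map<String, Map<String, Long>> cache = new HashMap<>();

    public void record(String segment, String term, Long state) {
        cache.computeIfAbsent(segment, s -> new HashMap<>()).put(term, state);
    }

    /** true only if we already know this segment does not contain the term. */
    public boolean isKnownMiss(String segment, String term) {
        Map<String, Long> seg = cache.get(segment);
        return seg != null && seg.containsKey(term) && seg.get(term) == null;
    }
}
```

The thread's concern maps directly onto this sketch: enumerating queries like MTQ would flood such a map with negative entries unless, as described, they bypass the cache while enumerating.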
[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974221#action_12974221 ] Yonik Seeley commented on LUCENE-2723: -- Should we keep MultiBulkPostingsEnum? Even when someone writes their code to work per-segment, not all IndexReader implementations may be able to provide segment-level readers. ParallelReader is one that can't currently? Speed up Lucene's low level bulk postings read API -- Key: LUCENE-2723 URL: https://issues.apache.org/jira/browse/LUCENE-2723 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.0 Attachments: LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723-termscorer.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723.patch, LUCENE-2723_bulkvint.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_facetPerSeg.patch, LUCENE-2723_openEnum.patch, LUCENE-2723_termscorer.patch, LUCENE-2723_wastedint.patch Spinoff from LUCENE-1410. The flex DocsEnum has a simple bulk-read API that reads the next chunk of docs/freqs. But it's a poor fit for intblock codecs like FOR/PFOR (from LUCENE-1410). This is not unlike sucking coffee through those tiny plastic coffee stirrers they hand out airplanes that, surprisingly, also happen to function as a straw. As a result we see no perf gain from using FOR/PFOR. I had hacked up a fix for this, described at in my blog post at http://chbits.blogspot.com/2010/08/lucene-performance-with-pfordelta-codec.html I'm opening this issue to get that work to a committable point. So... I've worked out a new bulk-read API to address performance bottleneck. It has some big changes over the current bulk-read API: * You can now also bulk-read positions (but not payloads), but, I have yet to cutover positional queries. 
* The buffer contains doc deltas, not absolute values, for docIDs and positions (freqs are absolute). * Deleted docs are not filtered out. * The doc freq buffers need not be aligned. For fixed intblock codecs (FOR/PFOR) they will be, but for varint codecs (Simple9/16, Group varint, etc.) they won't be. It's still a work in progress...
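To make the delta convention concrete, here is a hedged sketch of how a consumer of such a bulk-read API would turn a buffer of doc deltas back into absolute docIDs. The class and method names are made up for illustration; this is not the actual flex API:

```java
// Illustrative only: a consumer keeps a running sum to turn the
// delta-encoded docID buffer into absolute docIDs.
public class BulkDeltaDecode {
    // Writes absolute docIDs into docIDsOut; returns the count decoded.
    static int decodeDocIDs(int[] docDeltas, int count, int[] docIDsOut) {
        int doc = 0;
        for (int i = 0; i < count; i++) {
            doc += docDeltas[i];   // each entry is a delta from the previous doc
            docIDsOut[i] = doc;
        }
        return count;
    }

    public static void main(String[] args) {
        int[] deltas = {3, 2, 7};        // encodes absolute docs 3, 5, 12
        int[] docs = new int[deltas.length];
        decodeDocIDs(deltas, deltas.length, docs);
        System.out.println(docs[0] + " " + docs[1] + " " + docs[2]); // 3 5 12
    }
}
```

Note that since deleted docs are not filtered out by the API, a real consumer would also consult the deleted-docs bitset after decoding.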
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974223#action_12974223 ] Robert Muir commented on LUCENE-2829: - bq. edit: and as robert previously pointed out, if we cached misses as well, then we could avoid needless seeks on segments that don't contain the term. True, this is a good idea, just a little trickier: * In trunk, we have TermsEnum.seek(BytesRef text, boolean useCache), defaulting to true. * FilteredTermsEnum passes false here, so the multitermqueries don't populate the cache with garbage while enumerating (eg foo*), only explicitly at the end with cacheTerm() (per-segment) for the ones that were actually accepted. They sum up their docFreq themselves to prevent the first wasted seek in TermQuery. * So this solution would make MTQ worse, as it would cause them to trash the caches in the second wasted seek (the docsenum) where they do not today, with negative entries for the segments where the term doesn't exist. Today they do this wasted seek, but they don't trash the cache here. The only solution to prevent that is the PerReaderTermState (or something equally complicated). * We would have to look at other places where negative entries would hurt, for example rebuilding spellcheck indexes uses this 'termExists()' method implemented with docFreq. So we would likely have to change spellcheck's code to use a TermsEnum and seek(term, false)... using a termsenum in parallel with the spellcheck dictionary would obviously be more efficient for the index-based spellcheck case (forget about caching) versus docFreq()'ing every term... *but* we cannot assume the spellcheck Dictionary is actually in term order (imagine the File-based dictionary case), so we can't implement this today. On 3.x I think it's slightly less complicated, as there is already a hack in the cache to prevent sequential termsenums from trashing it (e.g.
foo*), and pretty much all the MTQs just enumerate sequentially anyway... (except NRQ, which doesn't enum many terms anyway, so likely not a problem). But we would have to at least fix the spellcheck case there too, I think. Not saying I don't like your idea... just saying there's more work to do it. improve termquery pk lookup performance - Key: LUCENE-2829 URL: https://issues.apache.org/jira/browse/LUCENE-2829 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Robert Muir Attachments: LUCENE-2829.patch For things that are like primary keys and don't exist in some segments (worst case is a primary/unique key that only exists in 1) we do wasted seeks. While LUCENE-2694 tries to solve some of this issue with TermState, I'm concerned whether we could ever backport that to 3.1, for example. This is a simpler solution here just to solve this one problem in termquery... we could just revert it in trunk when we resolve LUCENE-2694, but I don't think we should leave things as they are in 3.x
[jira] Commented: (SOLR-2290) the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr
[ https://issues.apache.org/jira/browse/SOLR-2290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974224#action_12974224 ] Jason Rutherglen commented on SOLR-2290: I think it'll require creating a new sub-element of mainIndex and indexDefaults, called perhaps indexWriterConfig? Because attributes such as unlockOnStartup and reopenReaders cannot be injected in, and we probably don't want to mix injected properties with non-injected properties? the termsInfosDivisor for readers opened by indexWriter should be configurable in Solr -- Key: SOLR-2290 URL: https://issues.apache.org/jira/browse/SOLR-2290 Project: Solr Issue Type: New Feature Reporter: Tom Burton-West Priority: Minor Solr allows users to set the termInfosIndexDivisor used by the indexReader at search time in solrconfig.xml, but not in the indexReader opened by the IndexWriter when indexing/merging. When dealing with an index with a large number of unique terms, setting the termInfosIndexDivisor at search time is helpful in reducing memory use. It would also be helpful in reducing memory use during indexing/merging if it were made configurable for indexReaders opened by indexWriter during indexing/merging. This thread contains some background: http://www.lucidimagination.com/search/document/b5c756a366e1a0d6/memory_use_during_merges_oom In the Lucene 3.x branch it looks like this is done in IndexWriterConfig.setReaderTermsIndexDivisor, although there is also this method signature in IndexWriter.java: IndexReader getReader(int termInfosIndexDivisor)
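A sketch of what Jason's suggestion might look like in solrconfig.xml; the element names below are hypothetical, since the sub-element name was explicitly still an open question in the comment above:

```xml
<!-- hypothetical layout; names are not final -->
<mainIndex>
  <indexWriterConfig>
    <!-- divisor for the terms index of readers opened by IndexWriter
         during indexing/merging; larger values use less memory -->
    <readerTermsIndexDivisor>4</readerTermsIndexDivisor>
  </indexWriterConfig>
</mainIndex>
```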
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974229#action_12974229 ] Robert Muir commented on LUCENE-2829: - On further thought, Yonik, your idea is really completely unrelated. We shouldn't be seeking to terms/relying upon the terms dictionary cache internally when we don't need to... whether or not it's populated with negative entries for the more general case is unrelated; even if we go that route we shouldn't be lazy and rely upon that.
[jira] Updated: (SOLR-2275) Spaces around mm parameter in dismax configuration cause NumberFormatException
[ https://issues.apache.org/jira/browse/SOLR-2275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated SOLR-2275: - Attachment: SOLR-2275-3_x.patch Hoss: Thanks for committing to trunk; here's the patch for the current (22-Dec) 3_x branch. It's ready to apply as far as I can tell. All tests pass. Spaces around mm parameter in dismax configuration cause NumberFormatException -- Key: SOLR-2275 URL: https://issues.apache.org/jira/browse/SOLR-2275 Project: Solr Issue Type: Bug Components: Schema and Analysis Affects Versions: Next Reporter: Erick Erickson Assignee: Erick Erickson Priority: Minor Fix For: 4.0 Attachments: SOLR-2275-3_x.patch, SOLR-2275.patch Original Estimate: 2h Remaining Estimate: 2h Any whitespace around simple mm parameters in the configuration file produces a NumberFormatException at SolrPluginUtils.java:625. E.g. <str> 2 </str>. Adding whitespace in tests also causes this error to occur.
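The bug class here is about tolerating whitespace before numeric parsing. A minimal sketch of the failing call and the trim-based remedy (this stands in for, and is not, the actual SolrPluginUtils code):

```java
// Sketch only: Integer.parseInt(" 2 ") throws NumberFormatException,
// so an mm value read from solrconfig.xml must be trimmed first.
public class MinShouldMatchParse {
    static int parseMm(String raw) {
        return Integer.parseInt(raw.trim()); // trim() tolerates surrounding whitespace
    }

    public static void main(String[] args) {
        System.out.println(parseMm(" 2 ")); // prints 2
    }
}
```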
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974247#action_12974247 ] Jason Rutherglen commented on LUCENE-2312: -- bq. Obtaining a normal fieldcache entry should work the same on an RT reader as any other reader Yes. I'm still confused as to how DocValues fits into all of this. bq. TOVC should continue to work as it does today It should, otherwise there'll be performance considerations. The main proposal here is incrementally updating FC values, and how to continue to use DocTermsIndex for non-RT readers mixed with DocTerms for RT readers, either in TOVC or somewhere else. Search on IndexWriter's RAM Buffer -- Key: LUCENE-2312 URL: https://issues.apache.org/jira/browse/LUCENE-2312 Project: Lucene - Java Issue Type: New Feature Components: Search Affects Versions: Realtime Branch Reporter: Jason Rutherglen Assignee: Michael Busch Fix For: Realtime Branch Attachments: LUCENE-2312-FC.patch, LUCENE-2312.patch In order to offer users near-realtime search without incurring an indexing performance penalty, we can implement search on IndexWriter's RAM buffer. This is the buffer that is filled in RAM as documents are indexed. Currently the RAM buffer is flushed to the underlying directory (usually disk) before being made searchable. Today's Lucene-based NRT systems must incur the cost of merging segments, which can slow indexing. Michael Busch has good suggestions regarding how to handle deletes using max doc ids. https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923 The area that isn't fully fleshed out is the terms dictionary, which needs to be sorted prior to queries executing. Currently IW implements a specialized hash table.
Michael B has a suggestion here: https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915
Re: [jira] Commented: (LUCENENET-385) Searching string with special character not working
It looks like you're going to have to build from source: https://svn.apache.org/repos/asf/lucene/lucene.net/tags/Lucene.Net_2_9_2/ Peter Mateja peter.mat...@gmail.com On Wed, Dec 22, 2010 at 4:22 AM, Abhilash C R (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENENET-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974136#action_12974136 ] Abhilash C R commented on LUCENENET-385: Hi, how can I download Lucene.Net 2.9.2? I couldn't do it from the website. Please guide me. Thanks, Abhilash Searching string with special character not working --- Key: LUCENENET-385 URL: https://issues.apache.org/jira/browse/LUCENENET-385 Project: Lucene.Net Issue Type: Task Environment: .NET Framework 2.0+, C#.NET, ASP.NET, Webservices Reporter: Abhilash C R I have come across an issue with the search option in our application, which uses the Lucene.Net 2.0 version. The scenario: if I try to search a text TestTest (it is actually TestTest.doc that is being searched), it returns 0 hits. While debugging I could see that the line written to parse the query is causing the problem. Here is the error line code: Query q = null; q = new global::Lucene.Net.QueryParsers.QueryParser(content, new StandardAnalyzer()).Parse(query); The variable query at the above point contains this: (title:(TestTest) shorttitle:(TestTest) content:(TestTest) keywords:(TestTest) description:(TestTest) ) and q ends up as this: title:test test shorttitle:test test content:test test keywords:test test description:test test And hence the hit length will be 0 at IndexSearcher searcher = new IndexSearcher(indexPath); Hits hits = searcher.Search(q); I tried adding \ before it, tried escape, tried enclosing the text in quotes, but all give the same outcome. Could anyone please help me with a fix? If required I can post the full code here. Hope to hear from Lucene.Net. Many thanks, Abhilash -- This message is automatically generated by JIRA.
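For the underlying problem in this thread (query-parser special characters breaking a literal search), the usual remedy is to backslash-escape the reserved characters before parsing. The sketch below is a standalone approximation of what Lucene's QueryParser.escape() does; it is illustrative, not the Lucene.Net implementation:

```java
// Standalone sketch: backslash-escape characters the query parser
// treats as syntax, so they are searched literally.
public class QueryEscape {
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) sb.append('\\'); // escape syntax chars
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("a+b"));  // prints a\+b
    }
}
```

Even with escaping, note that StandardAnalyzer may still split or drop punctuation at analysis time, which is a separate issue from parser syntax.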
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974260#action_12974260 ] Simon Willnauer commented on LUCENE-2312: - bq. Yes. I'm still confused as to how DocValues fits into all of this. DocValues == column stride fields; does that help? simon
Re: strange problem of PForDelta decoder
Those are nice speedups! Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test? Mike

On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote: great improvement! I did a test on our data set. doc count is about 2M+ and index size after optimization is about 13.3GB (including fdt). it seems lucene4's index format is better than lucene 2.9.3, and PFor gives good results. Besides BlockEncoder for frq and pos, is there any other modification in lucene 4?

decoder \ avg time         single word(ms)   and query(ms)   or query(ms)
VINT in lucene 2.9              11.2             36.5            38.6
VINT in lucene 4 branch         10.6             26.5            35.4
PFor in lucene 4 branch          8.1             22.5            30.7

2010/12/21 Li Li fancye...@gmail.com: OK we should have a look at that one still. We need to converge on a good default codec for 4.0. Fortunately it's trivial to take any int block encoder (fixed or variable block) and make a Lucene codec out of it! I suggest you not use this one; I fixed dozens of bugs but it still failed the random tests. its code is hand-coded rather than generated by a program. But we may learn something from it.
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974262#action_12974262 ] Michael McCandless commented on LUCENE-2829: bq. The cache has a number of advantages that may never be duplicated in a different type of API +1 -- I agree we should keep the TermState cache. It has benefits outside of re-use within a single query. But allowing term-lookup-intensive clients like MTQ to do their own caching (ie pulling the TermState from the enum) is also important. I think we need both. On caching misses... that makes me nervous. If there are apps out there that do a lot of checking for terms that don't exist, that can destroy the cache. The cache is a great safety net, but I think our core queries should be good consumers, when possible, and hold their own TermState.
[jira] Commented: (LUCENE-2312) Search on IndexWriter's RAM Buffer
[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974264#action_12974264 ] Jason Rutherglen commented on LUCENE-2312: -- bq. DocValues == column stride fields Ok, that makes sense! I'm going to leave this alone for now, however I agree that ideally we'd leave TOVC alone and at a higher level intermix the ord and non-ord doc terms. It's hard to immediately determine how that'd work given the slot concept, which seems to be an ord or value per reader that's directly comparable? Is there an example of mixing multiple comparators for a given field?
[jira] Commented: (LUCENE-2694) MTQ rewrite + weight/scorer init should be single pass
[ https://issues.apache.org/jira/browse/LUCENE-2694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974265#action_12974265 ] Michael McCandless commented on LUCENE-2694: bq. after all I think this must be done in a different issue though +1 If, when we now pass a naked IndexReader (eg to Weight.scorer, Weight.explain, Filter.getDocIdSet), we replace that with a ReaderContext which has the reader, its parent, and its ord, then this precursor makes both TermState (this issue) and the awesome PK speedup (LUCENE-2829) much simpler. And I agree we should break it out as its own issue. It's good to do that as its own issue since that's a rote API cutover -- we are passing a struct instead of a naked reader, but otherwise no change. This also lets us solve cases where the Filter needs the full context, eg LUCENE-2348. Also, with this I think we should sharpen in the jdocs that when you call Query.rewrite the returned query must be searched only against the same reader you rewrote against. Similarly, when you create a Weight, it should only be used against the same Searcher used to create it from a Query. MTQ rewrite + weight/scorer init should be single pass -- Key: LUCENE-2694 URL: https://issues.apache.org/jira/browse/LUCENE-2694 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless Assignee: Simon Willnauer Fix For: 4.0 Attachments: LUCENE-2694-FTE.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch, LUCENE-2694.patch Spinoff of LUCENE-2690 (see the hacked patch on that issue)... Once we fix MTQ rewrite to be per-segment, we should take it further and make weight/scorer init also run in the same single pass as rewrite.
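The "struct instead of a naked reader" idea can be sketched as below; the field names illustrate the proposal and are not the final API:

```java
// Sketch of the proposed ReaderContext: carries the reader plus where it
// sits in the reader tree, so consumers (TermState caching, the PK lookup
// optimization) know which (sub-)reader they are operating on.
public class ReaderContext {
    final Object reader;         // the IndexReader (typed as Object in this sketch)
    final ReaderContext parent;  // null for a top-level reader
    final int ord;               // index of this sub-reader within its parent

    ReaderContext(Object reader, ReaderContext parent, int ord) {
        this.reader = reader;
        this.parent = parent;
        this.ord = ord;
    }

    boolean isTopLevel() { return parent == null; }

    public static void main(String[] args) {
        ReaderContext top = new ReaderContext("topReader", null, 0);
        ReaderContext sub = new ReaderContext("segmentReader", top, 2);
        System.out.println(sub.isTopLevel() + " " + sub.ord); // false 2
    }
}
```

The point of the struct is exactly what the comment calls a "rote API cutover": every method that used to take an IndexReader takes one of these instead, with no behavior change.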
[jira] Commented: (LUCENE-2723) Speed up Lucene's low level bulk postings read API
[ https://issues.apache.org/jira/browse/LUCENE-2723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974272#action_12974272 ] Michael McCandless commented on LUCENE-2723: bq. Should we keep MultiBulkPostingsEnum? I think we have to keep it. EG if someone makes a SlowMultiReaderWrapper and then runs searches on it... bq. ParallelReader is one that can't currently? ParallelReader is a tricky one. If your ParallelReader only contains SegmentReaders (and eg you make a MultiReader on top), then everything's great, because ParallelReader dispatches by field to a unique SegmentReader. But if instead you make a ParallelReader whose child readers are themselves MultiReaders, then, yes, it's basically the same as wrapping all of these subs in a SlowMultiReaderWrapper.
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974274#action_12974274 ] Earwin Burrfoot commented on LUCENE-2829: - Term lookup misses can be alleviated by a simple Bloom filter. No caching misses required; helps both PK and near-PK queries.
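A minimal sketch of that idea: a per-segment Bloom filter answers "definitely absent" cheaply, so a term query could skip the terms-dictionary seek for segments that cannot contain the key. The two-hash scheme below is a toy for illustration; a real filter would size the bitset and hash count to the segment's term count:

```java
import java.util.BitSet;

// Toy Bloom filter over terms. mightContain() == false means the term is
// definitely not in the segment, so no terms-dictionary seek is needed.
// False positives only cost the seek we would have done anyway.
public class TermBloomFilter {
    private final BitSet bits;
    private final int size;

    TermBloomFilter(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    private int h1(String t) { return Math.floorMod(t.hashCode(), size); }
    private int h2(String t) { return Math.floorMod(31 * t.hashCode() + t.length(), size); }

    void add(String term) {
        bits.set(h1(term));
        bits.set(h2(term));
    }

    boolean mightContain(String term) {
        return bits.get(h1(term)) && bits.get(h2(term));
    }

    public static void main(String[] args) {
        TermBloomFilter f = new TermBloomFilter(1024);
        f.add("doc-42");
        System.out.println(f.mightContain("doc-42")); // always true: no false negatives
    }
}
```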
[jira] Updated: (LUCENE-2391) Spellchecker uses default IW mergefactor/ramMB settings of 300/10
[ https://issues.apache.org/jira/browse/LUCENE-2391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-2391: Attachment: LUCENE-2391.patch Here's a patch to speed up the spellchecker build. * i wired the default RamMB to IWConfig's default * i didn't mess with the mergefactor for now (because the default is still to optimize) * but i added an additional 'optimize' parameter so you can update your spellcheck index without re-optimizing. * when updating, i changed the exists() to work per-segment, so it's reasonable if the index isn't optimized. * the exists() check now bypasses the term dictionary cache, which is stupid and just slows it down. * we don't do any of the exists() logic if the index is empty (this is the case for, i think, solr, which completely rebuilds and doesn't do an incremental update) * the startXXX, endXXX, and word fields can only contain one term per document. I turned off norms, positions, and tf for these. * the gramXXX field is unchanged; i didn't want to change spellchecker scoring in any way. But we could reasonably omit norms here too in the future, since i think it's going to be very short.
{noformat}
trunk:
  scratch build time: 229,803ms
  index size: 214,322,200 bytes
  no-op update time (updating but there are no new terms to add): 4,619ms
patch:
  scratch build time: 99,214ms
  index size: 177,781,273 bytes
  no-op update time: 2,504ms
{noformat}
i still left the optimize default on, but really i think most users (e.g. solr) should set mergefactor to be maybe a bit more reasonable, set optimize to false, and the scratch build is then much faster (60,000ms), though the no-op update time is heavier (eg 16,000ms). Still, if you are rebuilding on every commit for smallish updates, something like 20-30 seconds is a lot better than 100 seconds, but for now I kept the defaults as is (optimizing every time).
Spellchecker uses default IW mergefactor/ramMB settings of 300/10 - Key: LUCENE-2391 URL: https://issues.apache.org/jira/browse/LUCENE-2391 Project: Lucene - Java Issue Type: Improvement Components: contrib/spellchecker Reporter: Mark Miller Priority: Trivial Attachments: LUCENE-2391.patch These settings seem odd - I'd like to investigate what makes most sense here.
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974303#action_12974303 ] Paul Elschot commented on LUCENE-1410: -- bq. ... is it possible to encode # of exception bytes in header? In the first implementation the start index of the exception chain is in the header (5 or 6 bits iirc). In the second implementation (by Hoa Yan) there is no exception chain, so the number of exceptions must somehow be encoded in the header. That means encoding the # exception bytes in the header would be easier in the second implementation, but it is also possible in the first one. I would expect that a few bits for the number of encoded integers would also be added in the header (think 32, 64, 128...). The number of frame bits takes 5 bits. That means that there are about 2 bytes unused in the header now, and I'd expect 1 byte to be enough to encode the number of bytes for the exceptions. For example a bad case in the first implementation of 10 exceptions of 4 bytes means 40 bytes data, that fits in 6 bits, the same bad case in the second implementation would also need to store the indexes of the exceptions in 10*5 bits, totalling 90 bytes that can be encoded in 7 bits. However, I don't know what the worst case # exceptions is. (This gets into vsencoding...) For the moment I'll just leave this unchanged and get the tests working on the current first implementation. 
PFOR implementation --- Key: LUCENE-1410 URL: https://issues.apache.org/jira/browse/LUCENE-1410 Project: Lucene - Java Issue Type: New Feature Components: Index Reporter: Paul Elschot Priority: Minor Fix For: Bulk Postings branch Attachments: autogen.tgz, for-summary.txt, LUCENE-1410-codecs.tar.bz2, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410.patch, LUCENE-1410b.patch, LUCENE-1410c.patch, LUCENE-1410d.patch, LUCENE-1410e.patch, TermQueryTests.tgz, TestPFor2.java, TestPFor2.java, TestPFor2.java Original Estimate: 21840h Remaining Estimate: 21840h Implementation of Patched Frame of Reference.
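The byte counts in Paul's comment can be sanity-checked in a couple of lines: the number of header bits needed to encode a count n is the bit length of n, so 40 exception bytes fit in 6 bits and 90 fit in 7, as stated above:

```java
// Quick arithmetic check for the header-size discussion above.
public class HeaderBits {
    // Bits needed to represent n (n >= 1): position of its highest set bit.
    static int bitsFor(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        System.out.println(bitsFor(40)); // 6: the first implementation's bad case
        System.out.println(bitsFor(90)); // 7: the second implementation's bad case
    }
}
```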
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974307#action_12974307 ] Paul Elschot commented on LUCENE-1410: -- bq. I've tested everything I can think of and it seems this nio ByteBuffer/IntBuffer approach is always the fastest ... Did you also test without a copy (without the readBytes() call) into the underlying byte array for the IntBuffer? That might be even faster, and it could be possible when using, for example, a BufferedIndexInput or an MMapDirectory. For decent buffer.get() speed the starting byte would need to be aligned at an int border.
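The copy-free variant Paul describes can be sketched with an IntBuffer view over the page bytes. This is a sketch, not the benchmarked code; note that ByteBuffer views default to big-endian and, as he says, the offset should be int-aligned for fast get():

```java
import java.nio.ByteBuffer;
import java.nio.IntBuffer;

// Sketch: view a region of a byte[] as ints without copying into an int[].
final class IntView {
    static IntBuffer view(byte[] page, int offset, int length) {
        // No per-int copy; get(i) reads straight from the backing bytes.
        return ByteBuffer.wrap(page, offset, length).asIntBuffer();
    }
}
```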
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974310#action_12974310 ] Robert Muir commented on LUCENE-2829: - Bloom filters and negative caches are nice, but please open separate issues! I am starting to feel like it's mandatory to refactor the entirety of Lucene to make a single incremental improvement. So, I'd like to proceed with this issue as-is, to make TermWeight explicitly do fewer seeks. improve termquery pk lookup performance - Key: LUCENE-2829 URL: https://issues.apache.org/jira/browse/LUCENE-2829 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Robert Muir Attachments: LUCENE-2829.patch For things that are like primary keys and don't exist in some segments (worst case is a primary/unique key that only exists in 1) we do wasted seeks. While LUCENE-2694 tries to solve some of this issue with TermState, I'm concerned whether we could ever backport that to 3.1, for example. This is a simpler solution here just to solve this one problem in TermQuery... we could just revert it in trunk when we resolve LUCENE-2694, but I don't think we should leave things as they are in 3.x
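The attached patch is not reproduced here, but the seek-saving idea can be sketched as follows: for a key known to be unique across the index, probing can stop at the first segment that contains it. In this sketch, plain maps stand in for per-segment term dictionaries and each get() stands in for a seek; names like PkSeek are hypothetical:

```java
import java.util.List;
import java.util.Map;

// Sketch of the early-exit idea (not the attached patch): for a unique key,
// later segments need no term-dictionary seek once the term is found.
final class PkSeek {
    static int seeks = 0;

    static Integer lookup(List<Map<String, Integer>> segments, String pk) {
        for (Map<String, Integer> seg : segments) {
            seeks++;                     // one "seek" per segment probed
            Integer doc = seg.get(pk);
            if (doc != null) {
                return doc;              // unique key: stop probing here
            }
        }
        return null;
    }
}
```

For a key in the first of N segments this does one probe instead of N, which is the "fewer seeks" Robert is after; the worst case (key absent everywhere) still probes all segments, which is where bloom filters or negative caches would help.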
[jira] Reopened: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reopened SOLR-2282: Reopening issue. The new test added by this issue... org.apache.solr.handler.clustering.DistributedClusteringComponentTest.testDistribSearch ...was failing consistently on both hudson, and robert muir's machine, so rmuir disabled it with @Ignore. we should get to the bottom of this before resolving error from hudson... {quote} Error Message Some threads threw uncaught exceptions! Stacktrace junit.framework.AssertionFailedError: Some threads threw uncaught exceptions! at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:950) at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:888) at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:371) at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78) at org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:130) Standard Error 22-Dec-2010 6:27:38 AM org.apache.solr.common.SolrException log SEVERE: java.lang.Error: Error: could not match input at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.zzScanError(ExtendedWhitespaceTokenizerImpl.java:687) at org.carrot2.text.analysis.ExtendedWhitespaceTokenizerImpl.getNextToken(ExtendedWhitespaceTokenizerImpl.java:836) at org.carrot2.text.analysis.ExtendedWhitespaceTokenizer.nextToken(ExtendedWhitespaceTokenizer.java:46) at org.carrot2.text.preprocessing.Tokenizer.tokenize(Tokenizer.java:147) at org.carrot2.text.preprocessing.pipeline.CompletePreprocessingPipeline.preprocess(CompletePreprocessingPipeline.java:54) at org.carrot2.text.preprocessing.pipeline.BasicPreprocessingPipeline.preprocess(BasicPreprocessingPipeline.java:92) at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.cluster(LingoClusteringAlgorithm.java:199) at 
org.carrot2.clustering.lingo.LingoClusteringAlgorithm.access$000(LingoClusteringAlgorithm.java:44) at org.carrot2.clustering.lingo.LingoClusteringAlgorithm$1.process(LingoClusteringAlgorithm.java:178) at org.carrot2.text.clustering.MultilingualClustering.clusterByLanguage(MultilingualClustering.java:222) at org.carrot2.text.clustering.MultilingualClustering.process(MultilingualClustering.java:110) at org.carrot2.clustering.lingo.LingoClusteringAlgorithm.process(LingoClusteringAlgorithm.java:171) at org.carrot2.core.ControllerUtils.performProcessing(ControllerUtils.java:101) at org.carrot2.core.Controller.process(Controller.java:287) at org.carrot2.core.Controller.process(Controller.java:180) at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.cluster(CarrotClusteringEngine.java:105) at org.apache.solr.handler.clustering.ClusteringComponent.finishStage(ClusteringComponent.java:171) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:296) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1358) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:341) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:244) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at 
org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) NOTE: reproduce with: ant test -Dtestcase=DistributedClusteringComponentTest -Dtestmethod=testDistribSearch -Dtests.seed=41204997274180:6405396687385598457 -Dtests.multiplier=3 The following exceptions were thrown by threads: *** Thread: Thread-13 *** junit.framework.AssertionFailedError: .clusters.length:4!=5 at
[jira] Issue Comment Edited: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974303#action_12974303 ] Paul Elschot edited comment on LUCENE-1410 at 12/22/10 1:13 PM: bq. ... is it possible to encode # of exception bytes in header? In the first implementation the start index of the exception chain is in the header (5 or 6 bits iirc). In the second implementation (by Hoa Yan) there is no exception chain, so the number of exceptions must somehow be encoded in the header. That means encoding the # exception bytes in the header would be easier in the second implementation, but it is also possible in the first one. I would expect that a few bits for the number of encoded integers would also be added in the header (think 32, 64, 128...). The number of frame bits takes 5 bits. That means that there are about 2 bytes unused in the header now, and I'd expect 1 byte to be enough to encode the number of bytes for the exceptions. For example a bad case in the first implementation of 10 exceptions of 4 bytes means 40 bytes data, that fits in 6 bits; the same bad case in the second implementation would also need to store the indexes of the exceptions in 10*5 bits, for a total of about 48 bytes that can still be encoded in 6 bits. However, I don't know what the worst case # exceptions is. (This gets into vsencoding...) For the moment I'll just leave this unchanged and get the tests working on the current first implementation. was (Author: paul.elsc...@xs4all.nl): same comment, except that the second-implementation example previously read "totalling 90 bytes that can be encoded in 7 bits".
[jira] Commented: (SOLR-2282) Distributed Support for Search Result Clustering
[ https://issues.apache.org/jira/browse/SOLR-2282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12974314#action_12974314 ] Stanislaw Osinski commented on SOLR-2282: - This may be related to a concurrency bug we fixed in the latest (3.4.2) release of Carrot2. Tomorrow morning I can prepare a Carrot2 upgrade patch, which should hopefully fix the problem. Distributed Support for Search Result Clustering Key: SOLR-2282 URL: https://issues.apache.org/jira/browse/SOLR-2282 Project: Solr Issue Type: New Feature Components: contrib - Clustering Affects Versions: 1.4, 1.4.1 Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 3.1, 4.0 Attachments: SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch, SOLR-2282.patch Brad Giaccio contributed a patch for this in SOLR-769. I'd like to incorporate it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1410) PFOR implementation
[ https://issues.apache.org/jira/browse/LUCENE-1410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974315#action_12974315 ] Robert Muir commented on LUCENE-1410: - {quote} Did you also test without a copy (without the readBytes() call) into the underlying byte array for the IntBuffer? That might be even faster, and it could be possible when using for example a BufferedIndexInput or an MMapDirectory. For decent buffer.get() speed the starting byte would need to be aligned at an int border. {quote} Yes, for the mmap case I tried the original dangerous hack, exposing an IntBuffer view of its internal mapped byte buffer. I also tried MMapIndexInput keeping track of its own IntBuffer view. We might be able to have some gains by allowing a directory to return an IntBufferIndexInput of some sort (separate from DataInput/IndexInput) that basically just positions an IntBuffer view (the default implementation would fill from an IndexInput into a ByteBuffer like we do now), but I haven't tested this across all the directories yet... it might help NIOFS though, as it would bypass the double-buffering of BufferedIndexInput. For SimpleFS it would be the same, and for MMap I'm not very hopeful it would be better, but maybe not worse.
if that worked maybe we could do the same with Long, for things like simple-8b (http://onlinelibrary.wiley.com/doi/10.1002/spe.948/abstract)
[jira] Updated: (LUCENE-2830) Use StringBuilder instead of StringBuffer in benchmark
[ https://issues.apache.org/jira/browse/LUCENE-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-2830: --- Attachment: LUCENE-2830.patch Since parse(*, StringBuffer, *) is not used, and whoever wants to use it can use the Reader variant and pass new StringReader(), I removed it. I plan to commit tomorrow. Use StringBuilder instead of StringBuffer in benchmark -- Key: LUCENE-2830 URL: https://issues.apache.org/jira/browse/LUCENE-2830 Project: Lucene - Java Issue Type: Improvement Components: contrib/benchmark Reporter: Shai Erera Assignee: Shai Erera Priority: Minor Fix For: 3.1, 4.0 Attachments: LUCENE-2830.patch, LUCENE-2830.patch Minor change - use StringBuilder instead of StringBuffer in benchmark's code. We don't need the synchronization of StringBuffer in all the places that I've checked. The only place where it _could_ be a problem is in HtmlParser's API - one method accepts a StringBuffer and it's an interface. But I think it's ok to change benchmark's API, back-compat wise and so I'd like to either change it to accept a String, or remove the method altogether -- no code in benchmark uses it, and if anyone needs it, he can pass StringReader to the other method. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
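For context, the change is mechanical: StringBuilder mirrors StringBuffer's API minus the per-call synchronization, so in single-threaded benchmark code it is a drop-in replacement. An illustrative snippet (not benchmark source):

```java
// StringBuilder has the same append/toString API as StringBuffer without
// synchronization, so single-threaded code can swap types with no other changes.
final class Concat {
    static String join(String[] parts, char sep) {
        StringBuilder sb = new StringBuilder(); // was: new StringBuffer()
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) sb.append(sep);
            sb.append(parts[i]);
        }
        return sb.toString();
    }
}
```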
LuceneTestCase.threadCleanup incorrectly reports left running threads
Hi, I noticed that some tests report threads left running even when those tests never create and start a Thread. Digging deeper, I found that the tests report Signal Dispatcher and Attach handler as two threads that are left running. If I run the test from Eclipse, then a ReaderThread and Signal Dispatcher are reported. ReaderThread belongs to the JUnit framework, and the other two are initiated by some framework, definitely not by our tests. So I was thinking: instead of reporting those threads, should we inspect each running thread's stack trace and report it only if it contains an org.apache.lucene/solr package? Otherwise it cannot have been started from our tests. What do you think? Shai
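Shai's suggestion could look roughly like this (a sketch, not a patch; RogueThreadFilter and startedByOurTests are hypothetical names):

```java
// Sketch: report a leftover thread only if its current stack trace passes
// through one of our own packages; JVM/IDE housekeeping threads won't match.
final class RogueThreadFilter {
    static boolean startedByOurTests(Thread t) {
        for (StackTraceElement e : t.getStackTrace()) {
            String cls = e.getClassName();
            if (cls.startsWith("org.apache.lucene.") || cls.startsWith("org.apache.solr.")) {
                return true;
            }
        }
        return false;
    }
}
```

One caveat: a thread started from our code but currently parked in java.util.concurrent frames would not match, so a robust check would have to consider where the thread was created, not only where it is now.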
Re: LuceneTestCase.threadCleanup incorrectly reports left running threads
On Wed, Dec 22, 2010 at 2:14 PM, Shai Erera ser...@gmail.com wrote: Hi I noticed that some tests report threads are left running, even when those tests never create and start a Thread. Digging deeper I found out that the tests report Signal Dispatcher and Attach handler as two threads that are left running. If I run the test from eclipse, then a ReaderThread and Signal Dispatcher are reported. ReaderThread belongs to JUnit framework and the other two are initiated by some framework, and definitely not from our tests. So I was thinking if instead of reporting those threads, we should inspect each running Thread's stacktrace and report it only if it contains an org.apache.lucene/solr package. Otherwise it cannot be started from our tests? What do you think? Are you running the tests from Eclipse or something in this case? (I think I've seen these from Eclipse.)
Re: LuceneTestCase.threadCleanup incorrectly reports left running threads
here is an (imperfect) patch for eclipse, can you try this? any threads running at this point are not our own.

Index: lucene/src/test/org/apache/lucene/util/LuceneTestCase.java
===================================================================
--- lucene/src/test/org/apache/lucene/util/LuceneTestCase.java (revision 1051872)
+++ lucene/src/test/org/apache/lucene/util/LuceneTestCase.java (working copy)
@@ -522,6 +522,13 @@
   // jvm-wide list of 'rogue threads' we found, so they only get reported once.
   private final static IdentityHashMap<Thread,Boolean> rogueThreads = new IdentityHashMap<Thread,Boolean>();

+  static {
+    // just a hack for things like eclipse threads
+    for (Thread t : Thread.getAllStackTraces().keySet()) {
+      rogueThreads.put(t, true);
+    }
+  }
+
   /**
    * Looks for leftover running threads, trying to kill them off,
    * so they don't fail future tests.
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974350#action_12974350 ] Earwin Burrfoot commented on LUCENE-2829: - Nobody halts your progress, we're merely discussing. I, on the other hand, have a feeling that Lucene is overflowing with single incremental improvements aka hacks, as they are easier and faster to implement than trying to get a bigger picture, and, yes, rebuilding everything :) For example, better term dict code will make this issue (somewhat hackish, admit it?) irrelevant. Whether we implement bloom filters, or just guarantee to keep the whole term dict in memory with a reasonable lookup routine (eg. as FST). Having said that, I reiterate, I'm not here to stop you or turn this issue into something else.
[jira] Commented: (LUCENE-2829) improve termquery pk lookup performance
[ https://issues.apache.org/jira/browse/LUCENE-2829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974354#action_12974354 ] Robert Muir commented on LUCENE-2829: - bq. For example, better term dict code will make this issue (somewhat hackish, admit it?) irrelevant. Right, it is hackish, but what is a worse hack is wasted seeks in our next 3.1 release because we can't keep scope under control and fix small problems without rewriting everything, which means less gets backported to our stable branch. Anyway, I'm just gonna mark this won't fix so I don't have to deal with it anymore.
[jira] Created: (SOLR-2294) How to combine OR with geofilt
How to combine OR with geofilt -- Key: SOLR-2294 URL: https://issues.apache.org/jira/browse/SOLR-2294 Project: Solr Issue Type: Bug Affects Versions: 3.1 Reporter: Bill Bell Fix For: 3.1 We would like to combine fq={!geofilt} OR state:CO... This generates an error. Are there other ways to do an OR between fq= ? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Lucene-3.x - Build # 219 - Still Failing
Build: https://hudson.apache.org/hudson/job/Lucene-3.x/219/ All tests passed Build Log (for compile errors): [...truncated 21431 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: RT branch status
Cool! I'm getting to this on a weekend. On Tue, Dec 21, 2010 at 11:44, Michael Busch busch...@gmail.com wrote: After merging trunk into the RT branch it's finally compiling again and up-to-date. Several tests are failing now after the merge (43 out of 1427 are failing), which is not too surprising, because so many things have changed (segment-deletes, flush control, termsHash refactoring, removal of doc stores, etc). Especially IndexWriter and DocumentsWriter are in a somewhat messy state, but I wanted to share my current state, so I committed the merge. I'll try this week to understand the new changes (especially deletes) and make them work with the DWPT. The following areas need work: * deletes * thread-safety * error handling and aborting * flush-by-ram (LUCENE-2573) Also, some tests deadlock. Not surprisingly either, cause flushcontrol etc. introduce new synchronized blocks. Before the merge all tests were passing, except the ones testing flush-by-ram functionality. I'll keep working on getting the branch back into that state again soon. Help is definitely welcome! I'd love to get this branch ready so that we can merge it into trunk as soon as possible. As Mike's experiments show having DWPTs will not only be beneficial for RT search, but also increase indexing performance in general. Michael PS: Thanks for the patience! - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- Kirill Zakharenko/Кирилл Захаренко (ear...@gmail.com) Phone: +7 (495) 683-567-4 ICQ: 104465785 - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: strange problem of PForDelta decoder
I used the bulkpostings branch (https://svn.apache.org/repos/asf/lucene/dev/branches/bulkpostings/lucene). Does trunk have a PForDelta decoder/encoder?

2010/12/23 Michael McCandless luc...@mikemccandless.com: Those are nice speedups! Did you use the 4.0 branch (ie trunk) or the bulkpostings branch for this test? Mike

On Tue, Dec 21, 2010 at 9:59 PM, Li Li fancye...@gmail.com wrote: great improvement! I did a test on our data set. doc count is about 2M+ and index size after optimization is about 13.3GB (including fdt). it seems lucene4's index format is better than lucene2.9.3, and PFor gives good results. Besides BlockEncoder for frq and pos, is there any other modification for lucene 4?

decoder \ avg time        single word (ms)   AND query (ms)   OR query (ms)
VINT in lucene 2.9             11.2               36.5             38.6
VINT in lucene 4 branch        10.6               26.5             35.4
PFor in lucene 4 branch         8.1               22.5             30.7

2010/12/21 Li Li fancye...@gmail.com: OK we should have a look at that one still. We need to converge on a good default codec for 4.0. Fortunately it's trivial to take any int block encoder (fixed or variable block) and make a Lucene codec out of it! I suggest you not use this one; I fixed dozens of bugs but it still failed with random tests. Its code is hand-coded rather than generated by a program. But we may learn something from it.
[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974484#action_12974484 ] Jason Rutherglen commented on LUCENE-2324: -- Also, it'd be great if we could summarize the changes trunk -> DWPT branch. Per thread DocumentsWriters that write their own private segments - Key: LUCENE-2324 URL: https://issues.apache.org/jira/browse/LUCENE-2324 Project: Lucene - Java Issue Type: Improvement Components: Index Reporter: Michael Busch Assignee: Michael Busch Priority: Minor Fix For: Realtime Branch Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch See LUCENE-2293 for motivation and more details. I'm copying here Mike's summary he posted on 2293: Change the approach for how we buffer in RAM to a more isolated approach, whereby IW has N fully independent RAM segments in-process and when a doc needs to be indexed it's added to one of them. Each segment would also write its own doc stores and normal segment merging (not the inefficient merge we now do on flush) would merge them. This should be a good simplification in the chain (eg maybe we can remove the *PerThread classes). The segments can flush independently, letting us make much better concurrent use of IO & CPU.
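The isolation Mike describes can be sketched in miniature (illustration only, not branch code; PerThreadWriters is a hypothetical name): each thread writes into its own private in-RAM buffer, so the hot add-document path needs no shared lock:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;

// Miniature sketch of per-thread writers: documents go into the calling
// thread's private "segment"; flushing/merging would later combine them.
final class PerThreadWriters {
    private final ConcurrentHashMap<Thread, List<String>> segments = new ConcurrentHashMap<>();

    void addDocument(String doc) {
        // Only the map lookup is shared; the per-thread list is never contended.
        segments.computeIfAbsent(Thread.currentThread(), t -> new ArrayList<>()).add(doc);
    }

    int segmentCount() {
        return segments.size();
    }
}
```

In the real branch each per-thread segment can flush to disk independently, which is where the concurrency win over a single shared DocumentsWriter buffer comes from.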
[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2324: - Attachment: test.out Here's ant test-core output. Looks like it's deadlocking in TestIndexWriter? There are some IR.reopen failures, a null pointer, and a delete-count issue I'll look at.
[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments
[ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Rutherglen updated LUCENE-2324: - Attachment: LUCENE-2324-SMALL.patch Small patch fixing the num deletes test null pointer. The TestIndexReaderReopen failure seems to have something to do with flushing deletes.
[jira] Resolved: (LUCENE-2830) Use StringBuilder instead of StringBuffer in benchmark
[ https://issues.apache.org/jira/browse/LUCENE-2830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-2830.
    Resolution: Fixed

Committed revision 1052180 (3x).
Committed revision 1052182 (trunk).

Use StringBuilder instead of StringBuffer in benchmark
    Key: LUCENE-2830
    URL: https://issues.apache.org/jira/browse/LUCENE-2830
    Project: Lucene - Java
    Issue Type: Improvement
    Components: contrib/benchmark
    Reporter: Shai Erera
    Assignee: Shai Erera
    Priority: Minor
    Fix For: 3.1, 4.0
    Attachments: LUCENE-2830.patch, LUCENE-2830.patch

Minor change - use StringBuilder instead of StringBuffer in benchmark's code. We don't need the synchronization of StringBuffer in all the places that I've checked. The only place where it _could_ be a problem is in HtmlParser's API - one method accepts a StringBuffer and it's an interface. But I think it's ok to change benchmark's API, back-compat-wise, and so I'd like to either change it to accept a String, or remove the method altogether -- no code in benchmark uses it, and if anyone needs it, he can pass a StringReader to the other method.
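[Editor's note: the rationale for the change above is that StringBuilder offers the same append/toString API as StringBuffer but without per-call synchronization, so it is the drop-in choice wherever the buffer is only touched by one thread. A minimal illustration (not benchmark's actual code):]

```java
// StringBuffer's methods are synchronized; StringBuilder has the same
// API without the monitor lock, so for single-threaded use the two are
// interchangeable drop-ins and StringBuilder avoids the locking cost.
public class BuilderVsBuffer {
    static String withBuffer(String[] parts) {
        StringBuffer sb = new StringBuffer();   // every append takes a lock
        for (String p : parts) sb.append(p);
        return sb.toString();
    }
    static String withBuilder(String[] parts) {
        StringBuilder sb = new StringBuilder(); // same API, no synchronization
        for (String p : parts) sb.append(p);
        return sb.toString();
    }
    public static void main(String[] args) {
        String[] parts = {"doc", "-", "42"};
        // The two produce identical results; only the locking differs.
        System.out.println(withBuffer(parts).equals(withBuilder(parts)));
    }
}
```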
[jira] Commented: (SOLR-2294) How to combine OR with geofilt
[ https://issues.apache.org/jira/browse/SOLR-2294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12974551#action_12974551 ]

Bill Bell commented on SOLR-2294:

I did find a way to do it. It only works in this order:

{code}
http://localhost:8983/solr/select?q=*:*&qt=standard&fq=state:CO OR _query_:{!geofilt} ...
{code}

This does not work:

{code}
http://localhost:8983/solr/select?q=*:*&qt=standard&fq={!geofilt} OR state:CO ...
{code}

How to combine OR with geofilt
    Key: SOLR-2294
    URL: https://issues.apache.org/jira/browse/SOLR-2294
    Project: Solr
    Issue Type: Bug
    Affects Versions: 3.1
    Reporter: Bill Bell
    Fix For: 3.1

We would like to combine fq={!geofilt} OR state:CO... This generates an error. Are there other ways to do an OR between fq= parameters?
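[Editor's note: a hedged sketch of building the working filter query above programmatically, with the fq value URL-encoded as a whole. The quotes around {!geofilt} follow Solr's usual _query_:"..." form and are an addition here; the geofilt pt/sfield/d parameters are elided, as in the original comment.]

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Build the "state:CO OR _query_:..." filter query from Bill's comment.
// Encoding the whole fq value avoids the raw spaces, colons, and braces
// being mangled in transit. Host and core path are the example values.
public class GeofiltQueryUrl {
    static String buildUrl() throws UnsupportedEncodingException {
        String fq = "state:CO OR _query_:\"{!geofilt}\"";
        return "http://localhost:8983/solr/select"
            + "?q=" + URLEncoder.encode("*:*", "UTF-8")
            + "&qt=standard"
            + "&fq=" + URLEncoder.encode(fq, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println(buildUrl());
    }
}
```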
Re: LuceneTestCase.threadCleanup incorrectly reports left running threads
I ran the test from both Eclipse and Ant, and got similar warnings. With your patch most of the 'false alarms' do not show up again, but I still see a strange failure. I added this right after the System.err.print("left thread running") call:

System.err.println(Arrays.toString(t.getStackTrace()));

-- it prints the stack trace. And here is what I get:

[junit] - Standard Error -
[junit] WARNING: test method: 'testIndexAndSearchTasks' left thread running: Thread[file lock watchdog,6,main]
[junit] [java.lang.Object.wait(Native Method), java.lang.Object.wait(Object.java:167), java.util.Timer$TimerImpl.run(Timer.java:226)]
[junit] RESOURCE LEAK: test method: 'testIndexAndSearchTasks' left 1 thread(s) running
[junit] NOTE: reproduce with: ant test -Dtestcase=TestPerfTasksLogic -Dtestmethod=testIndexAndSearchTasks -Dtests.seed=-792089523312439823:1164084411683706634

I don't know where this Timer is created, but I'll dig more. At any rate, I think your patch is good, and perhaps we should add the stack-trace print as well, to help with the debugging?

Shai

On Wed, Dec 22, 2010 at 9:35 PM, Robert Muir rcm...@gmail.com wrote:

   static {
+    // just a hack for things like eclipse threads
+    for (Thread t : Thread.getAllStackTraces().keySet()) {
+      rogueThreads.put(t, true);
+    }
+  }
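[Editor's note: Shai's stack-trace suggestion in standalone form. Thread.getAllStackTraces() returns a snapshot of every live thread mapped to its current stack, which is exactly what makes a leaked thread like the "file lock watchdog" Timer traceable. This is an illustrative helper, not the LuceneTestCase code under discussion.]

```java
import java.util.Arrays;
import java.util.Map;

// When a test leaves a thread running, print each live thread along
// with its stack trace so the leak can be traced back to whatever
// created it (e.g. a Timer whose run() shows up in the trace).
public class ThreadLeakDump {
    public static void dumpLiveThreads() {
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            System.err.println("left thread running: " + e.getKey());
            System.err.println(Arrays.toString(e.getValue()));
        }
    }

    public static void main(String[] args) {
        dumpLiveThreads(); // at minimum the main thread shows up
    }
}
```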
Solr-3.x - Build # 205 - Still Failing
Build: https://hudson.apache.org/hudson/job/Solr-3.x/205/

All tests passed

Build Log (for compile errors):
[...truncated 20638 lines...]