Possible test framework improvement
Hi,

While looking at some of the test failures it occurred to me that it would be great to have a tiny addition to the junit output for successful tests. Currently, if a test succeeds it only prints something like this:

[junit4] Suite: org.apache.solr.analysis.TestKeepFilterFactory
[junit4] Completed on J0 in 0.22s, 1 test

If that also had a time stamp for when the test started, it would in some cases be helpful for seeing what other tests were running at the same time. I think this information is implicitly available, because the tests report when they finish, but the proposed change would make it more obvious and easier to spot.

--
 Sami Siren

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
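Since each suite summary already carries a duration, the start time the mail proposes can even be reconstructed from the existing output. A minimal, hypothetical sketch (this helper is not part of the JUnit4 runner; the parsing logic and names are illustrative only):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (NOT part of the actual JUnit4 runner): given the
 * wall-clock time a suite completed and the duration it reported, derive
 * the start time that the proposal would print alongside the summary.
 */
public class SuiteStartTime {
    // Matches the existing summary line, e.g. "Completed on J0 in 0.22s, 1 test".
    private static final Pattern DURATION =
        Pattern.compile("Completed on (\\S+) in ([0-9.]+)s");

    /** Returns the start time in epoch millis, or -1 if the line doesn't match. */
    static long startMillis(String summaryLine, long completedAtMillis) {
        Matcher m = DURATION.matcher(summaryLine);
        if (!m.find()) {
            return -1;
        }
        // Math.round avoids truncation error from binary floating point.
        long durationMillis = Math.round(Double.parseDouble(m.group(2)) * 1000);
        return completedAtMillis - durationMillis;
    }

    public static void main(String[] args) {
        long start = startMillis("[junit4] Completed on J0 in 0.22s, 1 test", 1_000_000L);
        System.out.println(start); // 999780
    }
}
```

Printing this derived start time next to each suite would make overlapping suites visible at a glance without changing what the runner measures.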
[jira] [Updated] (SOLR-3540) MultiCoreExampleTest and MultiCoreEmbedded test clash with each other
     [ https://issues.apache.org/jira/browse/SOLR-3540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sami Siren updated SOLR-3540:
-----------------------------
    Attachment: SOLR-3540.patch

this patch should fix the issue

> MultiCoreExampleTest and MultiCoreEmbedded test clash with each other
> ---------------------------------------------------------------------
>
>                 Key: SOLR-3540
>                 URL: https://issues.apache.org/jira/browse/SOLR-3540
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Sami Siren
>            Assignee: Sami Siren
>            Priority: Minor
>         Attachments: SOLR-3540.patch
>
> When those two tests are run at the same time, one of them is going to fail
> with an error like this:
> {code}
> java.lang.AssertionError
>         at __randomizedtesting.SeedInfo.seed([B44AE18D746BCD54:3062FA7EBBB8C061]:0)
>         at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:163)
>         at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:133)
>         at org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:636)
> {code}
> This is reproducible with:
> {code}
> ant -Dtests.jvms=14 test "-Dtests.class=org.apache.solr.client.solrj.embedded.*"
> {code}
> Looks like this is because they share the directory example/multicore/core0/data/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (SOLR-3540) MultiCoreExampleTest and MultiCoreEmbedded test clash with each other
Sami Siren created SOLR-3540:
--------------------------------

             Summary: MultiCoreExampleTest and MultiCoreEmbedded test clash with each other
                 Key: SOLR-3540
                 URL: https://issues.apache.org/jira/browse/SOLR-3540
             Project: Solr
          Issue Type: Bug
    Affects Versions: 4.0
            Reporter: Sami Siren
            Assignee: Sami Siren
            Priority: Minor

When those two tests are run at the same time, one of them is going to fail with an error like this:

{code}
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([B44AE18D746BCD54:3062FA7EBBB8C061]:0)
        at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:163)
        at org.apache.solr.update.TransactionLog.<init>(TransactionLog.java:133)
        at org.apache.solr.update.UpdateLog.ensureLog(UpdateLog.java:636)
{code}

This is reproducible with:

{code}
ant -Dtests.jvms=14 test "-Dtests.class=org.apache.solr.client.solrj.embedded.*"
{code}

Looks like this is because they share the directory example/multicore/core0/data/
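I haven't looked at the attached SOLR-3540.patch, but the general shape of a fix for this kind of clash is to give each concurrently running suite its own data directory instead of the shared example/multicore/core0/data/. A hedged sketch (class and method names here are illustrative, not Solr's actual test infrastructure; `solr.data.dir` is the standard property for relocating a core's data dir):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch of avoiding a shared data directory between test
 * suites: each suite gets a fresh, uniquely named directory. This is an
 * illustration of the general fix, not the contents of SOLR-3540.patch.
 */
public class UniqueDataDir {
    /** Creates a fresh per-suite data directory and returns its path. */
    static Path uniqueDataDir(String suiteName) throws IOException {
        // createTempDirectory appends a random suffix, so two suites running
        // in parallel can never end up writing to the same path.
        return Files.createTempDirectory(suiteName + "-data");
    }

    public static void main(String[] args) throws IOException {
        Path dir = uniqueDataDir("MultiCoreExampleTest");
        // Point the core at the unique directory before it starts.
        System.setProperty("solr.data.dir", dir.toString());
        System.out.println("data dir: " + dir);
    }
}
```

The key property is that uniqueness comes from the filesystem (random suffix), not from suite names alone, so even two JVMs running the same suite cannot collide.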
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
     [ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294121#comment-13294121 ]

David Smiley commented on SOLR-3534:
------------------------------------

Jack: I'll improve the exception wording as you suggest.

bq. Any idea what happens for the classic/Solr or flex query parsers if default search field is not present?

The lucene query parser (which is the default) doesn't technically require a default field, but if there is ambiguity in the query (i.e. a simple search word) then you get an exception.

> dismax and edismax should default to "df" when "qf" is absent.
> --------------------------------------------------------------
>
>                 Key: SOLR-3534
>                 URL: https://issues.apache.org/jira/browse/SOLR-3534
>             Project: Solr
>          Issue Type: Improvement
>          Components: query parsers
>    Affects Versions: 4.0
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Minor
>         Attachments: SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch
>
> The dismax and edismax query parsers should default to "df" when the "qf"
> parameter is absent. They only use the defaultSearchField in schema.xml as a
> fallback now.
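The fallback order the issue proposes can be sketched as plain logic (a hypothetical illustration, not the actual dismax/edismax parser code; parameter names follow the issue's "qf", "df", and the schema's defaultSearchField):

```java
/**
 * Hedged sketch of the field-resolution order proposed in SOLR-3534:
 * prefer "qf" if present, otherwise fall back to "df", otherwise to the
 * schema's defaultSearchField. Names here are illustrative only.
 */
public class FieldFallback {
    static String resolveQueryFields(String qf, String df, String defaultSearchField) {
        if (qf != null && !qf.trim().isEmpty()) {
            return qf;                     // explicit query fields win
        }
        if (df != null && !df.trim().isEmpty()) {
            return df;                     // the proposed new fallback
        }
        if (defaultSearchField != null) {
            return defaultSearchField;     // today's only fallback
        }
        // With no field at all, an ambiguous query term has nowhere to go.
        throw new IllegalStateException("no query fields: specify qf or df");
    }

    public static void main(String[] args) {
        // qf absent, so the parser would fall back to df.
        System.out.println(resolveQueryFields(null, "text", "content"));
    }
}
```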
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
     [ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294113#comment-13294113 ]

Lance Norskog commented on LUCENE-2899:
---------------------------------------

bq. This really should just be a part of the analysis modules (with the exception of the Solr example parts). I don't know exactly how we are handling Solr examples anymore, but I seem to recall the general consensus was to not proliferate them. Can we just expose the functionality in the main one?

A lot of Solr/Lucene features are only demoed in solrconfig/schema unit test files (DIH for example). That is fine.

bq. The models are indeed tricky and I wonder how we can properly hook them into the tests, if at all.

D'oh! Forgot about that. If we have tagged data in the project, it helps show the other parts of an NLP suite. It's hard to get a full picture of the jigsaw puzzle if you don't know NLP software.

> Add OpenNLP Analysis capabilities as a module
> ---------------------------------------------
>
>                 Key: LUCENE-2899
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2899
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: modules/analysis
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice
> to have a submodule (under analysis) that exposed capabilities for it. Drew
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under: modules/analysis/opennlp
[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 518 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/518/

No tests ran.

Build Log:
[...truncated 4742 lines...]
Re: [JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 880 - Failure!
The assert in SimpleText is wrong, it should be >= lastStartOffset. I committed a fix.

On Tue, Jun 12, 2012 at 10:47 PM,  wrote:
> Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/880/
>
> 1 tests failed.
> REGRESSION: org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom
>
> Error Message:
> startOffset=3562 lastEndOffset=3574
>
> Stack Trace:
> java.lang.AssertionError: startOffset=3562 lastEndOffset=3574
>         at __randomizedtesting.SeedInfo.seed([C6631E3EBBD412A:8725212ECB3E5AFB]:0)
>         at org.apache.lucene.codecs.simpletext.SimpleTextFieldsWriter$SimpleTextPostingsWriter.addPosition(SimpleTextFieldsWriter.java:155)
> [...rest of quoted stack trace truncated; it is identical to the original build failure mail...]
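The corrected invariant can be illustrated in isolation. With graph token filters, a token may legally start before the previous token's *end* offset (stacked tokens), so only start offsets need to be non-decreasing. The sketch below is a hedged illustration of that check, not the actual SimpleText writer code:

```java
/**
 * Hedged sketch of the offset invariant discussed in this thread: a
 * token's start offset may fall before the previous token's END offset
 * (graph/stacked tokens), so the check must be against the previous
 * START offset: startOffset >= lastStartOffset. Illustration only, not
 * the actual SimpleTextFieldsWriter code.
 */
public class OffsetCheck {
    private int lastStartOffset = 0;

    /** Returns true if this start offset is legal given the previous token. */
    boolean advance(int startOffset) {
        if (startOffset < lastStartOffset) {
            return false; // start offsets went backwards: a real bug
        }
        lastStartOffset = startOffset;
        return true;
    }

    public static void main(String[] args) {
        OffsetCheck check = new OffsetCheck();
        // The failing build showed startOffset=3562 with lastEndOffset=3574:
        // fine under the corrected check, as long as the previous token
        // STARTED at or before 3562, regardless of where it ended.
        check.advance(3560);
        System.out.println(check.advance(3562)); // true
    }
}
```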
Re: [JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 880 - Failure!
I'm looking into this.

On Tue, Jun 12, 2012 at 10:47 PM,  wrote:
> Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/880/
>
> 1 tests failed.
> REGRESSION: org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom
>
> Error Message:
> startOffset=3562 lastEndOffset=3574
>
> Stack Trace:
> java.lang.AssertionError: startOffset=3562 lastEndOffset=3574
> [...rest of quoted stack trace truncated; it is identical to the original build failure mail...]
[jira] [Updated] (LUCENE-4143) add backwards checkindex crosscheck
     [ https://issues.apache.org/jira/browse/LUCENE-4143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4143:
--------------------------------
    Attachment: LUCENE-4143.patch

> add backwards checkindex crosscheck
> -----------------------------------
>
>                 Key: LUCENE-4143
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4143
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/index
>            Reporter: Robert Muir
>         Attachments: LUCENE-4143.patch
>
> This is super slow, but ensures they are actually equal (as the existing
> cross-check just checks that vectors are a subset of postings).
> I added a hack so that we only use it in MockDirectoryWrapper when the
> delegate is a RAMDir and < 1MB in size, so it doesn't hurt test times.
[jira] [Created] (LUCENE-4143) add backwards checkindex crosscheck
Robert Muir created LUCENE-4143:
-------------------------------

             Summary: add backwards checkindex crosscheck
                 Key: LUCENE-4143
                 URL: https://issues.apache.org/jira/browse/LUCENE-4143
             Project: Lucene - Java
          Issue Type: Bug
          Components: core/index
            Reporter: Robert Muir

This is super slow, but ensures they are actually equal (as the existing cross-check just checks that vectors are a subset of postings).

I added a hack so that we only use it in MockDirectoryWrapper when the delegate is a RAMDir and < 1MB in size, so it doesn't hurt test times.
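The gating hack described above can be sketched as a simple predicate (class and method names here are illustrative, not the actual MockDirectoryWrapper code):

```java
/**
 * Hedged sketch of the gating described in LUCENE-4143: only run the slow
 * backwards cross-check when the wrapped directory is an in-memory
 * directory smaller than 1 MB, so normal test times are unaffected.
 * Illustration only; not the actual patch.
 */
public class CrossCheckGate {
    static final long MAX_SIZE_BYTES = 1024 * 1024; // the 1 MB cutoff from the issue

    static boolean shouldCrossCheck(boolean delegateIsRamDir, long sizeInBytes) {
        return delegateIsRamDir && sizeInBytes < MAX_SIZE_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(shouldCrossCheck(true, 500_000));  // small RAMDir: run the check
        System.out.println(shouldCrossCheck(false, 500_000)); // on-disk delegate: skip
    }
}
```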
[JENKINS] Lucene-Solr-trunk-Linux-Java6-64 - Build # 880 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux-Java6-64/880/

1 tests failed.
REGRESSION: org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom

Error Message:
startOffset=3562 lastEndOffset=3574

Stack Trace:
java.lang.AssertionError: startOffset=3562 lastEndOffset=3574
        at __randomizedtesting.SeedInfo.seed([C6631E3EBBD412A:8725212ECB3E5AFB]:0)
        at org.apache.lucene.codecs.simpletext.SimpleTextFieldsWriter$SimpleTextPostingsWriter.addPosition(SimpleTextFieldsWriter.java:155)
        at org.apache.lucene.index.FreqProxTermsWriterPerField.flush(FreqProxTermsWriterPerField.java:531)
        at org.apache.lucene.index.FreqProxTermsWriter.flush(FreqProxTermsWriter.java:85)
        at org.apache.lucene.index.TermsHash.flush(TermsHash.java:117)
        at org.apache.lucene.index.DocInverter.flush(DocInverter.java:53)
        at org.apache.lucene.index.DocFieldProcessor.flush(DocFieldProcessor.java:82)
        at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:480)
        at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:422)
        at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:554)
        at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2748)
        at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2724)
        at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:904)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:863)
        at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:827)
        at org.apache.lucene.index.RandomIndexWriter.close(RandomIndexWriter.java:438)
        at org.apache.lucene.util.IOUtils.close(IOUtils.java:143)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:472)
        at org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:378)
        at org.apache.lucene.analysis.TestGraphTokenizers.testDoubleMockGraphTokenFilterRandom(TestGraphTokenizers.java:338)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate
[jira] [Commented] (LUCENE-4142) AnalyzerWrapper doesn't work with CharFilters.
     [ https://issues.apache.org/jira/browse/LUCENE-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13294082#comment-13294082 ]

Chris Male commented on LUCENE-4142:
------------------------------------

+1

> AnalyzerWrapper doesn't work with CharFilters.
> ----------------------------------------------
>
>                 Key: LUCENE-4142
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4142
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: modules/analysis
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-4142.patch
>
> It doesn't override initReader (nor would it be able to), since it doesn't
> have fieldName. This gives unexpected behavior.
[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 52 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/52/

1 tests failed.
FAILED: org.apache.solr.common.util.ContentStreamTest.testURLStream

Error Message:

Stack Trace:
java.lang.AssertionError
        at __randomizedtesting.SeedInfo.seed([50828D73D92F3889:60069A5A2D9BEE0D]:0)
        at org.junit.Assert.fail(Assert.java:92)
        at org.junit.Assert.assertTrue(Assert.java:43)
        at org.junit.Assert.assertTrue(Assert.java:54)
        at org.apache.solr.common.util.ContentStreamTest.testURLStream(ContentStreamTest.java:109)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:601)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889)
        at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
        at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
        at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
        at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
        at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
        at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
        at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
        at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
        at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
        at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
        at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
        at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
        at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
        at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)

Build Log:
[...truncated 11889 lines...]
[junit4]
[junit4] Suite: org.apache.solr.client.solrj.SolrExampleBinaryTest
[junit4] Completed in 8.13s, 21 tests
[junit4]
[junit4] Suite: org.apache.solr.common.util.NamedListTest
[junit4] Completed in 0.01s, 1 test
[junit4]
[junit4] Suite: org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit4] Completed in 12.32s, 21 tests
[junit4]
[junit4] Suite: org.apache.solr.client.solrj.TestLBHttpSolrServer
[junit4] Completed in 14.05s, 3 tests
[junit4]
[junit4] Suite: org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit4] Completed in 7.35s, 22 tests
[junit4]
[junit4] Suite: org.apache.solr.client.solrj.embedded.JettyWebappTest
[junit4] Completed in 2.54s, 1 test
[junit4]
Re: [JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 516 - Failure!
I just checked in a fix (hopefully) for this. The snap puller was creating a temp directory that only used down to seconds precision. I've changed it to milliseconds. -Yonik http://lucidimagination.com On Tue, Jun 12, 2012 at 5:40 PM, wrote: > Build: > http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/516/ > > 1 tests failed. > REGRESSION: org.apache.solr.handler.TestReplicationHandler.test > > Error Message: > expected:<498> but was:<0> > > Stack Trace: > java.lang.AssertionError: expected:<498> but was:<0> > at > __randomizedtesting.SeedInfo.seed([1363050687D73DF1:9B373ADC292B5009]:0) > at org.junit.Assert.fail(Assert.java:93) > at org.junit.Assert.failNotEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:128) > at org.junit.Assert.assertEquals(Assert.java:472) > at org.junit.Assert.assertEquals(Assert.java:456) > at > org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391) > at > org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) > at > 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) > at > org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > at > org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) > at > org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > at > 
org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) > at > org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) > at > org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > at > org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(Random
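The fix described in the reply above boils down to widening the timestamp used in the temp directory name from seconds to milliseconds, so two replication fetches started within the same second no longer collide. A minimal sketch of that idea (hypothetical class and method names, not the actual SnapPuller code):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical illustration of the snap puller fix: derive the temp
// directory name from a millisecond-precision timestamp so that two
// fetches started within the same second get distinct names.
public class TempDirDemo {
    // A "yyyyMMddHHmmss" pattern (seconds only) is the collision-prone
    // variant; appending "SSS" adds the millisecond component.
    static String tempDirName(long epochMillis) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyyMMddHHmmssSSS");
        return "index." + fmt.format(new Date(epochMillis));
    }

    public static void main(String[] args) {
        // Two timestamps in the same second but with different millisecond
        // components now map to different directory names.
        System.out.println(tempDirName(1339537200123L));
        System.out.println(tempDirName(1339537200456L));
    }
}
```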
[jira] [Updated] (LUCENE-4141) don't allow Analyzer.offsetGap/posIncGap to be negative
[ https://issues.apache.org/jira/browse/LUCENE-4141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4141: Attachment: LUCENE-4141_test.patch Here are some initial tests: overflow checks work, but negative values cause things to go backwards undetected. > don't allow Analyzer.offsetGap/posIncGap to be negative > --- > > Key: LUCENE-4141 > URL: https://issues.apache.org/jira/browse/LUCENE-4141 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: LUCENE-4141_test.patch > > > Unrelated, but I thought about this while looking at LUCENE-4139: we should check > that this doesn't make a corrupt index, but instead that IW throws a reasonable > exception. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
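A minimal sketch of the kind of check the test patch argues for (hypothetical method names, not Lucene's actual indexing code): reject a negative gap up front, and detect int overflow when a gap is added to the current position, instead of letting positions silently go backwards:

```java
// Hypothetical sketch of gap validation: a negative offset/position
// gap, or one that overflows int when added, should fail fast with a
// clear exception rather than move positions backwards undetected.
public class GapCheckDemo {
    static int advance(int position, int gap) {
        if (gap < 0) {
            throw new IllegalArgumentException("gap must not be negative, got " + gap);
        }
        int next = position + gap;
        if (next < position) { // int addition wrapped around (overflow)
            throw new IllegalStateException("position overflowed: " + position + " + " + gap);
        }
        return next;
    }

    public static void main(String[] args) {
        System.out.println(advance(10, 5)); // 15
    }
}
```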
[jira] [Updated] (LUCENE-4142) AnalyzerWrapper doesn't work with CharFilters.
[ https://issues.apache.org/jira/browse/LUCENE-4142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4142: Attachment: LUCENE-4142.patch > AnalyzerWrapper doesn't work with CharFilters. > -- > > Key: LUCENE-4142 > URL: https://issues.apache.org/jira/browse/LUCENE-4142 > Project: Lucene - Java > Issue Type: Bug > Components: modules/analysis >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: LUCENE-4142.patch > > > It doesn't override initReader (nor would it be able to), since it doesn't have > fieldName. This gives unexpected behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-4142) AnalyzerWrapper doesn't work with CharFilters.
Robert Muir created LUCENE-4142: --- Summary: AnalyzerWrapper doesn't work with CharFilters. Key: LUCENE-4142 URL: https://issues.apache.org/jira/browse/LUCENE-4142 Project: Lucene - Java Issue Type: Bug Components: modules/analysis Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-4142.patch It doesn't override initReader (nor would it be able to), since it doesn't have fieldName. This gives unexpected behavior. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 51 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/51/ 1 tests failed. REGRESSION: org.apache.solr.handler.TestReplicationHandler.test Error Message: expected:<498> but was:<0> Stack Trace: java.lang.AssertionError: expected:<498> but was:<0> at __randomizedtesting.SeedInfo.seed([FEF60F7A1CF29B59:76A230A0B20EF6A1]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391) at org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 14449 lines...] [junit4] 2> 37662 T2915 C178 REQ [collection1] webapp=/solr path=/replication params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.fdx} status=0 QTime=0 [junit4] 2> 37665 T2915 C178 REQ [
[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays
[ https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293988#comment-13293988 ] Robert Muir commented on LUCENE-4120: - {quote} Yes, it only affects packed FSTs. In this case, the backward compatibility would be rather easy to set-up (just fill a GrowableWriter instead of an int[]). {quote} Finally had a chance to glance through the patch. I was confusing myself about DocValues (its unaffected here). So this is no backwards break to the index format, since we don't use packed FSTs in our standard codec. I wouldn't do any backwards compatibility. > FST should use packed integer arrays > > > Key: LUCENE-4120 > URL: https://issues.apache.org/jira/browse/LUCENE-4120 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-4120.patch, LUCENE-4120.patch, LUCENE-4120.patch > > > There are some places where an int[] could be advantageously replaced with a > packed integer array. > I am thinking (at least) of: > * FST.nodeAddress (GrowableWriter) > * FST.inCounts (GrowableWriter) > * FST.nodeRefToAddress (read-only Reader) > The serialization/deserialization methods should be modified too in order to > take advantage of PackedInts.get{Reader,Writer}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
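For readers unfamiliar with the idea behind this issue: a packed integer array stores each value in only as many bits as the largest value needs, instead of a full 32-bit slot per entry. The toy class below is a self-contained illustration of that technique — it is not Lucene's PackedInts/GrowableWriter API, just a minimal fixed-width bit-packer over a long[] (works for 1–63 bits per value):

```java
// Toy packed integer array: stores each value in bitsPerValue bits
// inside a long[], instead of 32 bits each in an int[]. Lucene's
// PackedInts / GrowableWriter are the real, optimized equivalents.
public class PackedArrayDemo {
    private final long[] blocks;
    private final int bitsPerValue; // assumed 1..63
    private final long mask;

    PackedArrayDemo(int valueCount, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(valueCount * bitsPerValue + 63) / 64];
    }

    void set(int index, long value) {
        long v = value & mask;
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);   // which long holds the low bits
        int shift = (int) (bitPos & 63);    // offset within that long
        blocks[block] = (blocks[block] & ~(mask << shift)) | (v << shift);
        int spill = shift + bitsPerValue - 64; // bits crossing into the next long
        if (spill > 0) {
            long hiMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~hiMask) | (v >>> (bitsPerValue - spill));
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }
}
```

With 20 bits per value, 100 entries take 4 long blocks' worth of bits per 12.8 entries — roughly 2000 bits versus 3200 for an int[], which is the kind of saving the issue is after for FST.nodeAddress and friends.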
[jira] [Updated] (SOLR-2600) ensure example schema.xml has some mention/explanation of per field similarity vs similarityprovider vs (global) similarity
[ https://issues.apache.org/jira/browse/SOLR-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-2600: --- Fix Version/s: (was: 4.1) 4.0 Assignee: Hoss Man we've already seen questions about this, so i'll make sure we have at least one example > ensure example schema.xml has some mention/explanation of per field > similarity vs similarityprovider vs (global) similarity > --- > > Key: SOLR-2600 > URL: https://issues.apache.org/jira/browse/SOLR-2600 > Project: Solr > Issue Type: Task > Components: documentation >Reporter: Hoss Man >Assignee: Hoss Man >Priority: Blocker > Fix For: 4.0 > > > when SOLR-2338 was commited, there wasn't yet clear understanding of how much > the new feature per field similarity fields (vs custom similarity provider > (vs global similarity factory)) should be "advertised" in the example > configs, and what type of usage should be encouraged/promoted. > it's likely that by the time 4.0 is released, new language specific field > types will already demonstrate these features, and no additional "artificial" > usages of them will be needed, but one way or another we should ensure that > they are either demoed or mentioned in comments -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2605) queryparser parses on whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293940#comment-13293940 ] John Berryman commented on LUCENE-2605: --- (How's it going Jack) Interesting idea, though I really need to crack into the QueryParser and play around a little bit before I have a strong opinion myself. > queryparser parses on whitespace > > > Key: LUCENE-2605 > URL: https://issues.apache.org/jira/browse/LUCENE-2605 > Project: Lucene - Java > Issue Type: Bug > Components: core/queryparser >Reporter: Robert Muir > Fix For: 4.1 > > > The queryparser parses input on whitespace, and sends each whitespace > separated term to its own independent token stream. > This breaks the following at query-time, because they can't see across > whitespace boundaries: > * n-gram analysis > * shingles > * synonyms (especially multi-word for whitespace-separated languages) > * languages where a 'word' can contain whitespace (e.g. vietnamese) > Its also rather unexpected, as users think their > charfilters/tokenizers/tokenfilters will do the same thing at index and > querytime, but > in many cases they can't. Instead, preferably the queryparser would parse > around only real 'operators'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 516 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/516/ 1 tests failed. REGRESSION: org.apache.solr.handler.TestReplicationHandler.test Error Message: expected:<498> but was:<0> Stack Trace: java.lang.AssertionError: expected:<498> but was:<0> at __randomizedtesting.SeedInfo.seed([1363050687D73DF1:9B373ADC292B5009]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391) at org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 13183 lines...] [junit4] 2> 42893 T1985 C101 REQ [collection1] webapp=/solr path=/replication params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.si} status=0 QTime=0 [junit4] 2> 42897 T1985 C101 REQ
Re: Grouping - Boosting large groups
Great! I will do that. Thanks a lot. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959p3989298.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293924#comment-13293924 ] Erik Hatcher commented on SOLR-3535: bq. It seems like what we really want to express here is nested documents. Great point, and totally concur that the input should be hierarchical for the block join queries. But do we also need a slightly lower-level, direct (non-hierarchical) way to call IndexWriter#addDocuments()? Or is the Solr need here purely on hierarchy modeling? > Add block support for XMLLoader > --- > > Key: SOLR-3535 > URL: https://issues.apache.org/jira/browse/SOLR-3535 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.1, 5.0 >Reporter: Mikhail Khludnev >Priority: Minor > Attachments: SOLR-3535.patch > > > I'd like to add the following update xml message: > > > > > out of scope for now: > * other update formats > * update log support (NRT), should not be a big deal > * overwrite feature support for block updates - it's more complicated, I'll > tell you why > Alt > * wdyt about adding attribute to the current tag {pre}{pre} > * or we can establish RunBlockUpdateProcessor which treat every > as a block. > *Test is included!!* > How would you suggest improving the patch? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4085) improve TestBackwardsCompatibility to test Lucene 4.x features
[ https://issues.apache.org/jira/browse/LUCENE-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293905#comment-13293905 ] Robert Muir commented on LUCENE-4085: - I improved the situation to some extent in r1349510. We add docvalues fields of various types, a field with offsets, a field that only omits positions, etc. > improve TestBackwardsCompatibility to test Lucene 4.x features > -- > > Key: LUCENE-4085 > URL: https://issues.apache.org/jira/browse/LUCENE-4085 > Project: Lucene - Java > Issue Type: Test > Components: general/test >Affects Versions: 5.0 >Reporter: Robert Muir > > Currently TestBackwardsCompatibility doesn't test any of the new features of > 4.0: e.g. docvalues fields, fields with offsets in postings, etc etc. > We should improve the index generation and testcases (in 5.x) to ensure we > don't break these things. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Grouping - Boosting large groups
I assumed using the queries module, which isn't in 3.x. After a look at the 3.x codebase this doesn't seem to be a problem, since all the classes you need are in the o.a.l.search.function package inside core Lucene. You can use the CustomScoreQuery & ValueSourceQuery instead of the BoostedQuery. Martijn On 12 June 2012 21:24, corwin wrote: > That's a good idea, thanks for the tip Martijn. I'm not a fan of performing > an extra search, but it does seem like it's unavoidable for this scenario. > > We are currently working with Lucene 3.5 and you mentioned that it assumes > Lucene 4 or 3.6. Any particular reason for that? I prefer not upgrading just > yet unless there's a feature that will specifically help me accomplish this. > > Thanks again, > > Corwin. > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959p3989266.html > Sent from the Lucene - Java Developer mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > -- Met vriendelijke groet, Martijn van Groningen - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4136) TestDocumentsWriterStallControl hang (reproducible)
[ https://issues.apache.org/jira/browse/LUCENE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer resolved LUCENE-4136. - Resolution: Fixed Fix Version/s: 5.0 4.0 Lucene Fields: New,Patch Available (was: New) committed to branch_4x & trunk > TestDocumentsWriterStallControl hang (reproducible) > --- > > Key: LUCENE-4136 > URL: https://issues.apache.org/jira/browse/LUCENE-4136 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Simon Willnauer > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4136.patch > > > On trunk (probably affects 4.0 too, but trunk is where i hit it): > ant test -Dtestcase=TestDocumentsWriterStallControl > -Dtests.seed=9D5404FF4A909330 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-4136) TestDocumentsWriterStallControl hang (reproducible)
[ https://issues.apache.org/jira/browse/LUCENE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-4136: --- Assignee: Simon Willnauer > TestDocumentsWriterStallControl hang (reproducible) > --- > > Key: LUCENE-4136 > URL: https://issues.apache.org/jira/browse/LUCENE-4136 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir >Assignee: Simon Willnauer > Attachments: LUCENE-4136.patch > > > On trunk (probably affects 4.0 too, but trunk is where i hit it): > ant test -Dtestcase=TestDocumentsWriterStallControl > -Dtests.seed=9D5404FF4A909330 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3539) rethink softCommit=true|false param on commits?
[ https://issues.apache.org/jira/browse/SOLR-3539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293886#comment-13293886 ] Hoss Man commented on SOLR-3539: This is something that started to concern me while trying to update the tutorial. I'm having a hard time articulating my concerns to myself, so this will largely be stream of consciousness... Both of these params seem defined more in terms of what they *don't* do than what they actually do -- softCommit in particular -- and while they aren't too terrible to explain individually, it's very hard to clearly articulate how they interplay with each other.
* openSearcher
** true - opens a new searcher against this commit point
** false - does not open a new searcher against this commit point
* softCommit
** true - a new searcher is opened against the commit point, but no data is flushed to disk.
** false - the commit point is flushed to disk.
Certain combinations of these params seem redundant (openSearcher=true&softCommit=true) while others not only make no sense, but are directly contradictory (openSearcher=false&softCommit=true)...
| - |softCommit=true|softCommit=false|
|openSearcher=true|openSearcher is redundant|OK|
|openSearcher=false|contradictory (openSearcher is currently ignored)|OK|
From a vocabulary standpoint, they also seem confusing to understand. Consider a new user, starting with the 4x example which contains the following...
{code}
15000
false
{code}
Documents this user adds will automatically get flushed to disk, but won't be visible in search results until the user takes some explicit action. The user, upon reading some docs or asking on the list, will become aware that he needs to open a new searcher, and will be guided to "do a commit" (or maybe a commit explicitly adding openSearcher=true). But this is actually overkill for what the user needs, because it will also flush any pending docs to disk.
All the user really needs to "open a new searcher" is to do an explicit commit with softCommit=true. - I would like to suggest that we throw out the "softCommit" param and replace it with a "flush" (or "flushToDisk" or "persist") param, which is solely concerned with the persistence of the commit, and completely disjoint from "searcher" opening, which would be controlled entirely with the "openSearcher" param.
* openSearcher
** true - opens a new searcher against this commit point
** false - does not open a new searcher against this commit point
* flush
** true - flushes this commit point to stable storage
** false - does not flush this commit point to stable storage
Making the interaction much easier to understand...
| - |flush=true|flush=false|
|openSearcher=true|OK|OK|
|openSearcher=false|OK|No-Op|
I've mainly been thinking about this from a user perspective the last few days, so I haven't had a chance to figure out how much this would impact the internals related to softCommit right now. I suspect there are a lot of places that would need to be tweaked, but hopefully most of them would just involve flipping logic (softCommit=true -> flush=false). The biggest challenges I can think of are:
* how to deal with the autocommit options in solrconfig.xml. In 3x we supported a single block. On the 4x branch we support one lock and one block -- should we continue to do that? would just implicitly specify flush=false? or should we try to generalize to support N blocks where and are config options for all of them?
* event listener -- it looks like the SolrEventListener API had a postSoftCommit() method added to it, but it doesn't seem to be configurable in any way -- I think this is just for tests, but if it's intentionally being exposed we would need to revamp it ... off the cuff I would suggest removing postSoftCommit() and changing the postCommit() method to take in some new structure specifying the options on the commit. Thoughts?
> rethink softCommit=true|false param on commits? > --- > > Key: SOLR-3539 > URL: https://issues.apache.org/jira/browse/SOLR-3539 > Project: Solr > Issue Type: Bug >Reporter: Hoss Man > Fix For: 4.0 > > > I think the current NTR related options when doing a commit, particularly > "openSearcher="true|false" and "softCommit=true|false", is confusing, and we > should rethink them before they get baked into the user API in 4.0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-un
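The flush/openSearcher semantics proposed in the comment above are simple enough to state as a pure function. This is a sketch of the proposal only, not actual Solr code (Solr takes these as request params, not booleans in a Java API):

```java
// Sketch of the proposed flush/openSearcher semantics: every
// combination is meaningful except "neither flush nor open a
// searcher", which does nothing.
public class CommitSemanticsDemo {
    static String describe(boolean openSearcher, boolean flush) {
        if (!openSearcher && !flush) {
            return "no-op: nothing is persisted and no new searcher is opened";
        }
        StringBuilder sb = new StringBuilder();
        if (flush) {
            sb.append("commit point flushed to stable storage");
        }
        if (openSearcher) {
            if (sb.length() > 0) sb.append("; ");
            sb.append("new searcher opened against this commit point");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // openSearcher=true, flush=false corresponds to today's softCommit=true
        System.out.println(describe(true, false));
    }
}
```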
[jira] [Created] (SOLR-3539) rethink softCommit=true|false param on commits?
Hoss Man created SOLR-3539: -- Summary: rethink softCommit=true|false param on commits? Key: SOLR-3539 URL: https://issues.apache.org/jira/browse/SOLR-3539 Project: Solr Issue Type: Bug Reporter: Hoss Man Fix For: 4.0 I think the current NTR related options when doing a commit, particularly "openSearcher="true|false" and "softCommit=true|false", is confusing, and we should rethink them before they get baked into the user API in 4.0. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293885#comment-13293885 ] Simon Willnauer commented on LUCENE-4132: - I think the generics here are not very complicated and also not really user-facing. It's only a tool here to make things nice for the user, and I think that justifies it. So I think this looks good. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". 
> It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
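The Config / IWC split sketched in the description could look roughly like this (hypothetical class names, not the actual Lucene API; RAM buffer size stands in for a "live" setting and maxBufferedDocs for a one-time one):

```java
// Hypothetical sketch of the Config / live-config split described above.
// Names (Config, LiveConfig) are placeholders, not real Lucene classes.
class Config {
    double ramBufferSizeMB = 16.0;
    int maxBufferedDocs = -1;          // one-time setting: fixed after construction
    Config setRAMBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; return this; }
    Config setMaxBufferedDocs(int n)    { this.maxBufferedDocs = n;  return this; }
}

// What a hypothetical IW.getConfig() would hand back: only "live" setters.
class LiveConfig {
    private final Config config;
    LiveConfig(Config config) { this.config = config; }
    // RAM buffer size is "live": changing it takes effect on the open writer.
    LiveConfig setRAMBufferSizeMB(double mb) { config.ramBufferSizeMB = mb; return this; }
    double getRAMBufferSizeMB() { return config.ramBufferSizeMB; }
    int getMaxBufferedDocs()    { return config.maxBufferedDocs; }
    // note: no setMaxBufferedDocs() here -- it is a one-time setting
}

public class LiveConfigSketch {
    public static void main(String[] args) {
        // user code still builds a Config when opening the writer...
        Config cfg = new Config().setRAMBufferSizeMB(32.0).setMaxBufferedDocs(1000);
        // ...but getConfig() would return the live view, exposing only live setters
        LiveConfig live = new LiveConfig(cfg);
        live.setRAMBufferSizeMB(64.0);
        System.out.println(live.getRAMBufferSizeMB());
        System.out.println(live.getMaxBufferedDocs());
    }
}
```

As the description notes, the only user-visible change is the different type returned by getConfig().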
Re: Typo in test framework
> I guess if it just spelled IGNORED/A, I wouldn't think it's a typo. If it's > possible, can we have it spelled correctly? It's not critical if it's too Hmm... It makes that column two characters wider! :) D. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4120) FST should use packed integer arrays
[ https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-4120: - Attachment: LUCENE-4120.patch bq. Can you move the imports under the copyright header in GrowableWriter.java? Patch updated. > FST should use packed integer arrays > > > Key: LUCENE-4120 > URL: https://issues.apache.org/jira/browse/LUCENE-4120 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-4120.patch, LUCENE-4120.patch, LUCENE-4120.patch > > > There are some places where an int[] could be advantageously replaced with a > packed integer array. > I am thinking (at least) of: > * FST.nodeAddress (GrowableWriter) > * FST.inCounts (GrowableWriter) > * FST.nodeRefToAddress (read-only Reader) > The serialization/deserialization methods should be modified too in order to > take advantage of PackedInts.get{Reader,Writer}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
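For background, the space win comes from storing each value in only the bits it needs instead of a full 32 or 64; a toy packed array (illustrative only, not Lucene's actual PackedInts implementation) might look like:

```java
// Toy packed integer array: n values of bitsPerValue bits each, in a long[].
// Illustrative sketch only -- not Lucene's PackedInts implementation.
public class PackedArraySketch {
    private final long[] blocks;
    private final int bitsPerValue;
    private final long mask;

    PackedArraySketch(int size, int bitsPerValue) {
        this.bitsPerValue = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(size * bitsPerValue + 63) / 64];
    }

    void set(int index, long value) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);     // which long holds the low bits
        int shift = (int) (bitPos & 63);      // offset within that long
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        int spill = shift + bitsPerValue - 64; // bits overflowing into the next long
        if (spill > 0) {
            long hi = (value & mask) >>> (bitsPerValue - spill);
            long hiMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~hiMask) | hi;
        }
    }

    long get(int index) {
        long bitPos = (long) index * bitsPerValue;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        int spill = shift + bitsPerValue - 64;
        if (spill > 0) {
            value |= blocks[block + 1] << (bitsPerValue - spill);
        }
        return value & mask;
    }

    public static void main(String[] args) {
        // 100 values at 21 bits each fit in 33 longs instead of 100 ints
        PackedArraySketch a = new PackedArraySketch(100, 21);
        a.set(0, 7);
        a.set(1, 1_000_000);
        a.set(99, 42);
        System.out.println(a.get(0) + " " + a.get(1) + " " + a.get(99));
    }
}
```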
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293868#comment-13293868 ] Shai Erera commented on LUCENE-4132: The generics are there because I wanted to avoid duplicating code between LiveConfig and IWC, so that the live settings share the same setXYZ code. First I thought to write a separate LiveConfig class, but then the setter methods would need to be duplicated. I'll take another look -- perhaps IWC.setRAMBuffer, for instance, can just delegate to a private LiveConfig instance's setter. That will keep the APIs without generics, with perhaps some jdoc duplication ... I can take a stab at something like that, unless you have another proposal. I don't want to let go of the builder pattern though. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. 
There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
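The delegation idea in the comment above might look like this (a hedged sketch with made-up names, not the real IndexWriterConfig):

```java
// Hypothetical sketch of the delegation idea: IWC keeps its builder-style
// API (no generics) and forwards the "live" setters to a private LiveConfig.
class LiveConfig {
    private volatile double ramBufferSizeMB = 16.0;
    void setRAMBufferSizeMB(double mb) { this.ramBufferSizeMB = mb; }
    double getRAMBufferSizeMB() { return ramBufferSizeMB; }
}

class IndexWriterConfigSketch {
    private final LiveConfig live = new LiveConfig();
    // Chained setter preserved; the live value itself lives in LiveConfig.
    IndexWriterConfigSketch setRAMBufferSizeMB(double mb) {
        live.setRAMBufferSizeMB(mb);   // delegate so the logic stays in one place
        return this;                   // keep the builder pattern
    }
    double getRAMBufferSizeMB() { return live.getRAMBufferSizeMB(); }
}

public class DelegationSketch {
    public static void main(String[] args) {
        IndexWriterConfigSketch iwc =
            new IndexWriterConfigSketch().setRAMBufferSizeMB(48.0);
        System.out.println(iwc.getRAMBufferSizeMB());
    }
}
```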
[jira] [Created] (SOLR-3538) Unloading a SolrCore object and specifying delete does not fully delete all Solr parts
Andre' Hazelwood created SOLR-3538: -- Summary: Unloading a SolrCore object and specifying delete does not fully delete all Solr parts Key: SOLR-3538 URL: https://issues.apache.org/jira/browse/SOLR-3538 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.0 Environment: Windows Reporter: Andre' Hazelwood Priority: Minor If I issue a action=UNLOAD&delete=true request for a specific Solr Core on the CoreAdminHandler, all files are removed except files located in the tlog directory under the core. We are trying to manage our cores from an outside system, so having the core not actually get deleted is a pain. I would expect all files as well as the Core directory to be removed if the delete parameter is specified. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293867#comment-13293867 ] Yonik Seeley commented on SOLR-3535: It seems like what we really want to express here is nested documents. Directly expressing that in the transfer syntax (XML, JSON, or binary) would seem more natural and also allow us to handle/express multiple levels of nesting. This also frees the user from having to think about details such as where the parent document goes (at the beginning or the end?). Internally representing a parent and its child documents as a single SolrInputDocument also has a lot of benefits and seems like it's the easiest path to get this working with all of the existing code (like transaction logging, forwarding docs based on ID in cloud mode, etc). > Add block support for XMLLoader > --- > > Key: SOLR-3535 > URL: https://issues.apache.org/jira/browse/SOLR-3535 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.1, 5.0 >Reporter: Mikhail Khludnev >Priority: Minor > Attachments: SOLR-3535.patch > > > I'd like to add the following update xml message: > > > > > out of scope for now: > * other update formats > * update log support (NRT), should not be a big deal > * overwrite feature support for block updates - it's more complicated, I'll > tell you why > Alt > * wdyt about adding attribute to the current tag {pre}{pre} > * or we can establish RunBlockUpdateProcessor which treat every > as a block. > *Test is included!!* > How you'd suggest to improve the patch? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
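Purely as a sketch of what a nested-document transfer syntax might look like in XML (no such syntax exists in Solr at this point; the nesting shown is invented for illustration):

```xml
<!-- Hypothetical nesting syntax -- illustrative only, not implemented. -->
<add>
  <doc>
    <field name="id">parent-1</field>
    <doc>  <!-- child document nested directly inside its parent -->
      <field name="id">child-1a</field>
    </doc>
    <doc>
      <field name="id">child-1b</field>
    </doc>
  </doc>
</add>
```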
Re: Grouping - Boosting large groups
That's a good idea, thanks for the tip Martijn. I'm not a fan of performing an extra search, but it does seem like it's unavoidable for this scenario. We are currently working with Lucene 3.5 and you mentioned that it assumes Lucene 4 or 3.6. Any particular reason for that? I prefer not upgrading just yet unless there's a feature that will specifically help me accomplish this. Thanks again, Corwin. -- View this message in context: http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959p3989266.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293858#comment-13293858 ] Mikhail Khludnev edited comment on SOLR-3535 at 6/12/12 7:15 PM: - @Simon, the intention of this patch is index support for the parent ticket SOLR-3076. BJQ magic is explained at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html I'm going to rework the patch by this week. was (Author: mkhludnev): @Simon, the intention of this patch is index support for the parent ticket SOLR-3076. BJQ magic is explained at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html I'm going to rework the path by this week. > Add block support for XMLLoader > --- > > Key: SOLR-3535 > URL: https://issues.apache.org/jira/browse/SOLR-3535 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.1, 5.0 >Reporter: Mikhail Khludnev >Priority: Minor > Attachments: SOLR-3535.patch > > > I'd like to add the following update xml message: > > > > > out of scope for now: > * other update formats > * update log support (NRT), should not be a big deal > * overwrite feature support for block updates - it's more complicated, I'll > tell you why > Alt > * wdyt about adding attribute to the current tag {pre}{pre} > * or we can establish RunBlockUpdateProcessor which treat every > as a block. > *Test is included!!* > How you'd suggest to improve the patch? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293858#comment-13293858 ] Mikhail Khludnev commented on SOLR-3535: @Simon, the intention of this patch is index support for the parent ticket SOLR-3076. BJQ magic is explained at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html I'm going to rework the path by this week. > Add block support for XMLLoader > --- > > Key: SOLR-3535 > URL: https://issues.apache.org/jira/browse/SOLR-3535 > Project: Solr > Issue Type: Sub-task > Components: update >Affects Versions: 4.1, 5.0 >Reporter: Mikhail Khludnev >Priority: Minor > Attachments: SOLR-3535.patch > > > I'd like to add the following update xml message: > > > > > out of scope for now: > * other update formats > * update log support (NRT), should not be a big deal > * overwrite feature support for block updates - it's more complicated, I'll > tell you why > Alt > * wdyt about adding attribute to the current tag {pre}{pre} > * or we can establish RunBlockUpdateProcessor which treat every > as a block. > *Test is included!!* > How you'd suggest to improve the patch? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293844#comment-13293844 ] Shai Erera commented on LUCENE-4132: I love it too, and the changes would be too horrible. We use this builder pattern everywhere. Remember, the changes in this issue are only meant to avoid confusing people, that's it. They hardly require users to change their code at all. I don't quite understand what the issue is with the generics. If you don't look at IWC / LC code, it's not visible at all. I mean, in your application code, you won't see any generics. > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. 
> * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Typo in test framework
Thanks Dawid, I guess if it just spelled IGNORED/A, I wouldn't think it's a typo. If it's possible, can we have it spelled correctly? It's not critical if it's too much work. Shai On Tue, Jun 12, 2012 at 9:20 PM, Dawid Weiss wrote: > Hi Shai. > > I think this question may be of relevance to others, so I allowed > myself to CC the list. So: > > > I see these printed when I run test-core: > > > > [junit4] IGNOR/A 0.00s | Test10KPulsings.test10kNotPulsed > > [junit4]> Assumption #1: 'nightly' test group is disabled (@Nightly) > > > > Is IGNOR a typo? Or is it a weird locale? > > JUnit has the notion of "ignored" test (marked with @Ignore) or > "assumption-ignored" test which is physically executed but at some > point ends with an AssumptionViolatedException: > > > https://github.com/KentBeck/junit/blob/master/src/main/java/org/junit/internal/AssumptionViolatedException.java > > The primary distinction is that the test can evaluate a condition and > decide to throw an assumption while @Ignore is unconditional. There > are also other technical side-effects -- listeners do get informed > about the cause of an assumption (an instance of the thrown exception) > while they are not informed about any cause of the ignored test (I > think because it was at some point assumed that tests can only be > ignored for one reason -- @Ignore annotation). Assumption-ignore > exceptions can happen simultaneously with other exceptions resulting > from rules -- the behavior then is not clearly defined... > > Randomizedtesting's task tries hard to report all the events > that really happened and report them -- including assumption-failed > tests. So IGNOR/A is an assumption-ignored test (as opposed to IGNORED > which is a test ignored for other reasons). > > Hope this helps, > > Dawid >
Re: Grouping - Boosting large groups
Hi Corwin, This is not yet possible out of the box. However I think it is possible: 1) Create a Lucene collector that counts for all groups the number of documents that match. This collector will basically compute a map with the group value as key and a count as value 2) Run this collector as an extra phase before you run the TermFirstPassGroupingCollector. 3) Use BoostedQuery with a custom value source. The custom value source can emit a boost value per document (via FunctionValues) and in this case you base it on the document count from the group to document count map computed in step 1. Note: This approach is more expensive than what you are doing now. It requires an extra search. Note: The approach assumes lucene 4.0 (which isn't released), but should be possible with lucene 3.6 (I think) Make an issue in Jira about this if you start working on it. This and similar group properties based sorting / boosting are much needed features. Martijn On 11 June 2012 18:10, corwin wrote: > Hi forum, > > I've implemented grouping using the TermFirstPassGroupingCollector and > TermSecondPassGroupingCollector, pretty much exactly as the example at the > API. This works really well. I'm getting the groups sorted by the computed > relevance, within each group the docs are sorted by a numeric field. So > far, so good. > > Now I want to make things more complicated by boosting larger groups in > addition to the existing relevance sort. For example, if the first result > has a relevancy score of 1 and the group has 2 docs and the second group has > a score of 0.9 and 4 docs, I want to boost the second group so it will > appear before the first. > > Basically I'm trying to boost the groups according to the number of elements > in the groups. > > I couldn't figure out how to do that or find an example anywhere. > > I hope I'm making sense > > Thanks in advance. 
> > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Grouping-Boosting-large-groups-tp3988959.html > Sent from the Lucene - Java Developer mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org > For additional commands, e-mail: dev-h...@lucene.apache.org > -- Met vriendelijke groet, Martijn van Groningen - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
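The three steps above can be sketched in plain Java, leaving out the Lucene types (the log-based boost formula is an arbitrary illustration, not something the suggestion prescribes):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of the suggestion without Lucene types: a first pass counts
// documents per group (what the custom collector would do), then each
// document's score is boosted by its group's size (what BoostedQuery with
// a custom ValueSource would do). The log-based boost is an arbitrary choice.
public class GroupBoostSketch {
    record Doc(String group, double score) {}

    // Step 1: group value -> number of matching documents.
    static Map<String, Integer> countGroups(List<Doc> matches) {
        Map<String, Integer> counts = new HashMap<>();
        for (Doc d : matches) counts.merge(d.group(), 1, Integer::sum);
        return counts;
    }

    // Step 3: per-document boost derived from group size.
    static double boostedScore(Doc d, Map<String, Integer> counts) {
        int size = counts.getOrDefault(d.group(), 1);
        return d.score() * (1.0 + Math.log(size));
    }

    public static void main(String[] args) {
        // Corwin's example: group "a" scores 1.0 with 2 docs,
        // group "b" scores 0.9 with 4 docs.
        List<Doc> matches = List.of(
            new Doc("a", 1.0), new Doc("a", 0.5),
            new Doc("b", 0.9), new Doc("b", 0.8),
            new Doc("b", 0.7), new Doc("b", 0.6));
        Map<String, Integer> counts = countGroups(matches);
        double a = boostedScore(new Doc("a", 1.0), counts);
        double b = boostedScore(new Doc("b", 0.9), counts);
        // the larger group "b" now outranks the higher-scored "a"
        System.out.println(b > a);
    }
}
```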
Re: Typo in test framework
Hi Shai. I think this question may be of relevance to others, so I allowed myself to CC the list. So: > I see these printed when I run test-core: > > [junit4] IGNOR/A 0.00s | Test10KPulsings.test10kNotPulsed > [junit4] > Assumption #1: 'nightly' test group is disabled (@Nightly) > > Is IGNOR a typo? Or is it a weird locale? JUnit has the notion of "ignored" test (marked with @Ignore) or "assumption-ignored" test which is physically executed but at some point ends with an AssumptionViolatedException: https://github.com/KentBeck/junit/blob/master/src/main/java/org/junit/internal/AssumptionViolatedException.java The primary distinction is that the test can evaluate a condition and decide to throw an assumption while @Ignore is unconditional. There are also other technical side-effects -- listeners do get informed about the cause of an assumption (an instance of the thrown exception) while they are not informed about any cause of the ignored test (I think because it was at some point assumed that tests can only be ignored for one reason -- @Ignore annotation). Assumption-ignore exceptions can happen simultaneously with other exceptions resulting from rules -- the behavior then is not clearly defined... Randomizedtesting's task tries hard to report all the events that really happened and report them -- including assumption-failed tests. So IGNOR/A is an assumption-ignored test (as opposed to IGNORED which is a test ignored for other reasons). Hope this helps, Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
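The distinction can be illustrated with a toy runner (plain Java, no JUnit or randomizedtesting; all names here are made up):

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;

// Toy illustration (not JUnit/randomizedtesting) of the two kinds of
// "ignored": @Ignore skips a test unconditionally, without running it;
// an assumption failure is thrown *inside* a running test and carries a cause.
public class IgnoreSketch {
    @Retention(RetentionPolicy.RUNTIME) @interface Ignore {}

    static class AssumptionViolated extends RuntimeException {
        AssumptionViolated(String msg) { super(msg); }
    }

    static void assume(boolean condition, String msg) {
        if (!condition) throw new AssumptionViolated(msg);
    }

    @Ignore public void testIgnored() {}           // never executed: IGNORED

    public void testNightlyOnly() {                // executed, then bails: IGNOR/A
        assume(false, "'nightly' test group is disabled");
    }

    public static void main(String[] args) throws Exception {
        IgnoreSketch suite = new IgnoreSketch();
        Method[] methods = IgnoreSketch.class.getDeclaredMethods();
        Arrays.sort(methods, Comparator.comparing(Method::getName));
        for (Method m : methods) {
            if (!m.getName().startsWith("test")) continue;
            if (m.isAnnotationPresent(Ignore.class)) {
                // the listener gets no cause for a plain ignore
                System.out.println("IGNORED " + m.getName());
                continue;
            }
            try {
                m.invoke(suite);
                System.out.println("OK      " + m.getName());
            } catch (Exception e) {
                if (e.getCause() instanceof AssumptionViolated) {
                    // the listener is told *why* via the exception instance
                    System.out.println("IGNOR/A " + m.getName()
                        + " > Assumption: " + e.getCause().getMessage());
                } else {
                    throw e;
                }
            }
        }
    }
}
```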
[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays
[ https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293812#comment-13293812 ] Dawid Weiss commented on LUCENE-4120: - bq. I think that's fine; you can't change an FST once it's built (not yet anyway...). Yeah, it'd be hard with the packed format. I once thought it'd be interesting to see incremental fst construction based on merging (much like it's done with inverted indexes). Delete would still be difficult (or impossible) but additions should be relatively easy to merge. > FST should use packed integer arrays > > > Key: LUCENE-4120 > URL: https://issues.apache.org/jira/browse/LUCENE-4120 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-4120.patch, LUCENE-4120.patch > > > There are some places where an int[] could be advantageously replaced with a > packed integer array. > I am thinking (at least) of: > * FST.nodeAddress (GrowableWriter) > * FST.inCounts (GrowableWriter) > * FST.nodeRefToAddress (read-only Reader) > The serialization/deserialization methods should be modified too in order to > take advantage of PackedInts.get{Reader,Writer}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4139) multivalued field with offsets makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4139. - Resolution: Fixed Fix Version/s: 5.0 4.0 > multivalued field with offsets makes corrupt index > > > Key: LUCENE-4139 > URL: https://issues.apache.org/jira/browse/LUCENE-4139 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4139.patch, LUCENE-4139.patch, LUCENE-4139.patch, > LUCENE-4139_test.patch, LUCENE-4139_test.patch > > > I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i > accidentally made a corrupt index due to a typo: > {code} > // a field with both offsets and term vectors for a cross-check > FieldType customType3 = new FieldType(TextField.TYPE_STORED); > customType3.setStoreTermVectors(true); > customType3.setStoreTermVectorPositions(true); > customType3.setStoreTermVectorOffsets(true); > customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); > doc.add(new Field("content3", "here is more content with aaa aaa aaa", > customType3)); > // a field that omits only positions > FieldType customType4 = new FieldType(TextField.TYPE_STORED); > customType4.setStoreTermVectors(true); > customType4.setStoreTermVectorPositions(false); > customType4.setStoreTermVectorOffsets(true); > customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS); > // check out the copy-paste typo here! i forgot to change this to content4 > doc.add(new Field("content3", "here is more content with aaa aaa aaa", > customType3)); > {code} -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293792#comment-13293792 ] Uwe Schindler commented on LUCENE-4132: --- We could, I am against, I love IndexWriterConfig! > IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. 
So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. > It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293791#comment-13293791 ]

Michael McCandless commented on LUCENE-4132:
--------------------------------------------

If we remove IWC's chained setters (return void), can we simplify this?
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293788#comment-13293788 ]

Robert Muir commented on LUCENE-4132:
-------------------------------------

It's not certified by me. It's too confusing for a class everyone must use. I don't care about the builder pattern; the builder pattern simply isn't worth confusing generics on a config class.
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293784#comment-13293784 ]

Uwe Schindler commented on LUCENE-4132:
---------------------------------------

That's certified and suggested by the generics policeman. The generics are needed to make the builder API work correctly (compare Enum<E extends Enum<E>>).
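The self-referential generics Uwe compares to `Enum<E extends Enum<E>>` can be sketched as follows (names are illustrative, not the actual patch): each setter on the abstract base returns the concrete subtype, so chained calls on an IndexWriterConfig-like subclass keep their type.

```java
// Recursive type bound: T is always the concrete subclass itself, so
// setters declared on the base can return the subtype and chaining works
// without casts. This is the pattern the comment alludes to, not the
// real Lucene classes.
abstract class AbstractConfig<T extends AbstractConfig<T>> {
  double ramBufferSizeMB = 16.0;

  @SuppressWarnings("unchecked")
  T setRAMBufferSizeMB(double mb) {
    this.ramBufferSizeMB = mb;
    return (T) this; // safe as long as every subclass binds T to itself
  }
}

final class WriterConfig extends AbstractConfig<WriterConfig> {
  int maxThreadStates = 8;

  WriterConfig setMaxThreadStates(int n) {
    this.maxThreadStates = n;
    return this;
  }
}
```

With this bound, `new WriterConfig().setRAMBufferSizeMB(32.0).setMaxThreadStates(2)` compiles; without it, the first call would return the base type and the chain would break. Whether that is worth the extra complexity on a class every user touches is exactly the disagreement in this thread.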
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293783#comment-13293783 ]

Robert Muir commented on LUCENE-4132:
-------------------------------------

I think the class hierarchy/generics are too tricky. Why do we need generics?
[jira] [Commented] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293781#comment-13293781 ]

Shai Erera commented on LUCENE-4132:
------------------------------------

I had a brief chat with Mike about IWC.usedByIW (introduced in LUCENE-4084), and we both agree it's not needed anymore: now that IW.getConfig() returns LiveConfig and IW takes IWC in its ctor, no one can pass the instance returned from getConfig to a new IW, so the relevant test can be nuked, together with that AtomicBoolean. I'll nuke them, absorb ReadOnlyConfig into AbstractLiveConfig, and stick with just two concrete classes: LiveConfig, returned from IW.getConfig, and IWC, given to its ctor. I'll post a patch, probably tomorrow.
[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays
[ https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293780#comment-13293780 ]

Michael McCandless commented on LUCENE-4120:
--------------------------------------------

Patch looks great!

bq. I can switch this method to Mutable but this means that it won't be possible to save an FST read from disk anymore (maybe not a problem?)

I think that's fine; you can't change an FST once it's built (not yet, anyway...).

bq. 0..1 gives more chances to different implementations to be selected. FASTEST=7 is only useful for bitsPerValue=1 so that a Direct8 is instantiated. If we used a uniformly distributed float between COMPACT=0 and FASTEST=7, a Direct* implementation would be used more than 6/7 of the time when bitsPerValue>=4. For example, if bitsPerValue=15, a Direct16 will be instantiated if acceptableOverheadRatio>=1/15=0.07 and a Packed64 otherwise. A lower upper bound for acceptableOverheadRatio makes the latter case more likely.

Ahh OK, that makes sense, so let's leave it as 0..1. Can you move the imports under the copyright header in GrowableWriter.java?

> FST should use packed integer arrays
> ------------------------------------
>
>                 Key: LUCENE-4120
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4120
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/FSTs
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: LUCENE-4120.patch, LUCENE-4120.patch
>
>
> There are some places where an int[] could be advantageously replaced with a
> packed integer array. I am thinking (at least) of:
> * FST.nodeAddress (GrowableWriter)
> * FST.inCounts (GrowableWriter)
> * FST.nodeRefToAddress (read-only Reader)
> The serialization/deserialization methods should be modified too, in order to
> take advantage of PackedInts.get{Reader,Writer}.
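The acceptableOverheadRatio arithmetic in the comment above can be made concrete with a small sketch (this mirrors the reasoning in the thread, not the exact PackedInts selection code): a Direct* implementation rounds bitsPerValue up to a whole byte/short/int/long, and the wasted bits relative to bitsPerValue are what the ratio budgets for.

```java
// Worked example of the space overhead of Direct* packed-ints storage.
class OverheadSketch {
  // Bits actually stored per value by the nearest Direct* implementation.
  static int roundedBits(int bitsPerValue) {
    if (bitsPerValue <= 8) return 8;    // Direct8
    if (bitsPerValue <= 16) return 16;  // Direct16
    if (bitsPerValue <= 32) return 32;  // Direct32
    return 64;                          // Direct64
  }

  // Wasted bits relative to the bits the values actually need.
  static double overhead(int bitsPerValue) {
    return (roundedBits(bitsPerValue) - bitsPerValue) / (double) bitsPerValue;
  }
}
```

For bitsPerValue=15, overhead is 1/15 ≈ 0.067, so a Direct16 is acceptable once acceptableOverheadRatio >= ~0.07, as the comment says; for bitsPerValue=1, overhead of a Direct8 is 7, which is why FASTEST=7 only matters there and a 0..1 range is a better random distribution for tests.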
[jira] [Commented] (SOLR-3535) Add block support for XMLLoader
[ https://issues.apache.org/jira/browse/SOLR-3535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293761#comment-13293761 ]

Simon Rosenthal commented on SOLR-3535:
---------------------------------------

Mikhail: it's not clear to me from the code/comments exactly what this issue/patch is meant to accomplish. I'm assuming the intention is to be able to atomically add every document in the block at once? That is a use case I have encountered (a batch update of a set of records with new product price information, where you want to commit them only when the complete set has been indexed, regardless of autocommits being fired off or other processes issuing commits). If that's the intention, this patch is great! I attempted to address the problem of undesired autocommits in SOLR-2664 (enable/disable autocommit on the fly), but that patch is very out of date. I do think it should be extended to updates in CSV/JSON and updates using the SolrJ API. +1 for Erik's suggestion on the syntax.

> Add block support for XMLLoader
> -------------------------------
>
>                 Key: SOLR-3535
>                 URL: https://issues.apache.org/jira/browse/SOLR-3535
>             Project: Solr
>          Issue Type: Sub-task
>          Components: update
>    Affects Versions: 4.1, 5.0
>            Reporter: Mikhail Khludnev
>            Priority: Minor
>         Attachments: SOLR-3535.patch
>
>
> I'd like to add the following update xml message:
>
>
> out of scope for now:
> * other update formats
> * update log support (NRT), should not be a big deal
> * overwrite feature support for block updates - it's more complicated, I'll
>   tell you why
> Alt:
> * wdyt about adding an attribute to the current tag {pre}{pre}
> * or we can establish RunBlockUpdateProcessor which treats every <add>
>   as a block.
> *Test is included!!*
> How would you suggest to improve the patch?
[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293760#comment-13293760 ]

Michael McCandless commented on LUCENE-4139:
--------------------------------------------

Patch looks good! Nice find. +1

> multivalued field with offsets makes corrupt index
> --------------------------------------------------
>
>                 Key: LUCENE-4139
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4139
>             Project: Lucene - Java
>          Issue Type: Bug
>    Affects Versions: 4.0
>            Reporter: Robert Muir
>         Attachments: LUCENE-4139.patch, LUCENE-4139.patch, LUCENE-4139.patch,
> LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but I
> accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", customType3));
> {code}
[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4139:
--------------------------------

    Attachment: LUCENE-4139.patch

Stupid IDE. Forgot to press save. This one actually has the 'prevOffset -> offsetAccum' rename.
[jira] [Commented] (LUCENE-2605) queryparser parses on whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293748#comment-13293748 ]

Jack Krupansky commented on LUCENE-2605:
----------------------------------------

My thought on the original issue is that most query parsers should accumulate adjacent terms without intervening operators into a "term list" (quoted phrases would be a second level of term list), and that there needs to be a "list" interface for query term analysis. Rather than simply presenting a raw text stream for the sequence/list of terms, each term would be fed into the token stream with an attribute that indicates which source term it belongs to.

The synonym processor would see a clean flow of terms and do its processing, but would also need to associate an id with each term of a multi-term synonym phrase, so that multiple multi-word synonym choices for the same input term(s) don't get mixed up (i.e., multiple tokens at the same position with no indication of which original synonym phrase they came from). By having those ids for each multi-term synonym phrase, the caller of the list analyzer could then reconstruct the tree of "OR" expressions for the various multi-term synonym phrases.

> queryparser parses on whitespace
> --------------------------------
>
>                 Key: LUCENE-2605
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2605
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Robert Muir
>             Fix For: 4.1
>
>
> The queryparser parses input on whitespace, and sends each whitespace-
> separated term to its own independent token stream.
> This breaks the following at query-time, because they can't see across
> whitespace boundaries:
> * n-gram analysis
> * shingles
> * synonyms (especially multi-word for whitespace-separated languages)
> * languages where a 'word' can contain whitespace (e.g. Vietnamese)
> It's also rather unexpected, as users think their
> charfilters/tokenizers/tokenfilters will do the same thing at index and
> query time, but in many cases they can't. Instead, preferably the
> queryparser would parse around only real 'operators'.
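The "term list" idea in Jack's comment can be sketched in Java (all names here are hypothetical; this is not the Lucene query parser API): instead of one token stream per whitespace-separated chunk, the whole query flows through one stream, and each output token carries the index of the source term it came from, so downstream stages such as synonyms can see across term boundaries while the parser can still map results back to input terms.

```java
import java.util.ArrayList;
import java.util.List;

// A token paired with the index of the original query term it came from.
class SourcedToken {
  final String text;
  final int sourceTerm;

  SourcedToken(String text, int sourceTerm) {
    this.text = text;
    this.sourceTerm = sourceTerm;
  }
}

class TermListSketch {
  // Analyze the whole query as one sequence, tagging each token with its
  // source-term index. Real analysis would run char filters, a tokenizer,
  // and token filters; lowercasing stands in for that here.
  static List<SourcedToken> tokenize(String query) {
    List<SourcedToken> out = new ArrayList<>();
    String[] terms = query.trim().split("\\s+");
    for (int i = 0; i < terms.length; i++) {
      out.add(new SourcedToken(terms[i].toLowerCase(), i));
    }
    return out;
  }
}
```

Because the tokens arrive in one stream, a multi-word synonym stage could match "new york" here, and the sourceTerm ids would let the caller rebuild which OR-expression covers which input terms.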
Re: Text Extraction Using iText
Start by looking at the Tika code that integrates PDFBox, since that is exactly where you want to end up if you want to integrate your code with Tika and SolrCell:

http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pdf/

If you are going to replace PDFBox in Tika for SolrCell, that is one thing, but if you want to feed the output of your extractor directly to Solr from your own client application, see the Solr XML format and the SolrJ interface. Ultimately, your extractor will produce two things: 1) extracted content or body text, and 2) metadata, all of which are simply "fields" in a "Solr input document."

http://wiki.apache.org/solr/UpdateXmlMessages
http://wiki.apache.org/solr/Solrj

-- Jack Krupansky

From: Roland Ucker
Sent: Tuesday, June 12, 2012 2:32 AM
To: dev@lucene.apache.org
Subject: Text Extraction Using iText

Hello, I would like to write my own pdf text/metadata extraction module using iText instead of tika/pdfbox. Where to start? Any hints? Regards, Roland
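The hand-off Jack describes can be sketched as follows: the extractor's two outputs (body text and metadata) become plain fields in a Solr <add><doc> XML update message. This is a minimal illustration of the update format, not a Solr client; the field names (id, title, text) are examples, and escaping covers only the characters XML requires.

```java
// Build a Solr XML update message from extracted content and metadata.
class SolrXmlSketch {
  // Escape the characters that must not appear raw in XML text.
  static String escape(String s) {
    return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
  }

  static String field(String name, String value) {
    return "<field name=\"" + name + "\">" + escape(value) + "</field>";
  }

  // One document: id, a metadata field from the PDF, and the body text.
  static String addDoc(String id, String title, String body) {
    return "<add><doc>"
        + field("id", id)
        + field("title", title)  // metadata, e.g. from the PDF info dict
        + field("text", body)    // extracted body text
        + "</doc></add>";
  }
}
```

The resulting string is what you would POST to Solr's /update handler; with SolrJ you would build a SolrInputDocument with the same fields instead of assembling XML by hand.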
[jira] [Comment Edited] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293725#comment-13293725 ]

Shai Erera edited comment on LUCENE-4132 at 6/12/12 4:13 PM:
-------------------------------------------------------------

Phew, that was tricky, but here's the end result -- I refactored IndexWriterConfig into the following class hierarchy:

* ReadOnlyConfig
** AbstractLiveConfig
*** LiveConfig
*** IndexWriterConfig

* IndexWriter now takes ReadOnlyConfig, which is an abstract class with all abstract getters.
* LiveConfig is returned from IndexWriter.getConfig(), and is initialized with the ReadOnlyConfig given to IW. It overrides all getters to delegate the call to the given (cloned) config. It is public, but with a package-private ctor.
* IndexWriterConfig is still the entry object for users to initialize an IndexWriter, and it adds its own setters for the non-live settings.
* The AbstractLiveConfig in the middle is used for generics and for keeping the builder pattern. That way, LiveConfig.set1() and IndexWriterConfig.set1() return the proper type (LiveConfig or IndexWriterConfig, respectively).

I would have liked IW to keep getting IWC in its ctor, but there's one test that prevents it: TestIndexWriterConfig.testIWCInvalidReuse, which initializes an IW, calls getConfig, and passes it to another IW (which is invalid). I don't know why it's invalid, as IW clones the given IWC, but that is one reason why I had to factor the getters out to a shared ReadOnlyConfig. ROC is not that bad, though -- it kind of protects against IW changing the given config. At least, no user code should change following these changes, except for changing the variable type used to cache IW.getConfig() to LiveConfig, which is what we want.
[jira] [Updated] (LUCENE-4132) IndexWriterConfig live settings
[ https://issues.apache.org/jira/browse/LUCENE-4132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-4132: --- Attachment: LUCENE-4132.patch Phew, that was tricky, but here's the end result -- refactored IndexWriterConfig into the following class hierarchy: - ReadOnlyConfig |_ AbstractLiveConfig |_ LiveConfig |_ IndexWriterConfig * IndexWriter now takes ReadOnlyConfig, which is an abstract class with all abstract getters. * LiveConfig is returned from IndexWriter.getConfig(), and is initialized with the ReadOnlyConfig given to IW. It overrides all getters to delegate the call to the given (cloned) config. It is public but with a package-private ctor. * IndexWriterConfig is still the entry object for users to initialize an IndexWriter, and adds its own setters for the non-live settings. * The AbstractLiveConfig in the middle is used for generics and keeping the builder pattern. That way, LiveConfig.set1() and IndexWriterConfig.set1() return the proper type (LiveConfig or IndexWriterConfig respectively). I would have liked IW to keep getting IWC in its ctor, but there's one test that prevents it: TestIndexWriterConfig.testIWCInvalidReuse, which initializes an IW, call getConfig and passes it to another IW (which is invalid). I don't know why it's invalid, as IW clones the given IWC, but that is one reason why I had to factor the getters out to a shared ReadOnlyConfig. ROC is not that bad though -- it kind of protects against IW changing the given config ... At least, no user code should change following these changes, except from changing the variable type used to cache IW.getConfig() to LiveConfig, which is what we want. 
> IndexWriterConfig live settings > --- > > Key: LUCENE-4132 > URL: https://issues.apache.org/jira/browse/LUCENE-4132 > Project: Lucene - Java > Issue Type: Improvement >Reporter: Shai Erera >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-4132.patch > > > A while ago there was a discussion about making some IW settings "live" and I > remember that RAM buffer size was one of them. Judging from IW code, I see > that RAM buffer can be changed "live" as IW never caches it. > However, I don't remember which other settings were decided to be "live" and > I don't see any documentation in IW nor IWC for that. IW.getConfig mentions: > {code} > * NOTE: some settings may be changed on the > * returned {@link IndexWriterConfig}, and will take > * effect in the current IndexWriter instance. See the > * javadocs for the specific setters in {@link > * IndexWriterConfig} for details. > {code} > But there's no text on e.g. IWC.setRAMBuffer mentioning that. > I think that it'd be good if we make it easier for users to tell which of the > settings are "live" ones. There are few possible ways to do it: > * Introduce a custom @live.setting tag on the relevant IWC.set methods, and > add special text for them in build.xml > ** Or, drop the tag and just document it clearly. > * Separate IWC to two interfaces, LiveConfig and OneTimeConfig (name > proposals are welcome !), have IWC impl both, and introduce another > IW.getLiveConfig which will return that interface, thereby clearly letting > the user know which of the settings are "live". > It'd be good if IWC itself could only expose setXYZ methods for the "live" > settings though. So perhaps, off the top of my head, we can do something like > this: > * Introduce a Config object, which is essentially what IWC is today, and pass > it to IW. > * IW will create a different object, IWC from that Config and IW.getConfig > will return IWC. > * IWC itself will only have setXYZ methods for the "live" settings. 
> It adds another object, but user code doesn't change - it still creates a > Config object when initializing IW, and need to handle a different type if it > ever calls IW.getConfig. > Maybe that's not such a bad idea? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
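The self-bounded generic in the middle of the hierarchy is what keeps the builder pattern returning the concrete type. A rough toy sketch of the idea described above (illustration only, with a made-up non-live setting; this is not the actual patch code):

```java
// Toy sketch of the hierarchy: ReadOnlyConfig -> AbstractLiveConfig -> {LiveConfig, IndexWriterConfig}.
// The self-bounded type parameter <T extends AbstractLiveConfig<T>> lets shared "live"
// setters return the concrete subtype, so chaining keeps the right type on both branches.
abstract class ReadOnlyConfig {
  public abstract double getRAMBufferSizeMB();
}

abstract class AbstractLiveConfig<T extends AbstractLiveConfig<T>> extends ReadOnlyConfig {
  protected double ramBufferSizeMB = 16.0;

  @Override
  public double getRAMBufferSizeMB() {
    return ramBufferSizeMB;
  }

  // A "live" setter shared by both subclasses; chaining preserves the concrete type.
  @SuppressWarnings("unchecked")
  public T setRAMBufferSizeMB(double mb) {
    this.ramBufferSizeMB = mb;
    return (T) this;
  }
}

final class LiveConfig extends AbstractLiveConfig<LiveConfig> {
  LiveConfig() {} // package-private ctor, as in the description above
}

class IndexWriterConfig extends AbstractLiveConfig<IndexWriterConfig> {
  boolean nonLiveSetting; // hypothetical non-live setting, only settable on IWC

  public IndexWriterConfig setNonLiveSetting(boolean v) {
    this.nonLiveSetting = v;
    return this; // still IndexWriterConfig, not the abstract parent
  }
}

public class LiveConfigSketch {
  public static void main(String[] args) {
    IndexWriterConfig iwc = new IndexWriterConfig().setRAMBufferSizeMB(32.0).setNonLiveSetting(true);
    LiveConfig live = new LiveConfig().setRAMBufferSizeMB(64.0);
    System.out.println(iwc.getRAMBufferSizeMB() + " " + live.getRAMBufferSizeMB()); // prints: 32.0 64.0
  }
}
```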
[jira] [Comment Edited] (LUCENE-2605) queryparser parses on whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293722#comment-13293722 ] John Berryman edited comment on LUCENE-2605 at 6/12/12 4:09 PM: There is somewhat of a workaround for this for defType=lucene. Just escape every whitespace with a slash. So instead of *{{new dress shoes}}* search for *{{new\ dress\ shoes}}*. Of course you lose the ability to use normal lucene syntax. I was hoping that this workaround would also work for defType=dismax, but with or without the escaped whitespace, queries get interpreted the same, incorrect way. For instance, assume I have the following line in my synonyms.txt: *{{dress shoes => dress_shoes}}*. Further assume that I have a field *{{experiment}}* that gets analysed with synonyms. A search for *{{new dress shoes}}* (with or without escaped spaces) will be interpreted as *{{+((experiment:new)~0.01 (experiment:dress)~0.01 (experiment:shoes)~0.01) (experiment:"new dress_shoes"~3)~0.01}}* The first clause is manditory and contains independently analysed tokens, so this will only match documents that contain "dress", "new", or "shoes", but never "dress shoes" because analysis takes place as expected at index time. was (Author: berryman): There is somewhat of a workaround for this for defType=lucene. Just escape every whitespace with *{{\}}* . So instead of *{{new dress shoes}}* search for *{{new\ dress\ shoes}}*. Of course you lose the ability to use normal lucene syntax. I was hoping that this workaround would also work for defType=dismax, but with or without the escaped whitespace, queries get interpreted the same, incorrect way. For instance, assume I have the following line in my synonyms.txt: *{{dress shoes => dress_shoes}}*. Further assume that I have a field *{{experiment}}* that gets analysed with synonyms. 
A search for *{{new dress shoes}}* (with or without escaped spaces) will be interpreted as *{{+((experiment:new)~0.01 (experiment:dress)~0.01 (experiment:shoes)~0.01) (experiment:"new dress_shoes"~3)~0.01}}* The first clause is manditory and contains independently analysed tokens, so this will only match documents that contain "dress", "new", or "shoes", but never "dress shoes" because analysis takes place as expected at index time. > queryparser parses on whitespace > > > Key: LUCENE-2605 > URL: https://issues.apache.org/jira/browse/LUCENE-2605 > Project: Lucene - Java > Issue Type: Bug > Components: core/queryparser >Reporter: Robert Muir > Fix For: 4.1 > > > The queryparser parses input on whitespace, and sends each whitespace > separated term to its own independent token stream. > This breaks the following at query-time, because they can't see across > whitespace boundaries: > * n-gram analysis > * shingles > * synonyms (especially multi-word for whitespace-separated languages) > * languages where a 'word' can contain whitespace (e.g. vietnamese) > Its also rather unexpected, as users think their > charfilters/tokenizers/tokenfilters will do the same thing at index and > querytime, but > in many cases they can't. Instead, preferably the queryparser would parse > around only real 'operators'. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2605) queryparser parses on whitespace
[ https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293722#comment-13293722 ] John Berryman commented on LUCENE-2605: --- There is somewhat of a workaround for this for defType=lucene. Just escape every whitespace with *{{\}}* . So instead of *{{new dress shoes}}* search for *{{new\ dress\ shoes}}*. Of course you lose the ability to use normal lucene syntax. I was hoping that this workaround would also work for defType=dismax, but with or without the escaped whitespace, queries get interpreted the same, incorrect way. For instance, assume I have the following line in my synonyms.txt: *{{dress shoes => dress_shoes}}*. Further assume that I have a field *{{experiment}}* that gets analysed with synonyms. A search for *{{new dress shoes}}* (with or without escaped spaces) will be interpreted as *{{+((experiment:new)~0.01 (experiment:dress)~0.01 (experiment:shoes)~0.01) (experiment:"new dress_shoes"~3)~0.01}}* The first clause is manditory and contains independently analysed tokens, so this will only match documents that contain "dress", "new", or "shoes", but never "dress shoes" because analysis takes place as expected at index time. > queryparser parses on whitespace > > > Key: LUCENE-2605 > URL: https://issues.apache.org/jira/browse/LUCENE-2605 > Project: Lucene - Java > Issue Type: Bug > Components: core/queryparser >Reporter: Robert Muir > Fix For: 4.1 > > > The queryparser parses input on whitespace, and sends each whitespace > separated term to its own independent token stream. > This breaks the following at query-time, because they can't see across > whitespace boundaries: > * n-gram analysis > * shingles > * synonyms (especially multi-word for whitespace-separated languages) > * languages where a 'word' can contain whitespace (e.g. 
vietnamese) > Its also rather unexpected, as users think their > charfilters/tokenizers/tokenfilters will do the same thing at index and > querytime, but > in many cases they can't. Instead, preferably the queryparser would parse > around only real 'operators'.
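The backslash workaround described above amounts to prefixing every whitespace character with `\` before the string reaches the classic query parser. A minimal sketch; `escapeWhitespace` is a made-up helper for illustration, not a Lucene API:

```java
// Escape each whitespace character with a backslash so the classic query parser
// (defType=lucene) no longer splits the input on it before analysis.
public class WhitespaceEscape {
  static String escapeWhitespace(String query) {
    // "$0" in the replacement refers to the matched whitespace character itself.
    return query.replaceAll("\\s", "\\\\$0");
  }

  public static void main(String[] args) {
    System.out.println(escapeWhitespace("new dress shoes")); // prints: new\ dress\ shoes
  }
}
```

As the comment notes, this only helps defType=lucene; dismax interprets the query the same (incorrect) way with or without the escapes.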
Re: [JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 300 - Still Failing!
I am working on it, I have identified at least one problem which is that the overseer is killed too often in that test, i'll let the test run locally for a bit and if everything looks good commit a fix tomorrow. -- Sami Siren On Tue, Jun 12, 2012 at 6:32 PM, Mark Miller wrote: > While working on the collections api, I have seen this on the odd occasion > locally as well. > > On Jun 12, 2012, at 10:43 AM, jenk...@sd-datasolutions.de wrote: > >> Build: >> http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/300/ >> >> 1 tests failed. >> FAILED: org.apache.solr.cloud.OverseerTest.testShardLeaderChange >> >> Error Message: >> Unexpected shard leader coll:collection1 shard:shard1 expected: but >> was: >> >> Stack Trace: >> org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 >> shard:shard1 expected: but was: >> at >> __randomizedtesting.SeedInfo.seed([195A5E746C7F55C0:C709D98376E7A031]:0) >> at org.junit.Assert.assertEquals(Assert.java:125) >> at >> org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522) >> at >> org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677) >> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) >> at >> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) >> at >> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) >> at java.lang.reflect.Method.invoke(Method.java:601) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) >> at >> 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) >> at >> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) >> at >> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) >> at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> at >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >> at >> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) >> at >> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) >> at >> com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) >> at >> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) >> at >> org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) >> at >> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) >> at >> org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) >> at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) >> at >> org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) >> at >> org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) >> at >> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) >> at >> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) >> at >> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) >> at >> com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
Text Extraction Using iText
Hello, I would like to write my own PDF text/metadata extraction module using iText instead of Tika/PDFBox. Where should I start? Any hints? Regards, Roland
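One possible starting point, assuming iText 5.x (the com.itextpdf.text packages) is on the classpath. This is an unverified sketch of the per-page extraction API, not a complete Solr module:

```java
import java.io.IOException;
import java.util.Map;

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

// Dump the info-dictionary metadata and the plain text of every page of a PDF.
public class ITextDump {
  public static void main(String[] args) throws IOException {
    PdfReader reader = new PdfReader(args[0]);
    try {
      // Document-level metadata (Title, Author, CreationDate, ...).
      for (Map.Entry<String, String> e : reader.getInfo().entrySet()) {
        System.out.println(e.getKey() + " = " + e.getValue());
      }
      // Page numbers are 1-based in iText.
      for (int page = 1; page <= reader.getNumberOfPages(); page++) {
        System.out.println(PdfTextExtractor.getTextFromPage(reader, page));
      }
    } finally {
      reader.close();
    }
  }
}
```

Hooking this into indexing (a custom update request handler, or a client-side indexer feeding SolrJ) is left out here.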
Re: [JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 300 - Still Failing!
While working on the collections api, I have seen this on the odd occasion locally as well. On Jun 12, 2012, at 10:43 AM, jenk...@sd-datasolutions.de wrote: > Build: > http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/300/ > > 1 tests failed. > FAILED: org.apache.solr.cloud.OverseerTest.testShardLeaderChange > > Error Message: > Unexpected shard leader coll:collection1 shard:shard1 expected: but > was: > > Stack Trace: > org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 > shard:shard1 expected: but was: > at > __randomizedtesting.SeedInfo.seed([195A5E746C7F55C0:C709D98376E7A031]:0) > at org.junit.Assert.assertEquals(Assert.java:125) > at > org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522) > at > org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) > at > org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) > at > org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) > at > 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > at > org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) > at > org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) > at > org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) > at > org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) > at > org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) > at > org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) > at > com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) > at > org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) > at > org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) > at > 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) > at > org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) > at > org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) > at > com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) > at > com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) > > > > > Build Log: > [...truncated 11002 lines...] > [junit4] 2> at > org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927) > [junit4] 2> at
[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4139: Attachment: LUCENE-4139.patch updated patch, i renamed the prevOffset in writeOffset to offsetAccum (i think this is less misleading). also added a random test. > multivalued field with offsets makes corrumpt index > > > Key: LUCENE-4139 > URL: https://issues.apache.org/jira/browse/LUCENE-4139 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: LUCENE-4139.patch, LUCENE-4139.patch, > LUCENE-4139_test.patch, LUCENE-4139_test.patch > > > I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i > accidentally made a corrupt index due to a typo: > {code} > // a field with both offsets and term vectors for a cross-check > FieldType customType3 = new FieldType(TextField.TYPE_STORED); > customType3.setStoreTermVectors(true); > customType3.setStoreTermVectorPositions(true); > customType3.setStoreTermVectorOffsets(true); > customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); > doc.add(new Field("content3", "here is more content with aaa aaa aaa", > customType3)); > // a field that omits only positions > FieldType customType4 = new FieldType(TextField.TYPE_STORED); > customType4.setStoreTermVectors(true); > customType4.setStoreTermVectorPositions(false); > customType4.setStoreTermVectorOffsets(true); > customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS); > // check out the copy-paste typo here! i forgot to change this to content4 > doc.add(new Field("content3", "here is more content with aaa aaa aaa", > customType3)); > {code} -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293669#comment-13293669 ] Sebastian Lutze commented on LUCENE-3440: - Hi Koji, bq. I'm going to close and mark this issue as resolved because I think Lucene part has been completed. that's really awesome! bq. Can you open a separate issue for Solr part? Sure. bq. This is a great improvement for FVH. I really appreciate what you've done! It was an honor for me! :) > FastVectorHighlighter: IDF-weighted terms for ordered fragments > > > Key: LUCENE-3440 > URL: https://issues.apache.org/jira/browse/LUCENE-3440 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/highlighter >Reporter: Sebastian Lutze >Assignee: Koji Sekiguchi >Priority: Minor > Labels: FastVectorHighlighter > Fix For: 4.0, 5.0 > > Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440.patch, > LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, > weight-vs-boost_table01.html, weight-vs-boost_table02.html > > > The FastVectorHighlighter uses for every term found in a fragment an equal > weight, which causes a higher ranking for fragments with a high number of > words or, in the worst case, a high number of very common words than > fragments that contains *all* of the terms used in the original query. > This patch provides ordered fragments with IDF-weighted terms: > total weight = total weight + IDF for unique term per fragment * boost of > query; > The ranking-formula should be the same, or at least similar, to that one used > in org.apache.lucene.search.highlight.QueryTermScorer. > The patch is simple, but it works for us. > Some ideas: > - A better approach would be moving the whole fragments-scoring into a > separate class. 
> - Switch scoring via parameter > - Exact phrases should be given an even better score, regardless of whether a > phrase-query was executed or not > - edismax/dismax-parameters pf, ps and pf^boost should be observed and > corresponding fragments should be ranked higher
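The ranking formula quoted in the issue description (total weight += IDF of each unique term in the fragment × query boost) can be illustrated with a toy computation. The idf values below are hard-coded stand-ins, not taken from a real index:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FragmentWeightSketch {
  // Weight of one fragment: each *unique* term contributes idf(term) * boost once,
  // so a fragment stuffed with repeats of one common word no longer outranks a
  // fragment that covers all of the query terms.
  static double fragmentWeight(String[] fragmentTerms, Map<String, Double> idf, double queryBoost) {
    double total = 0.0;
    Set<String> seen = new HashSet<String>();
    for (String term : fragmentTerms) {
      if (seen.add(term) && idf.containsKey(term)) {
        total += idf.get(term) * queryBoost;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    Map<String, Double> idf = new HashMap<String, Double>();
    idf.put("dress", 2.0); // rarer terms carry higher idf
    idf.put("shoes", 3.0);
    System.out.println(fragmentWeight(new String[] {"shoes", "shoes", "shoes"}, idf, 1.0)); // prints: 3.0
    System.out.println(fragmentWeight(new String[] {"dress", "shoes"}, idf, 1.0));          // prints: 5.0
  }
}
```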
[JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 300 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/300/ 1 tests failed. FAILED: org.apache.solr.cloud.OverseerTest.testShardLeaderChange Error Message: Unexpected shard leader coll:collection1 shard:shard1 expected: but was: Stack Trace: org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 shard:shard1 expected: but was: at __randomizedtesting.SeedInfo.seed([195A5E746C7F55C0:C709D98376E7A031]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522) at org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 11002 lines...] [junit4] 2>at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:927) [junit4] 2>at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:289) [junit4] 2>at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:286) [junit4] 2>at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecu
[jira] [Resolved] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-3440. Resolution: Fixed Fix Version/s: 5.0 Assignee: Koji Sekiguchi Thanks, Sebastian! > FastVectorHighlighter: IDF-weighted terms for ordered fragments > > > Key: LUCENE-3440 > URL: https://issues.apache.org/jira/browse/LUCENE-3440 > Project: Lucene - Java > Issue Type: Improvement > Components: modules/highlighter >Reporter: Sebastian Lutze >Assignee: Koji Sekiguchi >Priority: Minor > Labels: FastVectorHighlighter > Fix For: 4.0, 5.0 > > Attachments: LUCENE-3440.patch, LUCENE-3440.patch, LUCENE-3440.patch, > LUCENE-3440_3.6.1-SNAPSHOT.patch, LUCENE-4.0-SNAPSHOT-3440-9.patch, > weight-vs-boost_table01.html, weight-vs-boost_table02.html > > > The FastVectorHighlighter uses for every term found in a fragment an equal > weight, which causes a higher ranking for fragments with a high number of > words or, in the worst case, a high number of very common words than > fragments that contains *all* of the terms used in the original query. > This patch provides ordered fragments with IDF-weighted terms: > total weight = total weight + IDF for unique term per fragment * boost of > query; > The ranking-formula should be the same, or at least similar, to that one used > in org.apache.lucene.search.highlight.QueryTermScorer. > The patch is simple, but it works for us. > Some ideas: > - A better approach would be moving the whole fragments-scoring into a > separate class. > - Switch scoring via parameter > - Exact phrases should be given a even better score, regardless if a > phrase-query was executed or not > - edismax/dismax-parameters pf, ps and pf^boost should be observed and > corresponding fragments should be ranked higher -- This message is automatically generated by JIRA. 
[jira] [Commented] (LUCENE-3440) FastVectorHighlighter: IDF-weighted terms for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-3440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293653#comment-13293653 ] Koji Sekiguchi commented on LUCENE-3440: Hi Sebastian, I've committed LUCENE-4133. I'm going to close and mark this issue as resolved because I think the Lucene part has been completed. Can you open a separate issue for the Solr part? This is a great improvement for FVH. I really appreciate what you've done!
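The scoring change this thread describes (each distinct matching term contributes its IDF times the query boost once per fragment, followed by a sqrt-of-length bonus) can be sketched as a standalone illustration. Class and field names here are hypothetical, not the actual FastVectorHighlighter code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Minimal sketch of IDF-weighted fragment scoring: each distinct matching
// term contributes its IDF weight times the query boost exactly once per
// fragment, so a fragment stuffed with repeats of one common term no longer
// outranks a fragment covering all (rarer) query terms.
public class FragmentScoreSketch {

    // One matched term occurrence inside a fragment: its text, an IDF-style
    // weight, and the boost of the query clause that matched it.
    public static final class Match {
        final String term;
        final float idfWeight;
        final float boost;
        Match(String term, float idfWeight, float boost) {
            this.term = term; this.idfWeight = idfWeight; this.boost = boost;
        }
    }

    public static float score(List<Match> matches) {
        Set<String> distinct = new HashSet<String>();
        float totalBoost = 0f;
        int length = 0;                    // non-distinct match count
        for (Match m : matches) {
            if (distinct.add(m.term)) {    // count each term's weight once
                totalBoost += m.idfWeight * m.boost;
            }
            length++;
        }
        if (length == 0) return 0f;
        // length * (1 / sqrt(length)) == sqrt(length): a mild length bonus
        return (float) (totalBoost * length * (1.0 / Math.sqrt(length)));
    }

    public static void main(String[] args) {
        // Fragment A: rare term "lucene" (high IDF) matched once.
        List<Match> a = Arrays.asList(new Match("lucene", 4.0f, 1.0f));
        // Fragment B: common term "the" (low IDF) matched four times.
        List<Match> b = Arrays.asList(
            new Match("the", 0.5f, 1.0f), new Match("the", 0.5f, 1.0f),
            new Match("the", 0.5f, 1.0f), new Match("the", 0.5f, 1.0f));
        // The single-rare-term fragment now scores above the repeated-common-term one.
        System.out.println("A=" + score(a) + " B=" + score(b));
    }
}
```

Under equal weights, fragment B above would win on sheer match count; with the IDF weighting it does not, which is exactly the failure mode the issue description calls out.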
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293651#comment-13293651 ] Jack Krupansky commented on SOLR-3534: Looks like a reasonable compromise. I would edit the exception so that it reads like you just said: "neither 'qf', 'df' nor the default search field are present", or at least add "Neither" in front of what you currently have. Any idea what happens for the classic/Solr or flex query parsers if the default search field is not present?
> dismax and edismax should default to "df" when "qf" is absent.
> Key: SOLR-3534
> URL: https://issues.apache.org/jira/browse/SOLR-3534
> Project: Solr
> Issue Type: Improvement
> Components: query parsers
> Affects Versions: 4.0
> Reporter: David Smiley
> Assignee: David Smiley
> Priority: Minor
> Attachments: SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch
>
> The dismax and edismax query parsers should default to "df" when the "qf" parameter is absent. They only use the defaultSearchField in schema.xml as a fallback now.
[jira] [Commented] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293647#comment-13293647 ] David Smiley commented on SOLR-3534: Bernd: In my patch I did throw an exception if neither 'qf', 'df' nor the default search field were present. It's tempting to log warnings if a default is relied upon that is inadvisable (like defaultSearchField), but that could flood logs. A one-time flag could be set to prevent this I guess.
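The fallback order under discussion (use 'qf' if given, else 'df', else the schema's defaultSearchField, else throw) can be sketched roughly as follows; the method and its parameter handling are hypothetical, not the actual patch:

```java
import java.util.Map;

// Sketch of the proposed field-resolution order for dismax/edismax:
// an explicit qf wins, then the df request parameter, then the deprecated
// defaultSearchField from schema.xml; otherwise fail loudly rather than
// silently searching nothing.
public class QueryFieldFallback {

    public static String resolveQueryFields(Map<String, String> params,
                                            String schemaDefaultSearchField) {
        String qf = params.get("qf");
        if (qf != null && !qf.trim().isEmpty()) {
            return qf;
        }
        String df = params.get("df");
        if (df != null && !df.trim().isEmpty()) {
            return df;
        }
        if (schemaDefaultSearchField != null) {
            return schemaDefaultSearchField;
        }
        // Worded per Jack's suggestion in the thread.
        throw new IllegalArgumentException(
            "Neither 'qf', 'df', nor the default search field are present");
    }

    public static void main(String[] args) {
        Map<String, String> params = new java.util.HashMap<String, String>();
        params.put("df", "text");
        // No qf given, so resolution falls through to df.
        System.out.println(resolveQueryFields(params, null));
    }
}
```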
[jira] [Resolved] (LUCENE-4133) FastVectorHighlighter: A weighted approach for ordered fragments
[ https://issues.apache.org/jira/browse/LUCENE-4133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved LUCENE-4133. Resolution: Fixed Fix Version/s: 5.0 Committed in trunk and 4x.
> FastVectorHighlighter: A weighted approach for ordered fragments
> Key: LUCENE-4133
> URL: https://issues.apache.org/jira/browse/LUCENE-4133
> Project: Lucene - Java
> Issue Type: Improvement
> Components: modules/highlighter
> Affects Versions: 4.0, 5.0
> Reporter: Sebastian Lutze
> Assignee: Koji Sekiguchi
> Priority: Minor
> Labels: FastVectorHighlighter
> Fix For: 4.0, 5.0
> Attachments: LUCENE-4133.patch, LUCENE-4133.patch
>
> The FastVectorHighlighter currently disregards IDF weights for matching terms within generated fragments. In the worst case, a fragment which contains a high number of very common words is scored higher than a fragment that contains *all* of the terms which were used in the original query.
> This patch provides ordered fragments with IDF-weighted terms:
> *For each distinct matching term per fragment:* _weight = weight + IDF * boost_
> *For each fragment:* _weight = weight * length * 1 / sqrt( length )_
> |weight| total weight of fragment
> |IDF| inverse document frequency for each distinct matching term
> |boost| query boost as provided, for example _term^2_
> |length| total number of non-distinct matching terms per fragment
> *Method:*
> {code}
> public void add( int startOffset, int endOffset, List<WeightedPhraseInfo> phraseInfoList ) {
>   float totalBoost = 0;
>   List<SubInfo> subInfos = new ArrayList<SubInfo>();
>   HashSet<String> distinctTerms = new HashSet<String>();
>   int length = 0;
>   for( WeightedPhraseInfo phraseInfo : phraseInfoList ){
>     subInfos.add( new SubInfo( phraseInfo.getText(), phraseInfo.getTermsOffsets(), phraseInfo.getSeqnum() ) );
>     for( TermInfo ti : phraseInfo.getTermsInfos() ){
>       if( distinctTerms.add( ti.getText() ) )
>         totalBoost += ti.getWeight() * phraseInfo.getBoost();
>       length++;
>     }
>   }
>   totalBoost *= length * ( 1 / Math.sqrt( length ) );
>   getFragInfos().add( new WeightedFragInfo( startOffset, endOffset, subInfos, totalBoost ) );
> }
> {code}
> The ranking formula should be the same, or at least similar, to the one used in QueryTermScorer.
> *This patch contains:*
> * a changed class member in FieldPhraseList (termInfos to termsInfos)
> * a changed local variable in SimpleFieldFragList (score to totalBoost)
> * a missing @Override added in SimpleFragListBuilder
> * class WeightedFieldFragList, an implementation of FieldFragList
> * class WeightedFragListBuilder, an implementation of BaseFragListBuilder
> * class WeightedFragListBuilderTest, a simple test case
> * updated docs for FVH
> Last part (see also LUCENE-4091, LUCENE-4107, LUCENE-4113) of LUCENE-3440.
[jira] [Updated] (SOLR-3534) dismax and edismax should default to "df" when "qf" is absent.
[ https://issues.apache.org/jira/browse/SOLR-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated SOLR-3534: --- Attachment: SOLR-3534_dismax_and_edismax_should_default_to_df_if_qf_is_absent.patch Attached is a patch, with a test. I pulled out this logic into a static method so that both dismax and edismax could use it, just as it was done for parsing MM. I'll apply this patch tomorrow.
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293589#comment-13293589 ] Joern Kottmann commented on LUCENE-2899: For a test you can run OpenNLP just over a piece of training data; even when trained on a tiny amount of data this will give good results. It does not test OpenNLP, but is sufficient for the desired interface testing.
> Add OpenNLP Analysis capabilities as a module
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Java
> Issue Type: New Feature
> Components: modules/analysis
> Reporter: Grant Ingersoll
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: LUCENE-2899.patch, opennlp_trunk.patch
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under: modules/analysis/opennlp
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293582#comment-13293582 ] Grant Ingersoll commented on LUCENE-2899: This really should just be a part of the analysis modules (with the exception of the Solr example parts). I don't know exactly how we are handling Solr examples anymore, but I seem to recall the general consensus was to not proliferate them. Can we just expose the functionality in the main one? I'll update the patch to move this to the module for starters. Not sure on what to do w/ the example part.
[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrumpt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293575#comment-13293575 ] Robert Muir commented on LUCENE-4139: {quote} The problem is more complicated: How would you sum up offsets for multivalued fields? How to correctly do this? If you just sum up the offsets, they don't help you anymore with highlighting (if you get multiple stored fields), although I have no idea how this should work at all (highlighting MV fields)... {quote} Not really: TermVectorsConsumer does this fine and has for many Lucene releases. The problem is that FreqProxTermsWriter does it wrong; see the patch.
> multivalued field with offsets makes corrumpt index
> Key: LUCENE-4139
> URL: https://issues.apache.org/jira/browse/LUCENE-4139
> Project: Lucene - Java
> Issue Type: Bug
> Affects Versions: 4.0
> Reporter: Robert Muir
> Attachments: LUCENE-4139.patch, LUCENE-4139_test.patch, LUCENE-4139_test.patch
>
> I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but I accidentally made a corrupt index due to a typo:
> {code}
> // a field with both offsets and term vectors for a cross-check
> FieldType customType3 = new FieldType(TextField.TYPE_STORED);
> customType3.setStoreTermVectors(true);
> customType3.setStoreTermVectorPositions(true);
> customType3.setStoreTermVectorOffsets(true);
> customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS);
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", customType3));
> // a field that omits only positions
> FieldType customType4 = new FieldType(TextField.TYPE_STORED);
> customType4.setStoreTermVectors(true);
> customType4.setStoreTermVectorPositions(false);
> customType4.setStoreTermVectorOffsets(true);
> customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS);
> // check out the copy-paste typo here! i forgot to change this to content4
> doc.add(new Field("content3", "here is more content with aaa aaa aaa", customType3));
> {code}
[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrumpt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4139: Attachment: LUCENE-4139.patch patch... needs review and maybe suggestions on how to make it more intuitive: but fixes the bug
[JENKINS] Lucene-Solr-trunk-Windows-Java6-64 - Build # 513 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java6-64/513/ 1 tests failed. REGRESSION: org.apache.solr.handler.TestReplicationHandler.test Error Message: expected:<498> but was:<0> Stack Trace: java.lang.AssertionError: expected:<498> but was:<0> at __randomizedtesting.SeedInfo.seed([1CD78192E87C5B56:9483BE48468036AE]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391) at org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 13340 lines...] [junit4] 2> 37703 T1436 C63 REQ [collection1] webapp=/solr path=/replication params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.si} status=0 QTime=0 [junit4] 2> 37708 T1436 C63 REQ [
[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrumpt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293561#comment-13293561 ] Uwe Schindler commented on LUCENE-4139: The problem is more complicated: How would you sum up offsets for multivalued fields? How to correctly do this? If you just sum up the offsets, they don't help you anymore with highlighting (if you get multiple stored fields), although I have no idea how this should work at all (highlighting MV fields)...
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293562#comment-13293562 ] Grant Ingersoll commented on LUCENE-2899: Cool! I think if we could just get a very small model that can be checked in and used for testing purposes, that is all that would be needed. We don't really need to test OpenNLP, we just need to test that the code properly interfaces with OpenNLP, so a really small model should be fine.
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293560#comment-13293560 ] Joern Kottmann commented on LUCENE-2899: I am using this mentioned Corpus Server together with the Apache UIMA Cas Editor for labeling projects. If someone wants to set something up to label data we (OpenNLP people) are happy to help with that!
[jira] [Created] (LUCENE-4141) don't allow Analyzer.offsetGap/posIncGap to be negative
Robert Muir created LUCENE-4141: --- Summary: don't allow Analyzer.offsetGap/posIncGap to be negative Key: LUCENE-4141 URL: https://issues.apache.org/jira/browse/LUCENE-4141 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Unrelated, but I thought about this while looking at LUCENE-4139: we should check that this doesn't make a corrupt index but instead that IW throws a reasonable exception.
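The proposed guard amounts to a simple up-front argument check, so a misbehaving Analyzer fails loudly instead of corrupting the index. A minimal sketch with hypothetical names; the real check would sit wherever IndexWriter consumes the Analyzer's gaps:

```java
// Sketch: validate analyzer-supplied gaps before they reach the indexing
// chain, so a negative offsetGap or positionIncrementGap produces a clear
// exception rather than backwards offsets in the index.
public class GapValidator {

    public static int checkGap(String name, int gap) {
        if (gap < 0) {
            throw new IllegalArgumentException(
                name + " must not be negative (got " + gap + ")");
        }
        return gap;
    }

    public static void main(String[] args) {
        System.out.println(checkGap("offsetGap", 1));   // accepted
        try {
            checkGap("positionIncrementGap", -5);       // rejected
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```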
[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrumpt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293554#comment-13293554 ] Robert Muir commented on LUCENE-4139: - Looks like we aren't summing up offsets correctly for multivalued fields, thus they go backwards. I added this assert to the postings writer: assert offsetDelta >= 0 && offsetLength >= 0 : "startOffset=" + startOffset + ",lastOffset=" + lastOffset + ",endOffset=" + endOffset; [junit4]> Throwable #1: java.lang.AssertionError: startOffset=26,lastOffset=34,endOffset=29 [junit4]>at __randomizedtesting.SeedInfo.seed([76B886A04FD18EEC:D9439B78AFF692]:0) [junit4]>at org.apache.lucene.codecs.lucene40.Lucene40PostingsWriter.addPosition(Lucene40PostingsWriter.java:255) > multivalued field with offsets makes corrupt index > > > Key: LUCENE-4139 > URL: https://issues.apache.org/jira/browse/LUCENE-4139 > Project: Lucene - Java > Issue Type: Bug >Affects Versions: 4.0 >Reporter: Robert Muir > Attachments: LUCENE-4139_test.patch, LUCENE-4139_test.patch > > > I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but I > accidentally made a corrupt index due to a typo: > {code} > // a field with both offsets and term vectors for a cross-check > FieldType customType3 = new FieldType(TextField.TYPE_STORED); > customType3.setStoreTermVectors(true); > customType3.setStoreTermVectorPositions(true); > customType3.setStoreTermVectorOffsets(true); > customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); > doc.add(new Field("content3", "here is more content with aaa aaa aaa", > customType3)); > // a field that omits only positions > FieldType customType4 = new FieldType(TextField.TYPE_STORED); > customType4.setStoreTermVectors(true); > customType4.setStoreTermVectorPositions(false); > customType4.setStoreTermVectorOffsets(true); > customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS); > // check out the copy-paste typo here! I forgot to change this to content4 > doc.add(new Field("content3", "here is more content with aaa aaa aaa", > customType3)); > {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
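The invariant behind the assert above is that, for a multivalued field, each value's token offsets must be shifted by an accumulated base (the lengths of earlier values plus any offset gap) so start offsets never move backwards across values. A minimal sketch of that bookkeeping, with hypothetical names rather than Lucene's actual postings-writer code:

```java
// Illustrative sketch (not Lucene's API): shifting per-value offsets into a
// single field-wide coordinate space for a multivalued field, and checking
// the same "offsets only move forward" invariant the postings writer asserts.
class OffsetAccumulator {
    private int base = 0;        // sum of previous values' lengths (+ gaps)
    private int lastStart = -1;  // last emitted start offset

    /** Shift one token's offsets by the accumulated base. */
    int[] shift(int startOffset, int endOffset) {
        int start = base + startOffset;
        int end = base + endOffset;
        if (start < lastStart || end < start) {
            // The LUCENE-4139 symptom: without summing the base correctly,
            // the second value's offsets go backwards and corrupt the index.
            throw new IllegalStateException("offsets went backwards: start=" + start);
        }
        lastStart = start;
        return new int[] { start, end };
    }

    /** Call after finishing each value: advance past this value's text. */
    void nextValue(int valueLength, int offsetGap) {
        base += valueLength + offsetGap;
    }
}
```

With two values of the same field, the second value's token at local offset 0 must come out at `base`, not at 0; forgetting to advance `base` reproduces a backwards offset.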
[JENKINS] Lucene-Solr-trunk-Windows-Java7-64 - Build # 299 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows-Java7-64/299/ 2 tests failed. REGRESSION: org.apache.solr.cloud.OverseerTest.testShardLeaderChange Error Message: Unexpected shard leader coll:collection1 shard:shard1 expected: but was: Stack Trace: org.junit.ComparisonFailure: Unexpected shard leader coll:collection1 shard:shard1 expected: but was: at __randomizedtesting.SeedInfo.seed([742AB9E72396E621:AA793E10390E13D0]:0) at org.junit.Assert.assertEquals(Assert.java:125) at org.apache.solr.cloud.OverseerTest.verifyShardLeader(OverseerTest.java:522) at org.apache.solr.cloud.OverseerTest.testShardLeaderChange(OverseerTest.java:677) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) FAILED: org.apache.solr.handler.TestReplicationHandler.test Error Message: expected:<498> but was:<0> Stack Trace: java.lang.AssertionError: expected:<498> but was:<0> at __randomizedtesting.SeedInfo.seed([742AB9E72396E621:FC7E863D8D6A8BD9]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128)
[jira] [Commented] (LUCENE-4139) multivalued field with offsets makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293553#comment-13293553 ] Robert Muir commented on LUCENE-4139: - I don't know what's going on with offsets for multivalued fields; will try to dig: {noformat} java.lang.RuntimeException: vector term=[61 61 61] field=content3 doc=0: startOffset=64 differs from postings startOffset=-2147483622 {noformat}
[jira] [Updated] (LUCENE-4139) multivalued field with offsets makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4139: Summary: multivalued field with offsets makes corrupt index (was: mixing up indexOptions in same IW session makes corrupt index)
[jira] [Updated] (LUCENE-4139) mixing up indexOptions in same IW session makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4139: Attachment: LUCENE-4139_test.patch updated test: actually the bug has nothing to do with mixing up field types, as I forgot to use the new field type too. It happens when you have a multivalued field.
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293529#comment-13293529 ] Tommaso Teofili commented on LUCENE-2899: - bq. I wonder how hard it would be to create much smaller ones based on training just a few things. there was the idea of using the OpenNLP CorpusServer with some wikinews articles to train them (back to OPENNLP-385) > Add OpenNLP Analysis capabilities as a module > - > > Key: LUCENE-2899 > URL: https://issues.apache.org/jira/browse/LUCENE-2899 > Project: Lucene - Java > Issue Type: New Feature > Components: modules/analysis >Reporter: Grant Ingersoll >Assignee: Grant Ingersoll >Priority: Minor > Attachments: LUCENE-2899.patch, opennlp_trunk.patch > > > Now that OpenNLP is an ASF project and has a nice license, it would be nice > to have a submodule (under analysis) that exposed capabilities for it. Drew > Farris, Tom Morton and I have code that does: > * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it > would have to change slightly to buffer tokens) > * NamedEntity recognition as a TokenFilter > We are also planning a Tokenizer/TokenFilter that can put parts of speech as > either payloads (PartOfSpeechAttribute?) on a token or at the same position. > I'd propose it go under: > modules/analysis/opennlp -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-4139) mixing up indexOptions in same IW session makes corrupt index
[ https://issues.apache.org/jira/browse/LUCENE-4139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4139: Attachment: LUCENE-4139_test.patch simple test.
[jira] [Assigned] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll reassigned LUCENE-2899: --- Assignee: Grant Ingersoll
[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293526#comment-13293526 ] Grant Ingersoll commented on LUCENE-2899: - Very cool Lance. The models are indeed tricky and I wonder how we can properly hook them into the tests, if at all. I wonder how hard it would be to create much smaller ones based on training just a few things.
[jira] [Created] (LUCENE-4140) IndexWriterConfig has setFlushPolicy but the class is package private
selckin created LUCENE-4140: --- Summary: IndexWriterConfig has setFlushPolicy but the class is package private Key: LUCENE-4140 URL: https://issues.apache.org/jira/browse/LUCENE-4140 Project: Lucene - Java Issue Type: Bug Reporter: selckin 4.0 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
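The problem selckin reports is a general API smell: a public method whose parameter type is package-private is effectively uncallable from outside the package, because client code cannot even name the type to pass an argument. A small reflection-based sketch (class names here are hypothetical stand-ins, not Lucene's actual IndexWriterConfig/FlushPolicy) shows how such mismatches can be detected mechanically:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Sketch: for every public method of a class, each (non-primitive) parameter
// type should itself be public, otherwise callers outside the package cannot
// supply an argument. Names below are illustrative, not Lucene's.
class VisibilityCheck {
    static boolean hasHiddenParamTypes(Class<?> clazz) {
        for (Method m : clazz.getMethods()) {
            for (Class<?> p : m.getParameterTypes()) {
                if (!p.isPrimitive() && !Modifier.isPublic(p.getModifiers())) {
                    return true; // public method, package-private parameter type
                }
            }
        }
        return false;
    }

    // Stand-ins reproducing the LUCENE-4140 shape:
    static class HiddenPolicy {}                 // package-private type
    public static class Config {
        public void setPolicy(HiddenPolicy p) {} // public setter, hidden type
    }
}
```

The fix on the Lucene side would be either to make the policy class public or to drop the setter from the public config API.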
[jira] [Created] (LUCENE-4139) mixing up indexOptions in same IW session makes corrumpt index
Robert Muir created LUCENE-4139: --- Summary: mixing up indexOptions in same IW session makes corrumpt index Key: LUCENE-4139 URL: https://issues.apache.org/jira/browse/LUCENE-4139 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir I was trying to beef up TestBackwardsCompatibility (LUCENE-4085) but i accidentally made a corrupt index due to a typo: {code} // a field with both offsets and term vectors for a cross-check FieldType customType3 = new FieldType(TextField.TYPE_STORED); customType3.setStoreTermVectors(true); customType3.setStoreTermVectorPositions(true); customType3.setStoreTermVectorOffsets(true); customType3.setIndexOptions(IndexOptions.DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS); doc.add(new Field("content3", "here is more content with aaa aaa aaa", customType3)); // a field that omits only positions FieldType customType4 = new FieldType(TextField.TYPE_STORED); customType4.setStoreTermVectors(true); customType4.setStoreTermVectorPositions(false); customType4.setStoreTermVectorOffsets(true); customType4.setIndexOptions(IndexOptions.DOCS_AND_FREQS); // check out the copy-paste typo here! i forgot to change this to content4 doc.add(new Field("content3", "here is more content with aaa aaa aaa", customType3)); {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4120) FST should use packed integer arrays
[ https://issues.apache.org/jira/browse/LUCENE-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293513#comment-13293513 ] Adrien Grand commented on LUCENE-4120: -- @Robert: Yes, it only affects packed FSTs. In this case, the backward compatibility would be rather easy to set-up (just fill a {{GrowableWriter}} instead of an {{int[]}}). @Dawid: Thanks! > FST should use packed integer arrays > > > Key: LUCENE-4120 > URL: https://issues.apache.org/jira/browse/LUCENE-4120 > Project: Lucene - Java > Issue Type: Improvement > Components: core/FSTs >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Minor > Fix For: 4.0 > > Attachments: LUCENE-4120.patch, LUCENE-4120.patch > > > There are some places where an int[] could be advantageously replaced with a > packed integer array. > I am thinking (at least) of: > * FST.nodeAddress (GrowableWriter) > * FST.inCounts (GrowableWriter) > * FST.nodeRefToAddress (read-only Reader) > The serialization/deserialization methods should be modified too in order to > take advantage of PackedInts.get{Reader,Writer}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
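For readers unfamiliar with packed integer arrays, the memory saving behind this issue comes from storing n values of b bits each back-to-back in a long[], instead of burning a full 32- or 64-bit slot per value. A minimal fixed-bit sketch, far simpler than Lucene's actual PackedInts (which adds specialized readers, growable writers, and serialization), assuming non-negative values and bitsPerValue < 64:

```java
// Minimal fixed-bit packed integer array: n values of bitsPerValue bits each,
// stored contiguously in a long[]. Values may span two 64-bit words.
class PackedArray {
    private final long[] blocks;
    private final int bits;
    private final long mask;

    PackedArray(int size, int bitsPerValue) {
        this.bits = bitsPerValue;
        this.mask = (1L << bitsPerValue) - 1;
        this.blocks = new long[(size * bitsPerValue + 63) / 64];
    }

    void set(int index, long value) {
        long bitPos = (long) index * bits;
        int block = (int) (bitPos >>> 6);  // which long word
        int shift = (int) (bitPos & 63);   // bit offset within the word
        blocks[block] = (blocks[block] & ~(mask << shift)) | ((value & mask) << shift);
        if (shift + bits > 64) {           // value spills into the next word
            int spill = shift + bits - 64;
            long hiMask = (1L << spill) - 1;
            blocks[block + 1] = (blocks[block + 1] & ~hiMask)
                              | ((value & mask) >>> (64 - shift));
        }
    }

    long get(int index) {
        long bitPos = (long) index * bits;
        int block = (int) (bitPos >>> 6);
        int shift = (int) (bitPos & 63);
        long value = blocks[block] >>> shift;
        if (shift + bits > 64) {
            value |= blocks[block + 1] << (64 - shift);
        }
        return value & mask;
    }
}
```

For node addresses or in-counts that fit in, say, 20 bits, this stores roughly three values per long instead of one per int, which is the trade the FST patch is after.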
[jira] [Commented] (LUCENE-4136) TestDocumentsWriterStallControl hang (reproducible)
[ https://issues.apache.org/jira/browse/LUCENE-4136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293506#comment-13293506 ] Robert Muir commented on LUCENE-4136: - +1 to commit (and remove @nightly): with the patch this seed is 9 seconds. I also then ran the test w/o seed about 20 times and it never spun off for minutes. It must be something about my # of cpus maybe? This reminds me of Uwe's 2-cpu problem with another test... > TestDocumentsWriterStallControl hang (reproducible) > --- > > Key: LUCENE-4136 > URL: https://issues.apache.org/jira/browse/LUCENE-4136 > Project: Lucene - Java > Issue Type: Bug >Reporter: Robert Muir > Attachments: LUCENE-4136.patch > > > On trunk (probably affects 4.0 too, but trunk is where i hit it): > ant test -Dtestcase=TestDocumentsWriterStallControl > -Dtests.seed=9D5404FF4A909330 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
[ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand resolved LUCENE-2357. -- Resolution: Fixed Assignee: Adrien Grand (was: Michael McCandless) > Reduce transient RAM usage while merging by using packed ints array for docID > re-mapping > > > Key: LUCENE-2357 > URL: https://issues.apache.org/jira/browse/LUCENE-2357 > Project: Lucene - Java > Issue Type: Improvement > Components: core/index >Reporter: Michael McCandless >Assignee: Adrien Grand >Priority: Minor > Fix For: 4.0, 5.0 > > Attachments: LUCENE-2357.patch, LUCENE-2357.patch, LUCENE-2357.patch, > LUCENE-2357.patch, LUCENE-2357.patch > > > We allocate this int[] to remap docIDs due to compaction of deleted ones. > This uses alot of RAM for large segment merges, and can fail to allocate due > to fragmentation on 32 bit JREs. > Now that we have packed ints, a simple fix would be to use a packed int > array... and maybe instead of storing abs docID in the mapping, we could > store the number of del docs seen so far (so the remap would do a lookup then > a subtract). This may add some CPU cost to merging but should bring down > transient RAM usage quite a bit. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2357) Reduce transient RAM usage while merging by using packed ints array for docID re-mapping
[ https://issues.apache.org/jira/browse/LUCENE-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13293504#comment-13293504 ] Adrien Grand commented on LUCENE-2357: -- Committed (r1349234 on trunk and r1349241 on branch 4.x).
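The "lookup then a subtract" remap described in the issue can be sketched concretely. The committed patch stores the counts in a packed ints structure; a plain int[] (with illustrative names, not the committed code) shows the arithmetic:

```java
// Sketch: instead of storing each live doc's absolute new docID, store the
// number of deleted docs seen before each docID; the new docID of a live doc
// is then oldDocId - delsBefore[oldDocId]. Small per-value counts like these
// are exactly what packed ints compress well.
class DocIdRemapper {
    /** delsBefore[d] = number of deleted docs with id strictly less than d. */
    static int[] delsBefore(boolean[] deleted) {
        int[] before = new int[deleted.length];
        int count = 0;
        for (int d = 0; d < deleted.length; d++) {
            before[d] = count;
            if (deleted[d]) count++;
        }
        return before;
    }

    /** New docID of a live doc after compaction of deleted ones. */
    static int remap(int[] delsBefore, int oldDocId) {
        return oldDocId - delsBefore[oldDocId];
    }
}
```

With deletions {live, del, del, live, live}, docs 3 and 4 compact to new IDs 1 and 2: one array read plus one subtraction per remapped docID, which is the small CPU cost the issue trades for transient RAM.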
[JENKINS] Lucene-Solr-4.x-Windows-Java7-64 - Build # 47 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Windows-Java7-64/47/ 1 tests failed. REGRESSION: org.apache.solr.handler.TestReplicationHandler.test Error Message: expected:<498> but was:<0> Stack Trace: java.lang.AssertionError: expected:<498> but was:<0> at __randomizedtesting.SeedInfo.seed([DB13DF983319FB1C:5347E0429DE596E4]:0) at org.junit.Assert.fail(Assert.java:93) at org.junit.Assert.failNotEquals(Assert.java:647) at org.junit.Assert.assertEquals(Assert.java:128) at org.junit.Assert.assertEquals(Assert.java:472) at org.junit.Assert.assertEquals(Assert.java:456) at org.apache.solr.handler.TestReplicationHandler.doTestIndexAndConfigReplication(TestReplicationHandler.java:391) at org.apache.solr.handler.TestReplicationHandler.test(TestReplicationHandler.java:250) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:601) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1969) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$1100(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:814) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:875) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:889) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:32) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:48) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:821) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45) at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38) at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53) at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36) at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132) at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551) Build Log: [...truncated 14187 lines...] [junit4] 2> 42483 T1920 C112 REQ [collection1] webapp=/solr path=/replication params={command=filecontent&checksum=true&generation=7&wt=filestream&file=_3.fdx} status=0 QTime=0 [junit4] 2> 42491 T1920 C112 REQ [
[JENKINS] Solr-4.x - Build # 7 - Still Failing
Build: https://builds.apache.org/job/Solr-4.x/7/

1 tests failed.

REGRESSION: org.apache.solr.cloud.RecoveryZkTest.testDistribSearch

Error Message:
Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #1,6,]

Stack Trace:
java.lang.RuntimeException: Thread threw an uncaught exception, thread: Thread[Lucene Merge Thread #1,6,]
	at com.carrotsearch.randomizedtesting.RunnerThreadGroup.processUncaught(RunnerThreadGroup.java:96)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:857)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$700(RandomizedRunner.java:132)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3$1.run(RandomizedRunner.java:669)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:695)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:734)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:745)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:45)
	at org.apache.lucene.util.TestRuleReportUncaughtExceptions$1.evaluate(TestRuleReportUncaughtExceptions.java:68)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:38)
	at org.apache.lucene.util.TestRuleIcuHack$1.evaluate(TestRuleIcuHack.java:51)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleNoInstanceHooksOverrides$1.evaluate(TestRuleNoInstanceHooksOverrides.java:53)
	at org.apache.lucene.util.TestRuleNoStaticHooksShadowing$1.evaluate(TestRuleNoStaticHooksShadowing.java:52)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:36)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:56)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSuite(RandomizedRunner.java:605)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$400(RandomizedRunner.java:132)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$2.run(RandomizedRunner.java:551)
Caused by: org.apache.lucene.index.MergePolicy$MergeException: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
	at __randomizedtesting.SeedInfo.seed([EE2933489339AC5B]:0)
	at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:507)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:480)
Caused by: org.apache.lucene.store.AlreadyClosedException: this Directory is closed
	at org.apache.lucene.store.Directory.ensureOpen(Directory.java:244)
	at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:241)
	at org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:321)
	at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3149)
	at org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
	at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)

Build Log:
[...truncated 43487 lines...]
[junit4] 2> commit{dir=/usr/home/hudson/hudson-slave/workspace/Solr-4.x/checkout/solr/build/solr-core/test/J0/org.apache.solr.cloud.RecoveryZkTest-1339494931897/control/data/index,segFN=segments_2,generation=2,filenames=[_5y_Lucene40_0.prx, _5z.fnm, _5y.si, _5w_Lucene40_0.frq, _5w_Lucene40_0.prx, _5u_nrm.cfe, _5t_Lucene40_0.prx, _60.fnm, _62.fdx, _5v.fnm, _5x_1.del, _62.fdt, _5u_Lucene40_0.prx, _5x_Lucene40_0.tim, _62_Lucene40_0.tip, _60_Lucene40_0.tip, _5w.fdt, _5u_nrm.cfs, _5z.fdt, _5w.fdx, _62.si, _5y_2.del, _5u_2.del, _5x_Lucene40_0.tip, _5x_Lucene40_0.frq, _5v.fdx, _5s.fnm, _5z_1.del, _62_nrm.cfs, _5v.fdt, _5s.fdt, _5z.fdx, _5s.fdx, _60.si, _5v_Lucene40_0.frq, _5x.si, _62_nrm.cfe, _5v_2.del, _5t.fdx, _5x.fnm, _60_Lucene40_0.tim, _5t.fnm, _62_Lucene40_0.frq, _5t.fdt, _5w.si, _5u.si, _5w_Lucene40_0.tip, _5t.si, _5t_Lucene40_0.frq, _5s.si, _5w_Lucene40_0.tim, _60_Lucene40_0.prx, _62_Lucene40_0.prx, _5z.si, _5t_Lucene40_0.tim, _5y_Lucene40_0.frq, _60_nrm.cfs, _5z_Lucene40_0.prx, _5w_nrm.cfs, _5t_Lucene40_0.tip, _60.fdx, _5u.fdx, _60.fdt, _5u.fdt, _5z_Lucene40_0.tim, _5z_Lucene40_0.tip, _5v_Lucene40_0.tip, _5v_Lucene40_0.tim, _5x.fdt, _5s_2.del, _5t_1.del, _5x.fdx, _5x_nrm.cfs, _5z_Lucene40_0.frq, _5u_Lucene40_0.frq, _5w_nrm.cfe, _5x_nrm.cfe, _5u.fnm, _5v_Lu
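The root cause in the stack trace above is Directory.ensureOpen() throwing AlreadyClosedException: the background merge thread kept using a Directory that the test had already closed. The following is a minimal, self-contained sketch of that check-on-use guard pattern, not Lucene's actual code; GuardedResource and its methods are hypothetical names, and the real ensureOpen() throws AlreadyClosedException rather than IllegalStateException.

```java
// Simplified analogue of the Directory.ensureOpen() guard seen in the
// stack trace. A thread that holds a reference past close() trips the
// check on its next operation, which is how the merge thread failed.
public class GuardedResource {
    // volatile so a close() on one thread is visible to others promptly
    private volatile boolean open = true;

    public void close() {
        open = false;
    }

    // Every operation checks the flag first, mirroring FSDirectory.listAll()
    // calling ensureOpen() before touching the filesystem.
    public String[] listAll() {
        ensureOpen();
        return new String[] { "segments_1" };
    }

    private void ensureOpen() {
        if (!open) {
            throw new IllegalStateException("this resource is closed");
        }
    }

    public static void main(String[] args) {
        GuardedResource dir = new GuardedResource();
        System.out.println(dir.listAll().length); // prints 1 while open
        dir.close();
        try {
            dir.listAll(); // now throws: resource was closed
        } catch (IllegalStateException e) {
            System.out.println("closed: " + e.getMessage());
        }
    }
}
```

The guard only reports the misuse; the actual fix for a failure like this is ordering, i.e. making the test wait for background merge threads before closing the Directory.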
[jira] [Resolved] (LUCENE-4061) Improvements to DirectoryTaxonomyWriter (synchronization and others)
[ https://issues.apache.org/jira/browse/LUCENE-4061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera resolved LUCENE-4061.
    Resolution: Fixed

Committed rev 1349214 (trunk) and 1349223 (4x).

> Improvements to DirectoryTaxonomyWriter (synchronization and others)
>
>                 Key: LUCENE-4061
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4061
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>            Assignee: Shai Erera
>             Fix For: 4.0, 5.0
>
>         Attachments: LUCENE-4061.patch, LUCENE-4061.patch
>
>
> DirTaxoWriter synchronizes in too many places. For instance, addCategory() is
> fully synchronized, while only a small part of it needs to be.
> Additionally, getCacheMemoryUsage looks bogus - it depends on the type of the
> TaxoWriterCache. No code uses it, so I'd like to remove it -- whoever is
> interested can query the specific cache impl it has. Currently, only
> Cl2oTaxoWriterCache supports it.
> If the changes turn out to be simple, I'll port them to 3.6.1 as well.
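The addCategory() change described in the issue amounts to narrowing a fully synchronized method down to a synchronized block around only the shared-state mutation. A hypothetical sketch of that pattern follows; CategoryCache, its fields, and the normalization step are illustrative assumptions, not the actual DirectoryTaxonomyWriter code.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of narrowing a lock's scope: instead of declaring
// addCategory() synchronized, only the section that reads/writes shared
// state is guarded, so per-call work can proceed concurrently.
public class CategoryCache {
    // shared state: category path -> assigned ordinal
    private final Map<String, Integer> ordinals = new HashMap<>();
    private int nextOrdinal = 0;

    public int addCategory(String path) {
        // Thread-confined work (parsing, normalization) stays outside
        // the lock; only this thread touches these locals.
        String normalized = path.trim();

        // Only the lookup-or-assign on the shared map is synchronized.
        synchronized (this) {
            Integer ordinal = ordinals.get(normalized);
            if (ordinal == null) {
                ordinal = nextOrdinal++;
                ordinals.put(normalized, ordinal);
            }
            return ordinal;
        }
    }
}
```

The payoff grows with the amount of work outside the block: threads adding already-known categories contend only for the brief map lookup rather than for the whole method.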