[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_21) - Build # 6159 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/6159/
Java: 32bit/jdk1.7.0_21 -server -XX:+UseG1GC

1 tests failed.
FAILED: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains {#5 seed=[594050BB28CB401B:ED310C391B43B6BA]}

Error Message:
Should have matched I #0:ShapePair(Rect(minX=61.0,maxX=67.0,minY=-122.0,maxY=110.0) , Rect(minX=10.0,maxX=61.0,minY=-113.0,maxY=126.0)) Q:Rect(minX=50.0,maxX=64.0,minY=39.0,maxY=66.0)

Stack Trace:
java.lang.AssertionError: Should have matched I #0:ShapePair(Rect(minX=61.0,maxX=67.0,minY=-122.0,maxY=110.0) , Rect(minX=10.0,maxX=61.0,minY=-113.0,maxY=126.0)) Q:Rect(minX=50.0,maxX=64.0,minY=39.0,maxY=66.0)
	at __randomizedtesting.SeedInfo.seed([594050BB28CB401B:ED310C391B43B6BA]:0)
	at org.junit.Assert.fail(Assert.java:93)
	at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:289)
	at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:282)
	at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:103)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:601)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
	at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
	at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
	at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
	at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
	at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
	at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
	at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
	at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
	at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
	at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
	at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
	at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
	at
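
For readers trying to see why the test expected a match: neither rectangle of the ShapePair contains Q on its own, but the two together appear to cover it. The sketch below checks that with plain arithmetic. It is only an illustration under assumptions: the class UnionContainsSketch and its helpers are hypothetical, they do not use the Spatial4j or Lucene spatial API, and they treat the coordinates as a flat plane with the argument order printed in the failure message (minX, maxX, minY, maxY).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper for illustration; not part of Lucene or Spatial4j.
class UnionContainsSketch {

    /** Axis-aligned rectangle; argument order follows the failure message. */
    static final class Rect {
        final double minX, maxX, minY, maxY;

        Rect(double minX, double maxX, double minY, double maxY) {
            this.minX = minX; this.maxX = maxX;
            this.minY = minY; this.maxY = maxY;
        }

        /** True if this rectangle fully contains the other one. */
        boolean contains(Rect o) {
            return minX <= o.minX && o.maxX <= maxX
                && minY <= o.minY && o.maxY <= maxY;
        }

        boolean containsPoint(double x, double y) {
            return minX <= x && x <= maxX && minY <= y && y <= maxY;
        }
    }

    /** Sorted cut points: the interval ends plus any edges strictly inside it. */
    static List<Double> cuts(double lo, double hi, double... edges) {
        TreeSet<Double> points = new TreeSet<Double>();
        points.add(lo);
        points.add(hi);
        for (double e : edges) {
            if (e > lo && e < hi) points.add(e);
        }
        return new ArrayList<Double>(points);
    }

    /**
     * True if q is covered by the union of a and b. q is split along the
     * edges of a and b; each resulting cell lies entirely inside or outside
     * each rectangle, so testing the cell centre is sufficient.
     * Assumes q has non-zero width and height.
     */
    static boolean unionContains(Rect a, Rect b, Rect q) {
        List<Double> xs = cuts(q.minX, q.maxX, a.minX, a.maxX, b.minX, b.maxX);
        List<Double> ys = cuts(q.minY, q.maxY, a.minY, a.maxY, b.minY, b.maxY);
        for (int i = 0; i + 1 < xs.size(); i++) {
            for (int j = 0; j + 1 < ys.size(); j++) {
                double cx = (xs.get(i) + xs.get(i + 1)) / 2;
                double cy = (ys.get(j) + ys.get(j + 1)) / 2;
                if (!a.containsPoint(cx, cy) && !b.containsPoint(cx, cy)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Values copied from the failure message above.
        Rect first = new Rect(61.0, 67.0, -122.0, 110.0);
        Rect second = new Rect(10.0, 61.0, -113.0, 126.0);
        Rect q = new Rect(50.0, 64.0, 39.0, 66.0);

        System.out.println("first alone:  " + first.contains(q));   // false
        System.out.println("second alone: " + second.contains(q));  // false
        System.out.println("union:        " + unionContains(first, second, q)); // true
    }
}
```

The two rectangles abut at X=61, so a Contains check that looks at each shape of the pair separately would miss this document. Whether that is what actually happens in the strategy under test is not stated in the report.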
[jira] [Updated] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Jiang updated LUCENE-5029:
--
    Attachment: LUCENE-5029.patch

This patch keeps the original 'customize termstate in PBF' design. It also pushes flushTermsBlock & readTermsBlock to the term dict side.

Now the rule is: if your PBF has some monotonic but 'don't care' values, always fill them with -1, so that the term dict will reuse the previous values to 'pad' those -1s. Yes Mike, the algebra is really simple :)

But I still have a problem removing that termBlockOrd from BlockTermState: every time a caller uses seekExact(), it is expected to get a new term state in which 'termBlockOrd' is involved. However I cannot fully understand how this variable works, and maybe we can use metadataUpto to replace this? I'll try this later.

Can you put the TestDrillSideway fix in lucene3069 branch as well? Thanks :)

> factor out a generic 'TermState' for better sharing in FST-based term dict
> --
>
>                 Key: LUCENE-5029
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5029
>             Project: Lucene - Core
>          Issue Type: Sub-task
>            Reporter: Han Jiang
>            Assignee: Han Jiang
>            Priority: Minor
>             Fix For: 4.4
>
>         Attachments: LUCENE-5029.algebra.patch, LUCENE-5029.algebra.patch, LUCENE-5029.branch-init.patch, LUCENE-5029.patch, LUCENE-5029.patch, LUCENE-5029.patch, LUCENE-5029.patch, LUCENE-5029.patch
>
>
> Currently, those two FST-based term dicts (memory codec & blocktree) both use FST<BytesRef> as a base data structure. This might not share much data in parent arcs, since the encoded BytesRef doesn't guarantee that 'Outputs.common()' always creates a long prefix.
> For the current postings format, however, it is guaranteed that each FP (pointing to .doc, .pos, etc.) will increase monotonically with 'larger' terms. That means that, between two Outputs, the Outputs from the smaller term can be safely pushed towards the root.
> However, we always have some tricky TermState to deal with (like the singletonDocID for the pulsing trick), so as Mike suggested, we can simply cut the whole TermState into two parts: one part for comparison and intersection, another for restoring generic data. Then the data structure will be clear: this generic 'TermState' will consist of a fixed-length LongsRef and a variable-length BytesRef.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
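
The "push the smaller term's Outputs towards the root" idea from the description can be sketched on a fixed-length long[] as follows. This is not code from any attached patch: the class LongsOutputSketch and its three helper methods are hypothetical, and the sketch assumes all arrays have the same length and hold non-negative values such as file pointers.

```java
import java.util.Arrays;

// Hypothetical illustration of sharing a fixed-length long[] output
// along an FST path; not taken from the LUCENE-5029 patches.
class LongsOutputSketch {

    /** The part both outputs share: the element-wise minimum. */
    static long[] common(long[] a, long[] b) {
        long[] out = new long[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = Math.min(a[i], b[i]);
        }
        return out;
    }

    /** What remains of 'output' once 'prefix' has been moved to a parent arc. */
    static long[] subtract(long[] output, long[] prefix) {
        long[] out = new long[output.length];
        for (int i = 0; i < output.length; i++) {
            out[i] = output[i] - prefix[i];
        }
        return out;
    }

    /** Rebuilds a full output while walking from the root towards a term. */
    static long[] add(long[] prefix, long[] rest) {
        long[] out = new long[prefix.length];
        for (int i = 0; i < prefix.length; i++) {
            out[i] = prefix[i] + rest[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Two terms in sorted order; slot 0 might be a .doc pointer, slot 1 a .pos pointer.
        long[] smaller = {100, 40};
        long[] larger = {130, 40};

        long[] shared = common(smaller, larger);
        long[] rest = subtract(larger, shared);

        System.out.println(Arrays.toString(shared));            // [100, 40]
        System.out.println(Arrays.toString(rest));              // [30, 0]
        System.out.println(Arrays.toString(add(shared, rest))); // [130, 40]
    }
}
```

Because the values only grow with 'larger' terms, the shared part is simply the smaller term's output, which is why it can sit on an arc nearer the root. An opaque byte[] has no such ordering, which matches the split into a long[] part and a byte[] part described above.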
[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684612#comment-13684612 ]

Michael McCandless commented on LUCENE-5029:

Patch looks great, thanks Han! It's so awesome to see all that hairy terms block code disappearing from PostingsReader/Writer.

I think you should commit it to the branch and then we can iterate on the following?:

I think only PostingsBaseWriter should have .longsSize(), and then the terms dict should store this int itself and later load it at read time. This keeps the index self documenting, so an errant PBF that reports the wrong longsSize at read time is not possible. Also, I think it should not take a FieldInfo. Per-field-ness is handled higher up (PerFieldPostingsFormat).

I think TempBlockTermsWriter.PendingMetaData should hold the byte[] not the RAMOutputStream? I think RAMOutputStream holds its buffer as 1KB sized chunks... we only need the RAMOutputStream while the PBF is finishing that term; after that we can extract & convert to byte[] I think.

Instead of -1 for don't care, I think TempPostingsWriterBase impls should simply not change the value? This is part of the contract.

Instead of making a separate PendingMetaData in the TempBlockTermWriter, can we put the byte[] + long[] onto the existing PendingTerm? Then we can just pass the slice of PendingTerm down to flushTermsBlock, fixing it to skip the block entries.

Can we rename nextTerm to decodeTerm? (next used to be appropriate when it was decoding the next term in the block... but that's an impl detail of the terms dict now).

Separately from this effort, now that this issue will make the per-term long[] visible to the terms dict, we can now easily investigate better ways of storing that long[] data than simple delta-coded vLongs, e.g. maybe Simple64 column stride would work well.
But this is separate :)
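
To make the "simple delta-coded vLongs" and the "simply not change the value" contract concrete, here is a minimal sketch. It is an assumption-laden illustration, not the branch code: DeltaLongsSketch, encode and decode are hypothetical names, the variable-length encoding is a plain 7-bits-per-byte scheme written out by hand, and every slot is assumed to be non-decreasing from one term to the next.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of delta-coding per-term long[] metadata.
class DeltaLongsSketch {

    /** Writes v using 7 bits per byte, low bits first; high bit means "more follows". */
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    static long readVLong(ByteArrayInputStream in) {
        long v = 0;
        int shift = 0;
        int b;
        do {
            b = in.read();
            v |= (long) (b & 0x7F) << shift;
            shift += 7;
        } while ((b & 0x80) != 0);
        return v;
    }

    /** Encodes each term's long[] as deltas against the previous term's long[]. */
    static byte[] encode(long[][] perTerm, int longsSize) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long[] prev = new long[longsSize]; // first term is a delta against all zeros
        for (long[] cur : perTerm) {
            for (int i = 0; i < longsSize; i++) {
                writeVLong(out, cur[i] - prev[i]); // assumes cur[i] >= prev[i]
                prev[i] = cur[i];
            }
        }
        return out.toByteArray();
    }

    /** Rebuilds the absolute values by accumulating deltas. */
    static long[][] decode(byte[] data, int numTerms, int longsSize) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        long[][] result = new long[numTerms][longsSize];
        long[] prev = new long[longsSize];
        for (int t = 0; t < numTerms; t++) {
            for (int i = 0; i < longsSize; i++) {
                prev[i] += readVLong(in);
                result[t][i] = prev[i];
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Slot 1 is left unchanged for the second term, so its delta is 0.
        long[][] terms = {{100, 7}, {260, 7}, {300, 9}};
        byte[] bytes = encode(terms, 2);
        System.out.println(bytes.length); // 7
        System.out.println(Arrays.deepToString(decode(bytes, 3, 2))); // [[100, 7], [260, 7], [300, 9]]
    }
}
```

A slot the postings writer leaves untouched costs one zero byte here and needs no sentinel value. The reader has to be told longsSize, which is the reason given above for storing that int in the terms dict itself.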
[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684613#comment-13684613 ]

Michael McCandless commented on LUCENE-5029:

bq. Can you put the TestDrillSideway fix in lucene3069 branch as well?

Sure, I'll just sync up all trunk changes over ...
[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684615#comment-13684615 ]

Michael McCandless commented on LUCENE-5029:

{quote}
But I still have a problem removing that termBlockOrd from BlockTermState: every time a caller uses seekExact(), it is expected to get a new term state in which 'termBlockOrd' is involved. However I cannot fully understand how this variable works, and maybe we can use metadataUpto to replace this? I'll try this later.
{quote}

I think we won't be able to eliminate this, because the termBlockOrd (which records the position of this term in the block) is a necessary (from the term dict's standpoint) state for this term, because on seekExact followed by nextTerm, the terms dict needs to know which entry in the block to go to ...
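
The role of termBlockOrd described in this comment can be pictured with a toy enumerator. Everything below is hypothetical and heavily simplified: BlockOrdSketch, its State and BlockEnum classes are invented for illustration, a "block" is just a sorted String[], and the real BlockTermState carries far more than one int.

```java
import java.util.Arrays;

// Toy model of a terms block; not the Lucene block terms dictionary.
class BlockOrdSketch {

    /** Hypothetical stand-in for the per-term state handed back to callers. */
    static final class State {
        int termBlockOrd = -1; // position of the term inside its block
    }

    static final class BlockEnum {
        final String[] block; // sorted terms of one block
        int ord = -1;         // current position, -1 means unpositioned

        BlockEnum(String[] block) {
            this.block = block;
        }

        /** Positions on the term if present and remembers where it sits in the block. */
        boolean seekExact(String term) {
            int i = Arrays.binarySearch(block, term);
            if (i < 0) return false;
            ord = i;
            return true;
        }

        /** Snapshot of the current position. */
        State termState() {
            State s = new State();
            s.termBlockOrd = ord;
            return s;
        }

        /** Re-positions from a saved state without searching the block again. */
        void seekExact(State s) {
            ord = s.termBlockOrd;
        }

        /** Moves to the following entry, or returns null at the end of the block. */
        String next() {
            if (ord + 1 >= block.length) return null;
            return block[++ord];
        }
    }

    public static void main(String[] args) {
        String[] block = {"apple", "banana", "cherry"};

        BlockEnum first = new BlockEnum(block);
        first.seekExact("banana");
        State saved = first.termState();

        // A different enumerator restored from the state still knows where to continue.
        BlockEnum second = new BlockEnum(block);
        second.seekExact(saved);
        System.out.println(second.next()); // cherry
    }
}
```

Without the saved ordinal, the second enumerator would have no way to know that the entry after "banana" is the one to decode next, which is the point made in the comment.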
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684616#comment-13684616 ]

Commit Tag Bot commented on LUCENE-3069:

[lucene3069 commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1493493

LUCENE-3069: merge trunk changes over

> Lucene should have an entirely memory resident term dictionary
> --
>
>                 Key: LUCENE-3069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3069
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Simon Willnauer
>            Assignee: Han Jiang
>              Labels: gsoc2013
>             Fix For: 4.4
>
>
> FST based TermDictionary has been a great improvement yet it still uses a delta codec file for scanning to terms. Some environments have enough memory available to keep the entire FST based term dict in memory. We should add a TermDictionary implementation that encodes all needed information for each term into the FST (custom fst.Output) and builds a FST from the entire term not just the delta.
[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684621#comment-13684621 ]

Commit Tag Bot commented on LUCENE-5029:

[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493494

LUCENE-5029: remove block based API from PBF
[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684637#comment-13684637 ]

Commit Tag Bot commented on LUCENE-5029:

[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493502

LUCENE-5029 simplify contract on generic long[]
[jira] [Commented] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684650#comment-13684650 ]

Commit Tag Bot commented on LUCENE-5029:

[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493508

LUCENE-5029: merge PendingMetaData into PendingTerm
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684674#comment-13684674 ]

Commit Tag Bot commented on LUCENE-3069:

[lucene3069 commit] mikemccand
http://svn.apache.org/viewvc?view=revision&revision=1493516

LUCENE-3069: add nocommit/TODO
[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684676#comment-13684676 ]

Commit Tag Bot commented on LUCENE-3069:

[lucene3069 commit] han
http://svn.apache.org/viewvc?view=revision&revision=1493517

LUCENE-3069: setField now expose per-field info to term dict
[jira] [Resolved] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Han Jiang resolved LUCENE-5029.
---
    Resolution: Fixed

PostingsBase is now pluggable for non-based term dict, and the introduction of long[] and byte[] naturally helps the delta-encoding in both block-based term dict, and FST-based term dict.
[jira] [Comment Edited] (LUCENE-5029) factor out a generic 'TermState' for better sharing in FST-based term dict
[ https://issues.apache.org/jira/browse/LUCENE-5029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684678#comment-13684678 ]

Han Jiang edited comment on LUCENE-5029 at 6/16/13 2:49 PM:

PostingsBase is now pluggable for non-block based term dict, and the introduction of long[] and byte[] naturally helps the delta-encoding in both block-based term dict, and FST-based term dict.

was (Author: billy):
PostingsBase is now pluggable for non-based term dict, and the introduction of long[] and byte[] naturally helps the delta-encoding in both block-based term dict, and FST-based term dict.
[jira] [Updated] (SOLR-4910) solr.xml persistence is completely broken
[ https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-4910:
Attachment: SOLR-4910.patch

OK, if all the tests pass (running now), I think this is ready and I'll put it up tonight or tomorrow unless there are objections. This patch goes against trunk...

This takes care of 4 bugs and 5 things that testing flushed out, see solr/CHANGES.txt.

Shawn (or anyone for that matter) if you have a chance to run this through any exercises it would be a Good Thing, especially seeing whether I made the right decisions around swapping and renaming. Under any circumstances, though, unless someone finds something horribly wrong or the tests blow up, I think this improves solr.xml persistence significantly and I'll check it in Real Soon Now.

solr.xml persistence is completely broken
Key: SOLR-4910
URL: https://issues.apache.org/jira/browse/SOLR-4910
Project: Solr
Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch

I'm working on SOLR-4862 (persisting a created core doesn't preserve some values) and at least compared to 4.3 code, persisting to solr.xml is completely broken. I learned to hate persistence while working on SOLR-4196 etc. and I'm glad it's going away. I frequently got lost in implicit properties (they're easy to persist and shouldn't be), what should/shouldn't be persisted (e.g. the translated ${var:default} or the original), and it was a monster, so don't think I'm nostalgic for the historical behavior. Before I dive back in I want to get some idea whether or not the current behavior was intentional or not, I don't want to go back into that junk only to undo someone else's work.

Creating a new core (collection2 in my example) with persistence turned on in solr.xml for instance changes the original definition for collection1 (stock 4.x as of tonight) from this:

<core name="collection1" instanceDir="collection1" shard="${shard:}" collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" schema="${schema:schema.xml}" coreNodeName="${coreNodeName:}"/>

to this:

<core loadOnStartup="true" shard="${shard:}" instanceDir="collection1/" transient="false" name="collection1" dataDir="data/" collection="${collection:collection1}">
  <property name="name" value="collection1"/>
  <property name="config" value="solrconfig.xml"/>
  <property name="solr.core.instanceDir" value="solr/collection1/"/>
  <property name="transient" value="false"/>
  <property name="schema" value="schema.xml"/>
  <property name="loadOnStartup" value="true"/>
  <property name="solr.core.schemaName" value="schema.xml"/>
  <property name="solr.core.name" value="collection1"/>
  <property name="solr.core.dataDir" value="data/"/>
  <property name="instanceDir" value="collection1/"/>
  <property name="solr.core.configName" value="solrconfig.xml"/>
</core>

So, there are two questions:
1. what is correct for 4.x?
2. do we care at all about 5.x?

As much as I hate to say it, I think that we need to go back to the 4.3 behavior. It might be as simple as not persisting in the property tags anything already in the original definition. Not quite sure what to put where in the newly-created core though, I suspect that the compact core + attribs would be best (assuming there's no property tag already in the definition). I really hate the mix of attributes on the core tag and property tags, wish we had one or the other
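The difference between the original ${var:default} text and its translated value, which the description above calls out, can be illustrated with a small resolver. This is only a sketch under stated assumptions, not Solr's implementation: the class `PropertySubstitution` and its `substitute` method are hypothetical names, only the simple non-nested `${name:default}` form is handled, and an unset property without a default resolves to an empty string here.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; it is not part of Solr.
final class PropertySubstitution {
    // Matches ${name} or ${name:default}. Nested placeholders are not handled.
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    // Returns raw with every placeholder replaced by the property value,
    // falling back to the inline default (or "" when there is none).
    static String substitute(String raw, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = props.containsKey(name) ? props.get(name) : fallback;
            // quoteReplacement keeps '$' and '\' in values from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With no properties set, `substitute("${schema:schema.xml}", props)` yields `schema.xml`. Writing that resolved string back to solr.xml instead of the raw placeholder loses the ability to override the value later, which is the kind of loss the before/after core definitions show.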
[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken
[ https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684706#comment-13684706 ] Shawn Heisey commented on SOLR-4910: That was a bit of a bear to apply to 4x, but I got it done. I don't have anything real set up with trunk where I can easily work on it with my index building code. Perhaps I should set up a fourth index chain for that. I will poke around a bit.
[jira] [Updated] (SOLR-4914) Refactor core persistence to reflect deprecating the core tags in solr.xml
[ https://issues.apache.org/jira/browse/SOLR-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated SOLR-4914:
Attachment: SOLR-4914.patch

Patch with my latest status. It won't compile yet, as I haven't updated all tests to reflect the new API. Will work on that next.

This combines core discovery and persistence into the CoresListPersistor interface. All persistence logic is removed from CoreContainer and ConfigSolr and put into the two implementing classes of CorePropertiesPersistor and SolrXMLPersistor.

CoreDescriptor is tidied up a bit, and made effectively immutable (would be nicer to make it really immutable, maybe with an ImmutableProperties class). The original pre-substitution parameters are stored as well as the values after substitution, which makes persistence a lot easier (you just read from the original values).

Solr.xml persistence is also made a lot simpler, by just storing everything around the <cores/> tag as a flat string, and only updating the <core> tags. So you don't need to remember to add new solr.xml parameters to core persistence logic any more, and things like comments will be preserved.

This is a pretty big patch, and there's still a fair amount to do, but I'd be grateful for some preliminary reviews. I think it simplifies the whole core discovery/persistence logic a lot.

Refactor core persistence to reflect deprecating the core tags in solr.xml
Key: SOLR-4914
URL: https://issues.apache.org/jira/browse/SOLR-4914
Project: Solr
Issue Type: Improvement
Affects Versions: 5.0
Reporter: Erick Erickson
Attachments: SOLR-4914.patch, SOLR-4914.patch

Alan Woodward has done some work to refactor how core persistence works that we should work on going forward that I want to separate from a shorter-term tactical problem (See SOLR-4910). I'm attaching Alan's patch to this JIRA and we'll carry it forward separately from 4910.
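The idea of keeping everything around the cores block as a flat string and rewriting only the core tags can be sketched roughly as follows. This is not the code in the attached patch: `FlatSolrXmlSketch` and `persist` are hypothetical names, the input is assumed to contain exactly one non-self-closing cores element, and the core tags are passed in as ready-made strings.

```java
import java.util.List;

// Illustration only; the class and method names are hypothetical.
final class FlatSolrXmlSketch {
    private final String head; // everything up to and including the opening <cores ...> tag
    private final String tail; // "</cores>" and everything after it

    FlatSolrXmlSketch(String solrXml) {
        int open = solrXml.indexOf("<cores");
        int openEnd = solrXml.indexOf('>', Math.max(open, 0));
        int close = solrXml.lastIndexOf("</cores>");
        if (open < 0 || openEnd < 0 || close < openEnd) {
            throw new IllegalArgumentException("no <cores>...</cores> block found");
        }
        this.head = solrXml.substring(0, openEnd + 1);
        this.tail = solrXml.substring(close);
    }

    // Rebuilds solr.xml: the surrounding text (comments, logging config,
    // attributes on <solr> and <cores>) is copied verbatim, and only the
    // list of core tags between <cores> and </cores> is regenerated.
    String persist(List<String> coreTags) {
        StringBuilder sb = new StringBuilder(head).append('\n');
        for (String tag : coreTags) {
            sb.append("    ").append(tag).append('\n');
        }
        return sb.append("  ").append(tail).toString();
    }
}
```

Because the head and tail are never parsed or re-serialized, a new attribute or a comment outside the core tags survives a persist without any change to this logic, which is the benefit the comment describes.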
[jira] [Assigned] (SOLR-4914) Refactor core persistence to reflect deprecating the core tags in solr.xml
[ https://issues.apache.org/jira/browse/SOLR-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward reassigned SOLR-4914: Assignee: Alan Woodward
[jira] [Updated] (SOLR-4914) Factor out core discovery and persistence logic
[ https://issues.apache.org/jira/browse/SOLR-4914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Woodward updated SOLR-4914: Summary: Factor out core discovery and persistence logic (was: Refactor core persistence to reflect deprecating the core tags in solr.xml)
[jira] [Created] (SOLR-4930) Make PathHierarchyTokenizer use regex and optionally prefix the depth of the path.
John Berryman created SOLR-4930:

Summary: Make PathHierarchyTokenizer use regex and optionally prefix the depth of the path.
Key: SOLR-4930
URL: https://issues.apache.org/jira/browse/SOLR-4930
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Reporter: John Berryman
Priority: Minor

The PathHierarchyTokenizer lacks a couple of features that I think are commonly needed.
1. Split and replace based upon regex.
2. Optionally prefix the token with the depth of the path token.

Motivation: I recently had a client who asked me to index laws that were organized in chapters, sections, subsections, etc. The problem was that the section number used a mixture of delimiters. Ex: 13.4-64.2, so I had to use pattern replacement to map either delimiter to tilde. But the next problem was that these could no longer be displayed as facets (at least not without extra code on the front end). Also, I wanted to prefix the depth of the path at the front of the token. Again, I can achieve this with pattern replacement - but it is ugly and non-performant.

I propose we:
* update PathHierarchyTokenizer so that if the parameters for delimiter or replacement are single character, then the behavior of PathHierarchyTokenizer remains consistent, but if the length of these arguments is greater than one, then they should be interpreted as regex.
* add a new parameter called depthPrefixNumChars that indicates how many characters will be used for a depth prefix - this defaults to zero

Here's my current first stab at it: https://github.com/o19s/statedecoded/blob/master/solr_home/statedecoded/src/src/main/java/com/o19s/RegexPathHierarchyTokenizer.java

This doesn't support the replacement or skip parameter yet. Before I go the rest of the way, I wanted to gauge interest and see if others need this.
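The proposed behavior can be sketched in plain Java, independent of the Lucene Tokenizer API. This is not the linked RegexPathHierarchyTokenizer: `RegexPathTokens` and `tokenize` are hypothetical names, the depth is assumed to be zero-based and zero-padded to depthPrefixNumChars digits, and the prefix is assumed to be concatenated directly in front of the token, none of which the proposal spells out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only; names and prefix format are assumptions.
final class RegexPathTokens {
    // Splits path on delimiterRegex and emits one token per hierarchy level,
    // joining levels with replacement and optionally prefixing the depth.
    static List<String> tokenize(String path, String delimiterRegex,
                                 String replacement, int depthPrefixNumChars) {
        String[] parts = path.split(delimiterRegex);
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int depth = 0; depth < parts.length; depth++) {
            if (depth > 0) {
                current.append(replacement);
            }
            current.append(parts[depth]);
            // depthPrefixNumChars == 0 means no prefix, matching the proposed default.
            String prefix = depthPrefixNumChars > 0
                    ? String.format(Locale.ROOT, "%0" + depthPrefixNumChars + "d", depth)
                    : "";
            tokens.add(prefix + current);
        }
        return tokens;
    }
}
```

For the section number from the motivation, `tokenize("13.4-64.2", "[.-]", "~", 0)` produces `13`, `13~4`, `13~4~64` and `13~4~64~2`, so both delimiters map to a tilde in one pass without a separate pattern replacement step.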
[jira] [Updated] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module
[ https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated LUCENE-2899:
Attachment: LUCENE-2899-x.patch

Fixed the Chunker problem. I switched to the newly released version of the OpenNLP packages. The MaxEnt implementation (statistical modeling) for chunking changed slightly, and my test data now produces different noun and verb phrase chunks for the sample text.

At this point the only problems I know of are that the licenses are slightly wrong, and so 'ant validate' fails.

These comments only apply to LUCENE-2899-x.patch, which applies to the current 4.x and trunk codelines. LUCENE-2899.patch applies to the 4.0-4.3 releases. It is not upgraded to the new OpenNLP release.

Add OpenNLP Analysis capabilities as a module
Key: LUCENE-2899
URL: https://issues.apache.org/jira/browse/LUCENE-2899
Project: Lucene - Core
Issue Type: New Feature
Components: modules/analysis
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
Fix For: 4.4
Attachments: LUCENE-2899-current.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899-RJN.patch, LUCENE-2899-x.patch, LUCENE-2899-x.patch, LUCENE-2899-x.patch, OpenNLPFilter.java, OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch

Now that OpenNLP is an ASF project and has a nice license, it would be nice to have a submodule (under analysis) that exposed capabilities for it. Drew Farris, Tom Morton and I have code that does:
* Sentence Detection as a Tokenizer (could also be a TokenFilter, although it would have to change slightly to buffer tokens)
* NamedEntity recognition as a TokenFilter

We are also planning a Tokenizer/TokenFilter that can put parts of speech as either payloads (PartOfSpeechAttribute?) on a token or at the same position.

I'd propose it go under: modules/analysis/opennlp
[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken
[ https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684868#comment-13684868 ] Commit Tag Bot commented on SOLR-4910: [trunk commit] erick http://svn.apache.org/viewvc?view=revision&revision=1493618 SOLR-4910, improvements to persisting solr.xml and misc other fixes, see CHANGES.txt
[jira] [Resolved] (SOLR-4910) solr.xml persistence is completely broken
[ https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-4910.
Resolution: Fixed
Fix Version/s: 4.4, 5.0

trunk: 1493618
4x: 1493620

Also fixed in this commit:
SOLR-4862, CREATE fails to persist schema, config, and dataDir
SOLR-4363, not persisting coreLoadThreads in solr tag
SOLR-3900, logWatcher properties not persisted
SOLR-4852, cores defined as loadOnStartup=true, transient=false can't be searched

[~elyograg] Have at it, let's open up new JIRAs for anything you find, probably just assign them to me.

[~romseygeek] This commit probably makes your life more difficult, you may want to do an update sooner rather than later, this got somewhat bigger than I expected.
[jira] [Resolved] (SOLR-4862) Core admin action CREATE fails to persist some settings in solr.xml
[ https://issues.apache.org/jira/browse/SOLR-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-4862.
Resolution: Fixed
Fix Version/s: 4.4, 5.0

Fixed as part of SOLR-4910

Core admin action CREATE fails to persist some settings in solr.xml
Key: SOLR-4862
URL: https://issues.apache.org/jira/browse/SOLR-4862
Project: Solr
Issue Type: Bug
Components: multicore
Affects Versions: 4.3
Reporter: André Widhani
Assignee: Erick Erickson
Priority: Minor
Fix For: 5.0, 4.4

When I create a core with Core admin handler using these request parameters:

action=CREATE
name=core-tex69bbum21ctk1kq6lmkir-index3
schema=/etc/opt/dcx/solr/conf/schema.xml
instanceDir=/etc/opt/dcx/solr/
config=/etc/opt/dcx/solr/conf/solrconfig.xml
dataDir=/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3

in Solr 4.1, solr.xml would have the following entry:

<core schema="/etc/opt/dcx/solr/conf/schema.xml" loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" config="/etc/opt/dcx/solr/conf/solrconfig.xml" dataDir="/var/opt/dcx/solr/core-tex69bbum21ctk1kq6lmkir-index3/" collection="core-tex69bbum21ctk1kq6lmkir-index3"/>

while in Solr 4.3 schema, config and dataDir will be missing:

<core loadOnStartup="true" instanceDir="/etc/opt/dcx/solr/" transient="false" name="core-tex69bbum21ctk1kq6lmkir-index3" collection="core-tex69bbum21ctk1kq6lmkir-index3"/>

The new core would use the settings specified during CREATE, but after a Solr restart they are lost (fall back to some defaults), as they are not persisted in solr.xml. I should add that solr.xml has persistent=true in the root element.

http://lucene.472066.n3.nabble.com/Core-admin-action-quot-CREATE-quot-fails-to-persist-some-settings-in-solr-xml-with-Solr-4-3-td4065786.html
[jira] [Resolved] (SOLR-4363) Inconsistent coreLoadThreads attributes in solr.xml between read/write
[ https://issues.apache.org/jira/browse/SOLR-4363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-4363.
Resolution: Fixed
Fix Version/s: 4.4, 5.0

Fixed as part of SOLR-4910

Inconsistent coreLoadThreads attributes in solr.xml between read/write
Key: SOLR-4363
URL: https://issues.apache.org/jira/browse/SOLR-4363
Project: Solr
Issue Type: Bug
Affects Versions: 4.1
Reporter: Patanachai Tangchaisin
Assignee: Erick Erickson
Priority: Minor
Fix For: 5.0, 4.4

Solr reads coreLoadThreads from the <solr> element in solr.xml. However, when persistent is enabled in solr.xml, Solr writes the coreLoadThreads attribute to the wrong element.

Before starting Solr:
{code}
<solr persistent="true" coreLoadThreads="2">
  <cores host="localhost" adminPath="/admin/cores" hostPort="8983" hostContext="solr">
  .
</solr>
{code}

After starting Solr:
{code}
<solr persistent="true">
  <cores host="localhost" adminPath="/admin/cores" coreLoadThreads="2" hostPort="8080" hostContext="solr">
  .
</solr>
{code}
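A check for this kind of read/write mismatch could be sketched with the JDK's DOM parser: parse the persisted file and report which element actually carries the attribute. The class `CoreLoadThreadsCheck` and its method are hypothetical names used only for illustration, and the sketch assumes a complete, well-formed solr.xml (the snippets in the report are abbreviated).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustration only; not part of Solr or its test suite.
final class CoreLoadThreadsCheck {
    // Returns the tag name of the element holding coreLoadThreads
    // ("solr" or "cores"), or null when the attribute is absent.
    static String ownerOfCoreLoadThreads(String solrXml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(solrXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception ex) {
            throw new IllegalArgumentException("could not parse solr.xml", ex);
        }
        if (root.hasAttribute("coreLoadThreads")) {
            return root.getTagName();
        }
        NodeList cores = root.getElementsByTagName("cores");
        for (int i = 0; i < cores.getLength(); i++) {
            Element cores_i = (Element) cores.item(i);
            if (cores_i.hasAttribute("coreLoadThreads")) {
                return cores_i.getTagName();
            }
        }
        return null;
    }
}
```

Run against the file before and after a persist, the owner should stay `solr`; the behavior reported here would show it moving to `cores`.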
[jira] [Resolved] (SOLR-3900) LogWatcher Config Not Persisted
[ https://issues.apache.org/jira/browse/SOLR-3900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-3900.
Resolution: Fixed
Fix Version/s: 5.0

Fixed as part of SOLR-4910

LogWatcher Config Not Persisted
Key: SOLR-3900
URL: https://issues.apache.org/jira/browse/SOLR-3900
Project: Solr
Issue Type: Bug
Components: multicore
Reporter: Michael Garski
Assignee: Erick Erickson
Priority: Minor
Fix For: 5.0, 4.4

When the solr.xml file is set to persistent=true, the logging element that contains the LogWatcher configuration is not persisted to the new solr.xml file that is written when managing the cores via core admin.
[jira] [Issue Comment Deleted] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib
[ https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson updated SOLR-4852:
Comment: was deleted (was: Fixed as part of SOLR-4910)

If sharedLib is set to lib, classloader fails to find classes in lib
Key: SOLR-4852
URL: https://issues.apache.org/jira/browse/SOLR-4852
Project: Solr
Issue Type: Bug
Affects Versions: 4.4
Environment: Linux bigindy5 2.6.32-358.6.1.el6.centos.plus.x86_64 #1 SMP Wed Apr 24 03:21:04 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux java version 1.7.0_21 Java(TM) SE Runtime Environment (build 1.7.0_21-b11) Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
Reporter: Shawn Heisey
Fix For: 5.0, 4.4
Attachments: SOLR-4852.patch, SOLR-4852.patch, SOLR-4852-test-failhard.txt

I have some jars in the lib directory under solr.solr.home - DIH, ICU, and MySQL. If I set sharedLib in solr.xml to lib then the ICUTokenizer class is not found, even though the jar is loaded (twice) during Solr startup. If I set sharedLib to another location that doesn't exist, the jars are only loaded once and there is no problem. I'm using the old-style solr.xml on branch_4x revision 1485566.
[jira] [Resolved] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib
[ https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-4852. Resolution: Fixed Fixed as part of SOLR-4910
[jira] [Reopened] (SOLR-4852) If sharedLib is set to lib, classloader fails to find classes in lib
[ https://issues.apache.org/jira/browse/SOLR-4852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson reopened SOLR-4852: Sorry, got the wrong one when closing JIRAs related to SOLR-4910
[jira] [Resolved] (SOLR-4850) Cores defined as loadOnStartup=true and transient=true can't be queried
[ https://issues.apache.org/jira/browse/SOLR-4850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erick Erickson resolved SOLR-4850.
Resolution: Fixed
Fix Version/s: 4.4, 5.0

Fixed as part of SOLR-4910

Cores defined as loadOnStartup=true and transient=true can't be queried
Key: SOLR-4850
URL: https://issues.apache.org/jira/browse/SOLR-4850
Project: Solr
Issue Type: Bug
Components: multicore
Affects Versions: 4.3, 4.2.1
Reporter: Lyubov Romanchuk
Assignee: Erick Erickson
Fix For: 5.0, 4.4

It seems that in order to query transient cores they must be defined with loadOnStartup=false. I define one core as loadOnStartup=true and transient=false, and the other cores as loadOnStartup=true and transient=true, with transientCacheSize left at its default (=Integer.MAX_VALUE). In this case CoreContainer.dynamicDescriptors will be empty, and CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String) then return null for all transient cores. As a result such cores (loadOnStartup=true and transient=true) can't be queried at all (neither from Query nor from Overview).
[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken
[ https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684899#comment-13684899 ]

Commit Tag Bot commented on SOLR-4910:

[branch_4x commit] erick
http://svn.apache.org/viewvc?view=revision&revision=1493621

SOLR-4910, corrected typo in CHANGES.txt

solr.xml persistence is completely broken

Key: SOLR-4910
URL: https://issues.apache.org/jira/browse/SOLR-4910
Project: Solr
Issue Type: Bug
Affects Versions: 5.0, 4.4
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Blocker
Fix For: 5.0, 4.4
Attachments: SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch, SOLR-4910.patch

I'm working on SOLR-4862 (persisting a created core doesn't preserve some values) and, at least compared to 4.3 code, persisting to solr.xml is completely broken.

I learned to hate persistence while working on SOLR-4196 etc. and I'm glad it's going away. I frequently got lost in implicit properties (they're easy to persist and shouldn't be) and in what should/shouldn't be persisted (e.g. the translated ${var:default} or the original). It was a monster, so don't think I'm nostalgic for the historical behavior.

Before I dive back in, I want to get some idea whether the current behavior was intentional; I don't want to go back into that junk only to undo someone else's work.

For instance, creating a new core (collection2 in my example) with persistence turned on in solr.xml changes the original definition for collection1 (stock 4.x as of tonight) from this:

    <core name="collection1" instanceDir="collection1" shard="${shard:}" collection="${collection:collection1}" config="${solrconfig:solrconfig.xml}" schema="${schema:schema.xml}" coreNodeName="${coreNodeName:}"/>

to this:

    <core loadOnStartup="true" shard="${shard:}" instanceDir="collection1/" transient="false" name="collection1" dataDir="data/" collection="${collection:collection1}">
      <property name="name" value="collection1"/>
      <property name="config" value="solrconfig.xml"/>
      <property name="solr.core.instanceDir" value="solr/collection1/"/>
      <property name="transient" value="false"/>
      <property name="schema" value="schema.xml"/>
      <property name="loadOnStartup" value="true"/>
      <property name="solr.core.schemaName" value="schema.xml"/>
      <property name="solr.core.name" value="collection1"/>
      <property name="solr.core.dataDir" value="data/"/>
      <property name="instanceDir" value="collection1/"/>
      <property name="solr.core.configName" value="solrconfig.xml"/>
    </core>

So, there are two questions:
1) what is correct for 4.x?
2) do we care at all about 5.x?

As much as I hate to say it, I think that we need to go back to the 4.3 behavior. It might be as simple as not persisting in the <property> tags anything already in the original definition. Not quite sure what to put where in the newly-created core though; I suspect that the compact <core> + attribs would be best (assuming there's no <property> tag already in the definition). I really hate the mix of attributes on the <core> tag and <property> tags, wish we had one or the other.
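The suggestion in the message above (do not persist in the property tags anything already in the original definition, and do not persist implicit properties) can be sketched roughly as follows. This is only an illustration under stated assumptions: CorePersistenceSketch and its methods are hypothetical names, not Solr's persistence code, and treating every "solr.core.*" key as implicit is a guess based on the property names shown in the dump above.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Illustrative only; a hypothetical helper, not Solr's persistence code. */
final class CorePersistenceSketch {

  /** Assumption: keys under "solr.core." are the implicit properties that should not be written. */
  static boolean isImplicit(String key) {
    return key.startsWith("solr.core.");
  }

  /**
   * Returns the properties that would still be written as property tags:
   * everything in currentProperties except keys already present as attributes
   * in the original core definition and keys considered implicit.
   */
  static Map<String, String> propertiesToPersist(
      Set<String> originalAttributes, Map<String, String> currentProperties) {
    Map<String, String> result = new LinkedHashMap<>();
    for (Map.Entry<String, String> e : currentProperties.entrySet()) {
      String key = e.getKey();
      if (originalAttributes.contains(key) || isImplicit(key)) {
        continue; // already covered by the original definition, or implicit
      }
      result.put(key, e.getValue());
    }
    return result;
  }

  public static void main(String[] args) {
    // Attribute names taken from the original collection1 definition quoted above.
    Set<String> original = new HashSet<>(Arrays.asList(
        "name", "instanceDir", "shard", "collection", "config", "schema", "coreNodeName"));
    Map<String, String> current = new LinkedHashMap<>();
    current.put("name", "collection1");
    current.put("config", "solrconfig.xml");
    current.put("solr.core.name", "collection1");
    current.put("my.custom.prop", "42"); // hypothetical user-defined property
    System.out.println(propertiesToPersist(original, current));
  }
}
```

Whether properties such as transient or loadOnStartup should also be suppressed when they only carry default values is a separate question that this sketch does not try to answer.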
[jira] [Commented] (SOLR-4910) solr.xml persistence is completely broken
[ https://issues.apache.org/jira/browse/SOLR-4910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684900#comment-13684900 ]

Commit Tag Bot commented on SOLR-4910:

[trunk commit] erick
http://svn.apache.org/viewvc?view=revision&revision=1493622

SOLR-4910, corrected typo in CHANGES.txt
Re: Adding a mixture of language models to Lucene 4.0
Hi Nikita,

Speaking only for myself here... maybe explain more about what this library does in plain English - what problem does it solve? I had to look up the paper (ha! a known item!): http://www.cs.cmu.edu/~callan/Papers/sigir03-pto.pdf (add to README so others don't have to search?)

To make it easy to add this to Lucene, you should:
* use and include ASL
* include ASL snippet in each Java class
* switch to Java for tests
* move to org.apache.lucene...

HTH,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/

On Fri, Jun 14, 2013 at 7:43 PM, Nikita Zhiltsov <nikita.zhilt...@gmail.com> wrote:

Hi all,

I've just published a tiny extension to Lucene 4.0, which enables a mixture of language models using standard FunctionQuery and ValueSource classes: https://github.com/nzhiltsov/lucene-mlm

I'd like you to assess the possibility of integrating this code into Lucene. Appreciate any comments or fixes.

NB. The implementation avoids using LMSimilarity on a per-field basis, because that would break the computation of correct Dirichlet priors for non-matched terms: the standard LMSimilarity class fails to include them while calculating term frequencies and treats them as zero-probability entries.

--
Nikita Zhiltsov
Visiting Graduate Student
Emory University
Intelligent Information Access Lab
E500 Emerson Hall, Atlanta, Georgia, USA
Phone: (404) 834-5364
E-mail: znik...@emory.edu
-
Graduate Student, Research Fellow
Kazan Federal University
Computational Linguistics Laboratory
Russia, 420008 Kazan, Prof. Nuzhina Str., 1/37 room 117
Skype: nickita.jhiltsov
Personal page: http://cll.niimm.ksu.ru/~nzhiltsov
E-mail: nikita.zhilt...@gmail.com
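To make the NB in Nikita's message concrete, here is a minimal sketch of a mixture of Dirichlet-smoothed field language models. It assumes the usual smoothing form P(w | field) = (tf + mu * P(w | collection)) / (fieldLength + mu) and a linear mixture with per-field weights lambda. MixtureLmSketch, its method names and the sample numbers are hypothetical; this is neither the lucene-mlm code nor Lucene's similarity API.

```java
/** Illustrative only; names and numbers are made up for this sketch. */
final class MixtureLmSketch {

  /** Dirichlet-smoothed P(w | field): (tf + mu * pCollection) / (fieldLength + mu). */
  static double dirichlet(double tf, double fieldLength, double pCollection, double mu) {
    return (tf + mu * pCollection) / (fieldLength + mu);
  }

  /** Linear mixture over fields: sum over f of lambda[f] * P(w | field f). */
  static double mixture(double[] lambda, double[] tf, double[] fieldLength,
                        double[] pCollection, double mu) {
    double p = 0.0;
    for (int f = 0; f < lambda.length; f++) {
      p += lambda[f] * dirichlet(tf[f], fieldLength[f], pCollection[f], mu);
    }
    return p;
  }

  public static void main(String[] args) {
    // Two fields; the query term occurs in field 0 only (tf = 0 in field 1).
    double[] lambda = {0.7, 0.3};
    double[] tf = {2, 0};
    double[] fieldLength = {10, 100};
    double[] pCollection = {0.01, 0.001};
    double mu = 100;

    // The non-matching field still contributes mu * pCollection / (fieldLength + mu),
    // which is small but not zero.
    System.out.println("P(w | field 1) = " + dirichlet(tf[1], fieldLength[1], pCollection[1], mu));
    System.out.println("mixture        = " + mixture(lambda, tf, fieldLength, pCollection, mu));
  }
}
```

The point of the sketch is the tf = 0 case: if a non-matching field were scored as zero instead of its smoothed background probability, the mixture would be underestimated, which appears to be the behavior the message describes wanting to avoid.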
[jira] [Commented] (SOLR-3076) Solr should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684935#comment-13684935 ]

Otis Gospodnetic commented on SOLR-3076:

This is issue #2-3 in terms of popularity. Does it work in SolrCloud-type setups?

Solr should support block joins

Key: SOLR-3076
URL: https://issues.apache.org/jira/browse/SOLR-3076
Project: Solr
Issue Type: New Feature
Reporter: Grant Ingersoll
Fix For: 5.0, 4.4
Attachments: 27M-singlesegment-histogram.png, 27M-singlesegment.png, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, dih-3076.patch, dih-config.xml, parent-bjq-qparser.patch, parent-bjq-qparser.patch, Screen Shot 2012-07-17 at 1.12.11 AM.png, SOLR-3076-childDocs.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-7036-childDocs-solr-fork-trunk-patched, solrconf-bjq-erschema-snippet.xml, solrconfig.xml.patch, tochild-bjq-filtered-search-fix.patch

Lucene has the ability to do block joins, we should add it to Solr.
[jira] [Commented] (SOLR-4221) Custom sharding
[ https://issues.apache.org/jira/browse/SOLR-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684936#comment-13684936 ]

Otis Gospodnetic commented on SOLR-4221:

[~noble.paul] should SOLR-4059 be closed as dupe?

Custom sharding

Key: SOLR-4221
URL: https://issues.apache.org/jira/browse/SOLR-4221
Project: Solr
Issue Type: New Feature
Reporter: Yonik Seeley
Assignee: Noble Paul
Attachments: SOLR-4221.patch

Features to let users control everything about sharding/routing.
[jira] [Commented] (LUCENE-5017) SpatialOpRecursivePrefixTreeTest is failing
[ https://issues.apache.org/jira/browse/LUCENE-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684951#comment-13684951 ]

Commit Tag Bot commented on LUCENE-5017:

[branch_4x commit] dsmiley
http://svn.apache.org/viewvc?view=revision&revision=1493637

LUCENE-5017: SpatialOpRecursivePrefixTreeTest Contains test bug.

SpatialOpRecursivePrefixTreeTest is failing

Key: LUCENE-5017
URL: https://issues.apache.org/jira/browse/LUCENE-5017
Project: Lucene - Core
Issue Type: Bug
Components: modules/spatial
Reporter: Michael McCandless
Assignee: David Smiley
Fix For: 5.0, 4.4
Attachments: LUCENE-5017_SpatialOpRecursivePrefixTreeTest_bug.patch

This has been failing lately on trunk (e.g. on rev 1486339):

{noformat}
ant test -Dtestcase=SpatialOpRecursivePrefixTreeTest -Dtestmethod=testContains -Dtests.seed=456022665217DADF:2C2A2816BD2BA1C5 -Dtests.slow=true -Dtests.locale=nl_BE -Dtests.timezone=Poland -Dtests.file.encoding=ISO-8859-1
{noformat}

Not sure what's up ...
Re: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6066 - Still Failing!
So it turns out that most test failures were on the 4x branch, and that's because I forgot to apply a test bug fix to the 4x branch last month (doh!). But there is still a bug for me to find because trunk failed, and I needed the -Dtests.multiplier=3 to reproduce it. Thanks Dawid.

~ David

On 6/15/13 5:19 AM, Dawid Weiss <dawid.we...@gmail.com> wrote:

[junit4:junit4] 2> NOTE: reproduce with: ant test -Dtestcase=SpatialOpRecursivePrefixTreeTest -Dtests.method=testContains {#1 seed=[9166D28D6532217A:472BE5C4B7344982]} -Dtests.seed=9166D28D6532217A -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=uk_UA -Dtests.timezone=Etc/GMT-6 -Dtests.file.encoding=UTF-8

This is a problem with JUnit in general -- the name of a test method is not really known and has to be derived from a Description object... lots of hairy stuff. The info shown above in -Dtests.method has a full seed (class and method-level), so if you run with:

-Dtests.seed=9166D28D6532217A:472BE5C4B7344982

it should reproduce (if it's reproducible) because then the seed is fixed for all reiterations of @Repeat. If you provide only the first part of the seed then the @Repeat annotation will pick a different seed for each run (and the failures should still reproduce). Try it.

Dawid

Notice the -Dtests.method=testContains {#1 seed=[9166D28D6532217A:472BE5C4B7344982]} part, which is wrong because if I do that, it'll not find the method to test. If I change this to simply testContains, and set the seed normally -Dtests.seed=91 then I still can't reproduce the problem. This test appears to have failed a bunch of times lately with different seeds.

~ David

-- Forwarded message --
From: Policeman Jenkins Server <jenk...@thetaphi.de>
Date: Fri, Jun 14, 2013 at 9:33 PM
Subject: [JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 6066 - Still Failing!
To: dev@lucene.apache.org

Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/6066/
Java: 32bit/jdk1.6.0_45 -server -XX:+UseSerialGC

1 tests failed.
FAILED: org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains {#1 seed=[9166D28D6532217A:472BE5C4B7344982]}

Error Message:
Shouldn't match I #0:ShapePair(Rect(minX=102.0,maxX=112.0,minY=-36.0,maxY=120.0) , Rect(minX=168.0,maxX=175.0,minY=-1.0,maxY=11.0)) Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)

Stack Trace:
java.lang.AssertionError: Shouldn't match I #0:ShapePair(Rect(minX=102.0,maxX=112.0,minY=-36.0,maxY=120.0) , Rect(minX=168.0,maxX=175.0,minY=-1.0,maxY=11.0)) Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)
    at __randomizedtesting.SeedInfo.seed([9166D28D6532217A:472BE5C4B7344982]:0)
    at org.junit.Assert.fail(Assert.java:93)
    at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:287)
    at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:273)
    at org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:101)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(
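As a rough sketch of the workaround discussed in this thread: the reported test name carries a " {#N seed=[MASTER:METHOD]}" suffix that cannot be passed back through -Dtests.method, while the full two-part seed can be passed through -Dtests.seed. The helper below strips the suffix and extracts the seed. ReproSeedSketch and its methods are hypothetical names and are not part of the Lucene build or of randomizedtesting; only command-line flags quoted in the messages above are used.

```java
/** Illustrative only; a hypothetical helper for building a reproduce line by hand. */
final class ReproSeedSketch {

  /** Strips a trailing " {#N seed=[...]}" suffix from a reported test name, if present. */
  static String plainMethodName(String reported) {
    int brace = reported.indexOf('{');
    return (brace < 0 ? reported : reported.substring(0, brace)).trim();
  }

  /** Extracts "MASTER:METHOD" from "... seed=[MASTER:METHOD]}", or returns null if absent. */
  static String fullSeed(String reported) {
    String marker = "seed=[";
    int start = reported.indexOf(marker);
    if (start < 0) {
      return null;
    }
    int end = reported.indexOf(']', start);
    if (end < 0) {
      return null;
    }
    return reported.substring(start + marker.length(), end);
  }

  /** Builds an ant command line using the plain method name and, when available, the full seed. */
  static String reproduceLine(String testClass, String reported) {
    StringBuilder sb = new StringBuilder("ant test -Dtestcase=").append(testClass)
        .append(" -Dtests.method=").append(plainMethodName(reported));
    String seed = fullSeed(reported);
    if (seed != null) {
      sb.append(" -Dtests.seed=").append(seed);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    String reported = "testContains {#1 seed=[9166D28D6532217A:472BE5C4B7344982]}";
    System.out.println(reproduceLine("SpatialOpRecursivePrefixTreeTest", reported));
  }
}
```

Other flags from the original failure (for example -Dtests.multiplier, locale and timezone) would still need to be appended by hand; the thread suggests the multiplier mattered for reproducing the trunk failure.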
[jira] [Commented] (SOLR-3076) Solr should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684996#comment-13684996 ]

Vadim Kirilchuk commented on SOLR-3076:

Otis, the patch has a test class solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java with a method #testIndexQueryDeleteHierarchical, which indexes, queries and then deletes hierarchical documents. However, it asserts only the sizes of parents, children and grandchildren with simple term queries (not bjq), so someone needs to check it manually or update the test.

By the way, what is the first issue? =)
[jira] [Comment Edited] (SOLR-3076) Solr should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684996#comment-13684996 ]

Vadim Kirilchuk edited comment on SOLR-3076 at 6/17/13 5:46 AM:

Otis, the patch has a test class solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java with a method #testIndexQueryDeleteHierarchical, which indexes, queries and then deletes hierarchical documents. However, it asserts only the sizes of parents, children and grandchildren with simple term queries (not bjq), so someone needs to check it manually or update the test.

By the way, what is the #1 issue? =)

was (Author: vkirilchuk):

Otis, the patch has a test class solr/core/src/test/org/apache/solr/cloud/FullSolrCloudDistribCmdsTest.java with a method #testIndexQueryDeleteHierarchical, which indexes, queries and then deletes hierarchical documents. However, it asserts only the sizes of parents, children and grandchildren with simple term queries (not bjq), so someone needs to check it manually or update the test.

By the way, what is the first issue? =)