[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 7753 - Failure!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/7753/ Java: 32bit/jdk1.6.0_45 -server -XX:+UseParallelGC 1 tests failed. REGRESSION: org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads Error Message: MockDirectoryWrapper: cannot close: there are still open files: {_1e.cfs=9, _t.cfs=9, _1g.cfs=9, _1f.cfs=9, _1d.cfs=9} Stack Trace: java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_1e.cfs=9, _t.cfs=9, _1g.cfs=9, _1f.cfs=9, _1d.cfs=9} at __randomizedtesting.SeedInfo.seed([4E5816A23CC2ACB:1921A81D4039306D]:0) at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:622) at org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads(TestIndexWriterWithThreads.java:240) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:597) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at java.lang.Thread.run(Thread.java:662) Caused by: java.lang.RuntimeException: unclosed IndexInput: _1d.cfs at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:511) at org.apache.lucene.store.MockDirectoryWrapper$1.openSlice(MockDirectoryWrapper.java:930) at org.apache.lucene.store.Compound
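The root cause above is an unclosed IndexInput: MockDirectoryWrapper counts handles per file and refuses to close while any count is non-zero (here, 9 handles per CFS file). A minimal self-contained sketch of that bookkeeping — illustrative class and method names, not Lucene's actual implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-file handle counting, conceptually what MockDirectoryWrapper
// does: every open increments a counter, every close decrements it, and
// closing the directory fails while any counter is still non-zero.
public class OpenFileTracker {
    private final Map<String, Integer> openFiles = new HashMap<String, Integer>();

    public synchronized void onOpen(String name) {
        Integer count = openFiles.get(name);
        openFiles.put(name, count == null ? 1 : count + 1);
    }

    public synchronized void onClose(String name) {
        Integer count = openFiles.get(name);
        if (count == null) {
            throw new IllegalStateException("close of unopened file: " + name);
        }
        if (count == 1) {
            openFiles.remove(name);
        } else {
            openFiles.put(name, count - 1);
        }
    }

    public synchronized int openCount(String name) {
        Integer count = openFiles.get(name);
        return count == null ? 0 : count;
    }

    public synchronized void checkNoOpenFiles() {
        if (!openFiles.isEmpty()) {
            throw new IllegalStateException(
                "cannot close: there are still open files: " + openFiles);
        }
    }

    public static void main(String[] args) {
        OpenFileTracker tracker = new OpenFileTracker();
        tracker.onOpen("_1d.cfs");
        tracker.onOpen("_1d.cfs");
        tracker.onClose("_1d.cfs");
        // one handle is still open here; checkNoOpenFiles() would throw
        tracker.onClose("_1d.cfs");
        tracker.checkNoOpenFiles(); // passes: everything closed
    }
}
```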
[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shai Erera updated LUCENE-5189: --- Attachment: LUCENE-5189-no-lost-updates.patch Add field updates to TestIndexWriterDelete.testNoLostDeletesOrUpdates. I had to change the test to catch IOException and ignore it if it's a FakeIOE, or if ioe.getCause() is a FakeIOE. The reason is that if the exception happens during merge (in mergeMiddle), IW registers the exception in mergeExceptions and later throws it as a wrapped IOE. This caused the test to fail spuriously. > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, > LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. 
> * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
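The livedocs-style approach the issue describes (rewrite all values of the updated field for the whole segment) can be sketched as a full-column rewrite; the names and the boxed map are illustrative only, not Lucene's API:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: applying numeric doc-values updates by rewriting the whole value
// column for a segment, analogous to how livedocs are rewritten. Docs without
// an update keep their old value; updated docs take the new one.
public class NumericDVColumnRewrite {
    public static long[] applyUpdates(long[] currentValues, Map<Integer, Long> updates) {
        long[] rewritten = new long[currentValues.length];
        for (int doc = 0; doc < currentValues.length; doc++) {
            Long updated = updates.get(doc);
            rewritten[doc] = updated != null ? updated.longValue() : currentValues[doc];
        }
        return rewritten;
    }

    public static void main(String[] args) {
        long[] column = {10L, 20L, 30L};
        Map<Integer, Long> updates = new HashMap<Integer, Long>();
        updates.put(1, 99L); // doc 1 gets a new value; docs 0 and 2 keep theirs
        long[] result = applyUpdates(column, updates);
        assert result[0] == 10L && result[1] == 99L && result[2] == 30L;
    }
}
```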
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793247#comment-13793247 ] ASF subversion and git services commented on LUCENE-5278: - Commit 1531498 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531498 ] LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works better with custom regular expressions > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token > -- > > Key: LUCENE-5278 > URL: https://issues.apache.org/jira/browse/LUCENE-5278 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nik Everett >Assignee: Robert Muir >Priority: Trivial > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch > > > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token. You won't see this unless you build a tokenizer > that can recognize every character like with new RegExp(".") or RegExp("..."). > Changing this behaviour seems to break a number of tests. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
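The bug class here — consuming the character right after a token even though it may begin the next token — is easiest to see in a hand-rolled scanner. A minimal sketch (not MockTokenizer's actual code), using "letters only" in place of a custom regular expression:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of correct token boundary handling: the character that terminates a
// token is examined but never blindly skipped, so with input "ab1cd" only the
// delimiter '1' is dropped and 'c' still starts the next token. The buggy
// behavior in the issue would also discard the character after each token.
public class BoundaryTokenizer {
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<String>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString()); // flush trailing token at end of input
        }
        return tokens;
    }

    public static void main(String[] args) {
        assert tokenize("ab1cd").equals(Arrays.asList("ab", "cd"));
    }
}
```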
[jira] [Resolved] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5278. - Resolution: Fixed Fix Version/s: 5.0 4.6 Thanks again Nik! > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token > -- > > Key: LUCENE-5278 > URL: https://issues.apache.org/jira/browse/LUCENE-5278 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nik Everett >Assignee: Robert Muir >Priority: Trivial > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch > > > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token. You won't see this unless you build a tokenizer > that can recognize every character like with new RegExp(".") or RegExp("..."). > Changing this behaviour seems to break a number of tests. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates
[ https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793242#comment-13793242 ] ASF subversion and git services commented on LUCENE-5189: - Commit 1531496 from [~shaie] in branch 'dev/trunk' [ https://svn.apache.org/r1531496 ] LUCENE-5189: test unsetting a document's value while the segment is merging > Numeric DocValues Updates > - > > Key: LUCENE-5189 > URL: https://issues.apache.org/jira/browse/LUCENE-5189 > Project: Lucene - Core > Issue Type: New Feature > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, > LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, > LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, > LUCENE-5189-updates-order.patch > > > In LUCENE-4258 we started to work on incremental field updates, however the > amount of changes are immense and hard to follow/consume. The reason is that > we targeted postings, stored fields, DV etc., all from the get go. > I'd like to start afresh here, with numeric-dv-field updates only. There are > a couple of reasons to that: > * NumericDV fields should be easier to update, if e.g. we write all the > values of all the documents in a segment for the updated field (similar to > how livedocs work, and previously norms). > * It's a fairly contained issue, attempting to handle just one data type to > update, yet requires many changes to core code which will also be useful for > updating other data types. > * It has value in and on itself, and we don't need to allow updating all the > data types in Lucene at once ... we can do that gradually. > I have some working patch already which I'll upload next, explaining the > changes. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
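The two structures described in the issue can be sketched with plain maps — a hash map per field for the un-ordered, last-update-wins writes, and a sorted map keyed by doc for the doc-order pass in commitMergedDeletes. This sketch ignores the memory-efficiency requirement (boxed maps would not scale to millions of docs) and uses illustrative types, not the actual ReaderAndLiveDocs code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the two views held for numeric DV updates:
//  - byField: field -> (doc -> value), filled in whatever order updates
//    arrive; a later write for the same doc overwrites (last update wins).
//  - byDoc: doc -> (field -> value), kept sorted by doc id so a merge can
//    visit each updated document once, in order.
public class UpdateStructures {
    public final Map<String, Map<Integer, Long>> byField =
        new HashMap<String, Map<Integer, Long>>();
    public final TreeMap<Integer, Map<String, Long>> byDoc =
        new TreeMap<Integer, Map<String, Long>>();

    public void update(String field, int doc, long value) {
        Map<Integer, Long> docs = byField.get(field);
        if (docs == null) {
            docs = new HashMap<Integer, Long>();
            byField.put(field, docs);
        }
        docs.put(doc, value); // last update wins

        Map<String, Long> fields = byDoc.get(doc);
        if (fields == null) {
            fields = new HashMap<String, Long>();
            byDoc.put(doc, fields);
        }
        fields.put(field, value);
    }

    public static void main(String[] args) {
        UpdateStructures u = new UpdateStructures();
        u.update("f", 100, 1L); // termA affects doc 100 first...
        u.update("f", 2, 2L);   // ...then termB affects doc 2: writes are un-ordered
        u.update("f", 100, 3L); // doc 100 updated again: last update wins
        assert u.byField.get("f").get(100) == 3L;
        assert u.byDoc.firstKey() == 2; // doc-order iteration starts at doc 2
    }
}
```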
[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793224#comment-13793224 ] Shai Erera commented on LUCENE-5248: bq. I added a unit test which reproduces and the fix. Will commit on LUCENE-5189. Sorry, it's a bug introduced in this patch so I'll fix here. > Improve the data structure used in ReaderAndLiveDocs to hold the updates > > > Key: LUCENE-5248 > URL: https://issues.apache.org/jira/browse/LUCENE-5248 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, > LUCENE-5248.patch > > > Currently ReaderAndLiveDocs holds the updates in two structures: > +Map>+ > Holds a mapping from each field, to all docs that were updated and their > values. This structure is updated when applyDeletes is called, and needs to > satisfy several requirements: > # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, > in that order, and termA affects doc=100 and termB doc=2, then the updates > are applied in that order, meaning we cannot rely on updates coming in order. > # Same document may be updated multiple times, either by same term (e.g. > several calls to IW.updateNDV) or by different terms. Last update wins. > # Sequential read: when writing the updates to the Directory > (fieldsConsumer), we iterate on the docs in-order and for each one check if > it's updated and if not, pull its value from the current DV. > # A single update may affect several million documents, therefore need to be > efficient w.r.t. memory consumption. > +Map>+ > Holds a mapping from a document, to all the fields that it was updated in and > the updated value for each field. This is used by IW.commitMergedDeletes to > apply the updates that came in while the segment was merging. 
The > requirements this structure needs to satisfy are: > # Access in doc order: this is how commitMergedDeletes works. > # One-pass: we visit a document once (currently) and so if we can, it's > better if we know all the fields in which it was updated. The updates are > applied to the merged ReaderAndLiveDocs (where they are stored in the first > structure mentioned above). > Comments with proposals will follow next. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates
[ https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793221#comment-13793221 ] Shai Erera commented on LUCENE-5248: bq. Do we have test coverage of updating with null (deleting the update from the document)? We have TestNDVUpdates.testUnsetValue and testUnsetAllValues, though we don't have a test which unsets a value while a document is merging. We have tests that cover updating a value (no unsetting) while it is merging, I guess I can modify them to unset as well, but will then need to improve the test to use docsWithField. I'll look into it. bq. So if there are two terms in a row with the same field (which does not exist) won't we hit NPE? Good catch! You're right, I had another {{if (termsEnum == null) continue}} but I removed it since I thought the above if takes care of that. I added a unit test which reproduces and the fix. Will commit on LUCENE-5189. > Improve the data structure used in ReaderAndLiveDocs to hold the updates > > > Key: LUCENE-5248 > URL: https://issues.apache.org/jira/browse/LUCENE-5248 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, > LUCENE-5248.patch > > > Currently ReaderAndLiveDocs holds the updates in two structures: > +Map>+ > Holds a mapping from each field, to all docs that were updated and their > values. This structure is updated when applyDeletes is called, and needs to > satisfy several requirements: > # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, > in that order, and termA affects doc=100 and termB doc=2, then the updates > are applied in that order, meaning we cannot rely on updates coming in order. > # Same document may be updated multiple times, either by same term (e.g. > several calls to IW.updateNDV) or by different terms. Last update wins. 
> # Sequential read: when writing the updates to the Directory > (fieldsConsumer), we iterate on the docs in-order and for each one check if > it's updated and if not, pull its value from the current DV. > # A single update may affect several million documents, therefore need to be > efficient w.r.t. memory consumption. > +Map>+ > Holds a mapping from a document, to all the fields that it was updated in and > the updated value for each field. This is used by IW.commitMergedDeletes to > apply the updates that came in while the segment was merging. The > requirements this structure needs to satisfy are: > # Access in doc order: this is how commitMergedDeletes works. > # One-pass: we visit a document once (currently) and so if we can, it's > better if we know all the fields in which it was updated. The updates are > applied to the merged ReaderAndLiveDocs (where they are stored in the first > structure mentioned above). > Comments with proposals will follow next. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
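The NPE discussed in this thread (two update terms in a row on a field that does not exist) comes down to re-checking the cached enum for null on every term, not only when the field changes. A minimal sketch of the guard, with hypothetical names and a String standing in for the real terms enum:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the missing null check: when consecutive update terms name the
// same field and that field does not exist, the cached "termsEnum" stays
// null, so the loop must 'continue' instead of dereferencing it.
public class NullTermsEnumGuard {
    public static int applyUpdates(String[] updateFields, Map<String, String> fieldToTerms) {
        int applied = 0;
        String lastField = null;
        String termsEnum = null;
        for (String field : updateFields) {
            if (!field.equals(lastField)) {
                lastField = field;
                termsEnum = fieldToTerms.get(field); // null if the field does not exist
            }
            if (termsEnum == null) {
                continue; // without this, the second term on a missing field NPEs
            }
            applied += termsEnum.length(); // stand-in for real work on the enum
        }
        return applied;
    }

    public static void main(String[] args) {
        Map<String, String> index = new HashMap<String, String>();
        index.put("f", "terms-of-f");
        // two terms in a row on "missing": exactly the case that hit the NPE
        int applied = applyUpdates(new String[] {"missing", "missing", "f"}, index);
        assert applied == "terms-of-f".length();
    }
}
```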
[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset
[ https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793217#comment-13793217 ] Shai Erera commented on LUCENE-5277: I thought of that ... it started in LUCENE-5248 where I want to keep a growable bitset alongside the docs/values arrays to mark whether a document has an updated value or not (following Rob's idea). When I implemented that using OpenBitSet, I discovered the bug and opened LUCENE-5272. As I worked on fixing the bug, I realized OBS has other issues as well and thought that perhaps I can use FixedBitSet, only grow it by copying its array. This is doable even without the ctor, since I can call getBits() and do it like this: {code}
FixedBitSet newBits = new FixedBitSet(17); // new capacity
System.arraycopy(oldBits.getBits(), 0, newBits.getBits(), 0, oldBits.getBits().length);
{code} I then noticed there is a ctor already in FixedBitSet which copies another FBS so I thought just to improve it. It seems more intuitive to do that than to let users figure out they can grow a FixedBitSet like above, no? > Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the > new bitset > --- > > Key: LUCENE-5277 > URL: https://issues.apache.org/jira/browse/LUCENE-5277 > Project: Lucene - Core > Issue Type: Improvement > Components: core/index >Reporter: Shai Erera >Assignee: Shai Erera > Attachments: LUCENE-5277.patch > > > FixedBitSet copy constructor is redundant the way it is now -- one can call > FBS.clone() to achieve that (and indeed, no code in Lucene calls this ctor). > I think it will be useful to add a numBits parameter to that method to allow > growing/shrinking the new bitset, while copying all relevant bits from the > passed one. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
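The proposed constructor boils down to allocating a backing array sized for the new numBits and copying the old words across. A self-contained sketch over long[] words — not the actual FixedBitSet code, which additionally masks ghost bits past numBits when shrinking:

```java
// Sketch of growing/shrinking a fixed-size bit set by copying its backing
// words: the operation a FixedBitSet(FixedBitSet, int numBits) constructor
// would encapsulate, instead of users reaching for getBits()/arraycopy.
public class GrowableFixedBits {
    final long[] words;
    final int numBits;

    public GrowableFixedBits(int numBits) {
        this.numBits = numBits;
        this.words = new long[(numBits + 63) >>> 6]; // one word per 64 bits
    }

    // Copy constructor with a new capacity: bits in the overlap are preserved.
    public GrowableFixedBits(GrowableFixedBits other, int numBits) {
        this(numBits);
        int wordsToCopy = Math.min(other.words.length, this.words.length);
        System.arraycopy(other.words, 0, this.words, 0, wordsToCopy);
    }

    public void set(int bit) {
        words[bit >>> 6] |= 1L << (bit & 63);
    }

    public boolean get(int bit) {
        return (words[bit >>> 6] & (1L << (bit & 63))) != 0;
    }

    public static void main(String[] args) {
        GrowableFixedBits small = new GrowableFixedBits(8);
        small.set(3);
        GrowableFixedBits grown = new GrowableFixedBits(small, 17); // grow, keeping bits
        assert grown.get(3) && !grown.get(4);
    }
}
```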
[jira] [Comment Edited] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values
[ https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793213#comment-13793213 ] Yonik Seeley edited comment on SOLR-5330 at 10/12/13 2:30 AM: -- So I instrumented the faceting code like so: {code} seg.tempBR = seg.tenum.next(); if (seg.tempBR.bytes == val.bytes) { System.err.println("##SHARING DETECTED: val.offset="+val.offset + " val.length="+val.length + " new.offset="+seg.tempBR.offset + " new.length="+seg.tempBR.length); if (val.offset == seg.tempBR.offset) { System.err.println("!!SHARING USING SAME OFFSET"); } {code} And it detects tons of sharing (the returned bytesref still pointing to the same byte[]) of course... but the thing is, it never generates an invalid result. calling next() on the term enum never changes the bytes that were previously pointed to... it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy. Example output: {code} ##SHARING DETECTED: val.offset=1 val.length=4 new.offset=6 new.length=4 {code} was (Author: ysee...@gmail.com): So I instrumented the faceting code like so: {code} seg.tempBR = seg.tenum.next(); if (seg.tempBR.bytes == val.bytes) { System.err.println("##SHARING DETECTED: val.offset="+val.offset + " val.length="+val.length + " new.offset="+seg.tempBR.offset + " new.length="+seg.tempBR.length); if (val.offset == seg.tempBR.offset) { System.err.println("!!SHARING USING SAME OFFSET"); } {code} And it detects tons of sharing (the returned bytesref still pointing to the same byte[]) of course... but the thing is, it never generates an invalid result. calling next() on the term enum never changes the bytes that were previously pointed to... it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy. 
> PerSegmentSingleValuedFaceting overwrites facet values > -- > > Key: SOLR-5330 > URL: https://issues.apache.org/jira/browse/SOLR-5330 > Project: Solr > Issue Type: Bug >Affects Versions: 4.2.1 >Reporter: Michael Froh >Assignee: Yonik Seeley > Attachments: solr-5330.patch > > > I recently tried enabling facet.method=fcs for one of my indexes and found a > significant performance improvement (with a large index, many facet values, > and near-realtime updates). Unfortunately, the results were also wrong. > Specifically, some facet values were being partially overwritten by other > facet values. (That is, if I expected facet values like "abcdef" and "123", I > would get a value like "123def".) > Debugging through the code, it looks like the problem was in > PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, > when BytesRef val is shallow-copied from the temporary per-segment BytesRef. > The byte array assigned to val is shared with the byte array for seg.tempBR, > and is overwritten a few lines down by the call to seg.tenum.next(). > I managed to fix it locally by replacing the shallow copy with a deep copy. > While I encountered this problem on Solr 4.2.1, I see that the code is > identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I > believe this bug still exists. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values
[ https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793213#comment-13793213 ] Yonik Seeley commented on SOLR-5330: So I instrumented the faceting code like so: {code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset="+val.offset + " val.length="+val.length + " new.offset="+seg.tempBR.offset + " new.length="+seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code} And it detects tons of sharing (the returned bytesref still pointing to the same byte[]) of course... but the thing is, it never generates an invalid result. Calling next() on the term enum never changes the bytes that were previously pointed to... it simply points to a different part of the same byte array. I can never detect a case where the original bytes are changed, thus invalidating the shallow copy. > PerSegmentSingleValuedFaceting overwrites facet values > -- > > Key: SOLR-5330 > URL: https://issues.apache.org/jira/browse/SOLR-5330 > Project: Solr > Issue Type: Bug >Affects Versions: 4.2.1 >Reporter: Michael Froh >Assignee: Yonik Seeley > Attachments: solr-5330.patch > > > I recently tried enabling facet.method=fcs for one of my indexes and found a > significant performance improvement (with a large index, many facet values, > and near-realtime updates). Unfortunately, the results were also wrong. > Specifically, some facet values were being partially overwritten by other > facet values. (That is, if I expected facet values like "abcdef" and "123", I > would get a value like "123def".) > Debugging through the code, it looks like the problem was in > PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, > when BytesRef val is shallow-copied from the temporary per-segment BytesRef. 
> The byte array assigned to val is shared with the byte array for seg.tempBR, > and is overwritten a few lines down by the call to seg.tenum.next(). > I managed to fix it locally by replacing the shallow copy with a deep copy. > While I encountered this problem on Solr 4.2.1, I see that the code is > identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I > believe this bug still exists. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
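The hazard under discussion — a shallow copy sharing the enum's byte[] while later calls reuse that buffer — can be reproduced with a minimal slice type (illustrative, not Lucene's BytesRef). If the buffer is ever rewritten in place, a shallow copy silently changes, which is exactly the "123def" corruption the reporter saw; a deep copy is immune. Yonik's instrumentation found next() only re-pointing within the array rather than rewriting it, which is why the shallow copy happened to survive in his runs:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Minimal BytesRef-like slice (bytes + offset + length) demonstrating the
// shallow- vs deep-copy difference when the shared buffer is mutated.
public class ByteSlice {
    byte[] bytes;
    int offset;
    int length;

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Shares the backing array: cheap, but aliased to future buffer reuse.
    ByteSlice shallowCopy() {
        return new ByteSlice(bytes, offset, length);
    }

    // Owns its bytes: unaffected by later mutation of the original buffer.
    ByteSlice deepCopy() {
        return new ByteSlice(Arrays.copyOfRange(bytes, offset, offset + length), 0, length);
    }

    String utf8() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] shared = "abcdef".getBytes(StandardCharsets.UTF_8);
        ByteSlice current = new ByteSlice(shared, 0, 3); // "abc"
        ByteSlice shallow = current.shallowCopy();
        ByteSlice deep = current.deepCopy();
        shared[0] = (byte) '1'; // the enum rewrites its buffer in place...
        shared[1] = (byte) '2';
        shared[2] = (byte) '3';
        assert shallow.utf8().equals("123"); // ...corrupting the shallow copy
        assert deep.utf8().equals("abc");    // the deep copy is safe
    }
}
```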
help in getting sort to work on an indexed binary field
Hi, We added a custom field type to allow an indexed binary field type that supports search (exact match), prefix search, and sort by unsigned-byte lexicographical comparison. For sort, BytesRef's UTF8SortedAsUnicodeComparator accomplishes what we want, and even though the name of the comparator mentions UTF8, it doesn't actually assume UTF8 and just does byte-level operations, so it's good. However, when we do this across different nodes, we run into an issue in QueryComponent.doFieldSortValues:

// Must do the same conversion when sorting by a
// String field in Lucene, which returns the terms
// data as BytesRef:
if (val instanceof BytesRef) {
  UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
  field.setStringValue(spare.toString());
  val = ft.toObject(field);
}

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually UTF8. I did a hack where I specified our own field comparator to be ByteBuffer based to get around that instanceof check, but then the field value gets transformed into BYTEARR in JavaBinCodec, and when it's unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds, a ShardFieldSortedHitQueue is constructed with ShardDoc.getCachedComparator, which decides to give me comparatorNatural in the else of the TODO for CUSTOM, which barfs because byte[] are not Comparable... Any advice is appreciated! Thanks, Jessica
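The sort order wanted here — unsigned-byte lexicographic comparison — can be implemented directly on byte[], avoiding the UTF-8 round-trip entirely. A sketch of just the comparator (not the Solr shard-merge integration, which is the harder part of the question):

```java
import java.util.Arrays;
import java.util.Comparator;

// Sketch of comparing byte[] values as unsigned bytes, lexicographically:
// the order UTF8SortedAsUnicodeComparator produces, without routing values
// through UnicodeUtil.UTF8toUTF16. Java bytes are signed, so each byte is
// masked with 0xFF before comparing.
public class UnsignedBytesComparator implements Comparator<byte[]> {
    public int compare(byte[] a, byte[] b) {
        int end = Math.min(a.length, b.length);
        for (int i = 0; i < end; i++) {
            int ai = a[i] & 0xFF; // treat as unsigned 0..255
            int bi = b[i] & 0xFF;
            if (ai != bi) {
                return ai - bi; // both in 0..255, so no overflow
            }
        }
        return a.length - b.length; // a strict prefix sorts first
    }

    public static void main(String[] args) {
        byte[][] values = { {(byte) 0x80}, {0x01}, {0x7F} };
        Arrays.sort(values, new UnsignedBytesComparator());
        // a signed compare would put 0x80 (== -128) first; unsigned puts it last
        assert values[0][0] == 0x01 && values[2][0] == (byte) 0x80;
    }
}
```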
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793206#comment-13793206 ] Robert Muir commented on LUCENE-5278: - I committed this to trunk: I did a lot of testing locally but I want to let Jenkins have its way with it for a few hours before backporting to branch_4x. > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token > -- > > Key: LUCENE-5278 > URL: https://issues.apache.org/jira/browse/LUCENE-5278 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nik Everett >Assignee: Robert Muir >Priority: Trivial > Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch > > > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token. You won't see this unless you build a tokenizer > that can recognize every character like with new RegExp(".") or RegExp("..."). > Changing this behaviour seems to break a number of tests. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793205#comment-13793205 ] ASF subversion and git services commented on LUCENE-5278: - Commit 1531479 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531479 ] LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works better with custom regular expressions > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token > -- > > Key: LUCENE-5278 > URL: https://issues.apache.org/jira/browse/LUCENE-5278 > Project: Lucene - Core > Issue Type: Bug >Reporter: Nik Everett >Assignee: Robert Muir >Priority: Trivial > Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch > > > MockTokenizer throws away the character right after a token even if it is a > valid start to a new token. You won't see this unless you build a tokenizer > that can recognize every character like with new RegExp(".") or RegExp("..."). > Changing this behaviour seems to break a number of tests. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793204#comment-13793204 ] Robert Muir commented on LUCENE-5274: - Yeah I guess for me, its not a caveat at all, but a feature :) We need to iterate sorted-union for stuff in the index like terms and fields, so they appear as if they exist only once. The guava one isn't doing a "union" operation but just simply maintaining compareTo() order... > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5274.patch > > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
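The "sorted union" Robert contrasts with Guava's ordering-only merge is a standard two-pointer merge that emits duplicates once. A minimal sketch over sorted String arrays (illustrative, not the highlighter or index code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a sorted-union merge over two sorted inputs, as done conceptually
// for terms and fields in the index: an element present in both inputs is
// emitted once, and overall sorted order is preserved.
public class SortedUnion {
    public static List<String> union(String[] a, String[] b) {
        List<String> out = new ArrayList<String>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            int cmp = a[i].compareTo(b[j]);
            if (cmp < 0) {
                out.add(a[i++]);
            } else if (cmp > 0) {
                out.add(b[j++]);
            } else {
                out.add(a[i]); // present in both: emit only once
                i++;
                j++;
            }
        }
        while (i < a.length) out.add(a[i++]);
        while (j < b.length) out.add(b[j++]);
        return out;
    }

    public static void main(String[] args) {
        List<String> merged = union(new String[] {"foo", "running"},
                                    new String[] {"running", "scissors"});
        assert merged.equals(Arrays.asList("foo", "running", "scissors"));
    }
}
```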
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5278: Attachment: LUCENE-5278.patch added a few more tests to TestMockAnalyzer so all these crazy corner cases are found there and not debugging other tests :)
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5278: Attachment: LUCENE-5278.patch Nice patch Nik! I think this is ready: i tweaked variable names and rearranged stuff (e.g. i use -1 instead of Integer so we arent boxing and a few other things). I also added some unit tests. The main issues why tests were failing with your original patch: * reset() needed to clear the buffer variables. * the state machine needed some particular extra check when emitting a token: e.g. if you make a regex of "..", but you send it "abcde", the tokens should be "ab", "cd", but not "e". so when we end on a partial match, we have to check that we are in an accept state. * term-limit-exceeded is a special case (versus last character being in a reject state)
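[Editorial illustration] The accept-state check described above can be sketched outside Lucene: a maximal-munch tokenizer remembers the last position at which the automaton was in an accept state and cuts the token there, so a trailing partial match is dropped rather than emitted, and the character right after a token stays available to start the next one. This sketch leans on java.util.regex instead of Lucene's automata, and the `tokenize` helper is hypothetical, not MockTokenizer's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MaximalMunch {
    // Emit the longest match at each position, remembering the last end
    // offset at which the pattern (the "automaton") was in an accept state.
    static List<String> tokenize(String input, Pattern token) {
        List<String> out = new ArrayList<>();
        Matcher m = token.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            int lastAccept = -1;
            for (int end = pos + 1; end <= input.length(); end++) {
                m.region(pos, end);          // candidate token = input[pos, end)
                if (m.matches()) lastAccept = end;
            }
            if (lastAccept == -1) {
                pos++;                       // reject state: skip one char
            } else {
                out.add(input.substring(pos, lastAccept));
                pos = lastAccept;            // resume right after the token;
                                             // the next char can start a new token
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // the dangling "e" never reached an accept state of "..", so no token
        System.out.println(tokenize("abcde", Pattern.compile(".."))); // prints [ab, cd]
    }
}
```

The quadratic rescan is just for brevity; the point is the `lastAccept` bookkeeping at the end of a partial match.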
[jira] [Updated] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches
[ https://issues.apache.org/jira/browse/LUCENE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5279: --- Attachment: LUCENE-5279.patch Patch. However, it seems to be slower, testing on full Wikipedia en:
{noformat}
Report after iter 10:
                Task    QPS base  StdDev    QPS comp  StdDev    Pct diff
           OrHighLow       14.44  (7.7%)       12.48  (4.7%)      -13.6% ( -24% -  -1%)
          OrHighHigh        5.56  (6.2%)        4.86  (4.4%)      -12.6% ( -21% -  -2%)
           OrHighMed       18.62  (6.7%)       16.29  (4.4%)      -12.5% ( -22% -  -1%)
          AndHighLow      398.09  (1.6%)      390.34  (2.3%)       -1.9% (  -5% -   1%)
        OrNotHighLow      374.60  (1.7%)      369.61  (1.7%)       -1.3% (  -4% -   2%)
              Fuzzy1       67.10  (2.1%)       66.41  (2.2%)       -1.0% (  -5% -   3%)
        OrNotHighMed       51.68  (1.7%)       51.37  (1.5%)       -0.6% (  -3% -   2%)
              Fuzzy2       46.73  (2.8%)       46.45  (2.6%)       -0.6% (  -5% -   4%)
        OrHighNotLow       20.05  (3.5%)       19.96  (5.0%)       -0.5% (  -8% -   8%)
        OrHighNotMed       27.15  (3.2%)       27.05  (4.8%)       -0.3% (  -8% -   7%)
       OrNotHighHigh        7.72  (3.2%)        7.70  (4.7%)       -0.3% (  -7% -   7%)
       OrHighNotHigh        9.81  (3.0%)        9.79  (4.5%)       -0.1% (  -7% -   7%)
     LowSloppyPhrase       43.83  (1.9%)       43.89  (2.1%)        0.2% (  -3% -   4%)
              IntNRQ        3.49  (4.5%)        3.50  (4.1%)        0.2% (  -8% -   9%)
             Prefix3       70.74  (2.7%)       71.01  (2.4%)        0.4% (  -4% -   5%)
            HighTerm       65.33  (3.0%)       65.62 (13.5%)        0.4% ( -15% -  17%)
     MedSloppyPhrase        3.47  (3.5%)        3.49  (4.7%)        0.6% (  -7% -   9%)
           LowPhrase       13.06  (1.5%)       13.14  (2.0%)        0.6% (  -2% -   4%)
            Wildcard       16.71  (2.9%)       16.82  (2.2%)        0.7% (  -4% -   5%)
             MedTerm      100.90  (2.5%)      101.71 (10.4%)        0.8% ( -11% -  14%)
             LowTerm      311.85  (1.4%)      314.53  (6.4%)        0.9% (  -6% -   8%)
        HighSpanNear        8.06  (5.1%)        8.13  (5.9%)        0.9% (  -9% -  12%)
             Respell       48.00  (2.3%)       48.45  (2.8%)        0.9% (  -4% -   6%)
    HighSloppyPhrase        3.40  (4.1%)        3.43  (6.6%)        1.0% (  -9% -  12%)
          AndHighMed       34.14  (1.6%)       34.52  (1.7%)        1.1% (  -2% -   4%)
         AndHighHigh       28.15  (1.7%)       28.48  (1.7%)        1.2% (  -2% -   4%)
         MedSpanNear       30.62  (2.8%)       31.07  (3.2%)        1.5% (  -4% -   7%)
         LowSpanNear       10.30  (2.6%)       10.48  (2.9%)        1.7% (  -3% -   7%)
           MedPhrase      195.60  (5.1%)      201.44  (6.6%)        3.0% (  -8% -  15%)
          HighPhrase        4.17  (5.6%)        4.34  (6.9%)        4.0% (  -8% -  17%)
{noformat}
So ... 
I don't plan on pursuing it any further, but wanted to open the issue in case anybody wants to try ... > Don't use recursion in DisjunctionSumScorer.countMatches > > > Key: LUCENE-5279 > URL: https://issues.apache.org/jira/browse/LUCENE-5279 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Michael McCandless > Attachments: LUCENE-5279.patch > > > I noticed the TODO in there, to not use recursion, so I fixed it to just use > a private queue ...
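[Editorial illustration] The transformation the patch describes, replacing recursion over the sub-scorer tree with a private stack, can be sketched generically. This is not the actual DisjunctionSumScorer code; `Node` is a hypothetical stand-in for the sub-scorer tree:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class IterativeCountMatches {
    // Hypothetical stand-in for a binary sub-scorer tree
    static final class Node {
        final Node left, right;     // both null for a leaf
        final boolean matches;      // leaf-only payload
        Node(boolean matches) { this.left = null; this.right = null; this.matches = matches; }
        Node(Node left, Node right) { this.left = left; this.right = right; this.matches = false; }
    }

    // Original recursive shape (the TODO countMatches wanted to remove)
    static int countRecursive(Node n) {
        if (n.left == null) return n.matches ? 1 : 0;
        return countRecursive(n.left) + countRecursive(n.right);
    }

    // Same traversal with an explicit, reusable stack: no call recursion
    static int countIterative(Node root) {
        int count = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (n.left == null) {
                if (n.matches) count++;
            } else {
                stack.push(n.left);
                stack.push(n.right);
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Node tree = new Node(new Node(new Node(true), new Node(false)), new Node(true));
        System.out.println(countRecursive(tree) + " " + countIterative(tree)); // prints 2 2
    }
}
```

As the benchmark in this thread shows, removing recursion is not automatically a win: the explicit-stack bookkeeping can cost more than shallow recursion the JIT already handles well.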
[jira] [Created] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches
Michael McCandless created LUCENE-5279: -- Summary: Don't use recursion in DisjunctionSumScorer.countMatches Key: LUCENE-5279 URL: https://issues.apache.org/jira/browse/LUCENE-5279 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless I noticed the TODO in there, to not use recursion, so I fixed it to just use a private queue ...
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793038#comment-13793038 ] Nik Everett commented on LUCENE-5274: - {quote} There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless). We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions. {quote} I figured that was the reasoning and I don't intend to argue with it. In this case it would provide a method to merge sorted iterators just like MergedIterator only without the caveats around duplication but I'm happy to work around it. Guava certainly wouldn't fix my forgetting equals and hashcode.
[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793035#comment-13793035 ] Joel Bernstein commented on SOLR-5027: -- Patch that passes precommit for trunk > Field Collapsing PostFilter > --- > > Key: SOLR-5027 > URL: https://issues.apache.org/jira/browse/SOLR-5027 > Project: Solr > Issue Type: New Feature > Components: search > Affects Versions: 5.0 > Reporter: Joel Bernstein > Assignee: Joel Bernstein > Priority: Minor > Fix For: 4.6, 5.0 > > Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, > SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, > SOLR-5027.patch, SOLR-5027.patch > > > This ticket introduces the *CollapsingQParserPlugin* > The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. > This is a high-performance alternative to standard Solr field collapsing > (with *ngroups*) when the number of distinct groups in the result set is high. > For example in one performance test, a search with 10 million full results > and 1 million collapsed groups: > Standard grouping with ngroups : 17 seconds. > CollapsingQParserPlugin: 300 milliseconds. > Sample syntax: > Collapse based on the highest scoring document: > {code} > fq={!collapse field=} > {code} > Collapse based on the min value of a numeric field: > {code} > fq={!collapse field= min=} > {code} > Collapse based on the max value of a numeric field: > {code} > fq={!collapse field= max=} > {code} > Collapse with a null policy: > {code} > fq={!collapse field= nullPolicy=} > {code} > There are three null policies: > ignore : removes docs with a null value in the collapse field (default). > expand : treats each doc with a null value in the collapse field as a > separate group. > collapse : collapses all docs with a null value into a single group using > either highest score, or min/max. 
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent > *Note:* The July 16 patch also includes an ExpandComponent that expands the > collapsed groups for the current search result page. This functionality will > be moved to its own ticket.
[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors
[ https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793037#comment-13793037 ] Bill Bell commented on LUCENE-5212: --- It appears this happens on 7u40 64-bit too. See https://bugs.openjdk.java.net/browse/JDK-8024830 Am I reading this wrong? Start failing around hs24-b21:
[junit4] # SIGSEGV (0xb) at pc=0xfd7ff91d9f7d, pid=23810, tid=343
[junit4] #
[junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54)
[junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode solaris-amd64 )
[junit4] # Problematic frame:
[junit4] # J org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields;
[junit4] #
Note, first 7u40 build b01 has hs24-b24. Next, I will try to find changeset. > java 7u40 causes sigsegv and corrupt term vectors > - > > Key: LUCENE-5212 > URL: https://issues.apache.org/jira/browse/LUCENE-5212 > Project: Lucene - Core > Issue Type: Bug > Reporter: Robert Muir > Attachments: crashFaster2.0.patch, crashFaster.patch, > hs_err_pid32714.log, jenkins.txt
[jira] [Updated] (SOLR-5027) Field Collapsing PostFilter
[ https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-5027: - Attachment: SOLR-5027.patch
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793029#comment-13793029 ] Robert Muir commented on LUCENE-5274: - {quote} Sure! I'm more used to Guava's tools so I think I was lulled in to a false sense of recognition. No chance of updating to a modern version of Guava? {quote} There is no lucene dependency on guava. I don't think we should introduce one, and it wouldnt solve the issues i mentioned anyway (e.g. comparable inconsistent with equals and stuff). It would only add 2.1MB of bloated unnecessary syntactic sugar (sorry, thats just my opinion on it, i think its useless). We should keep our third party dependencies minimal and necessary so that any app using lucene can choose for itself what version of this stuff (if any) it wants to use. If we rely upon unnecessary stuff it hurts the end user by forcing them to compatible versions.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793018#comment-13793018 ] Nik Everett commented on LUCENE-5274: - {quote} I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). {quote} Sure! I'm more used to Guava's tools so I think I was lulled in to a false sense of recognition. No chance of updating to a modern version of Guava? :) {quote} This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way {quote} Yikes! I didn't catch that but now that you point it out it is right there in the docs and I should have. WeightedPhraseInfo doesn't behave that way and {quote} Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous. {quote} I agree on the inconsistent with equals. I can either fix that or use a Comparator for sorting both WeightedPhraseInfo and Toffs. That'd require a MergeSorter that can take one but {quote} If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on? {quote} Makes sense to me. 
[jira] [Assigned] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned LUCENE-5278: --- Assignee: Robert Muir
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793000#comment-13793000 ] Robert Muir commented on LUCENE-5274: - Thanks Nik: I can help with that one! Another question: about the MergedIterator :) I can see the possible use case here, but I think it deserves some discussion first (versus just making it public). This thing has limitations (its currently only used by indexwriter for buffereddeletes, its basically like a MultiTerms over an Iterator). For example each iterator it consumes should not have duplicate values according to its compareTo(): its not clear to me this WeightedPhraseInfo behaves this way: * what if you have a synonym of "dog" sitting on top of "cat" with the same boost factor... its a duplicate according to that compareTo, but the text is different. * what if the synonym is just "dog" with posinc=0 stacked ontop of itself (which is totally valid to do)... Perhaps highlighting can make use of it, but its unclear to me that its really following the contract. Furthermore the class in question (WeightedPhraseInfo) is public, and adding Comparable to it looks like it will create a situation where its inconsistent with equals()... I think this is a little dangerous. If it turns out we can reuse it: great! But i think rather than just slapping public on it, we should move it to .util, ensure it has good javadocs and unit tests, and investigate what exactly happens when these contracts are violated: e.g. can we make an exception happen rather than just broken behavior in a way that won't hurt performance and so on? 
[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792993#comment-13792993 ] Robert Muir commented on LUCENE-5278: - I think i understand what you want: it makes sense. The only reason its the way it is today is because this thing historically came from CharTokenizer (see the isTokenChar?). But it would be better if you could e.g. make a pattern like ([A-Z][a-z]+) and for it to actually break FooBar into Foo, Bar rather than throwing out "Bar" altogether. I'll dig into this!
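[Editorial illustration] The desired behavior — the character that ends "Foo" also starts "Bar" — is easy to check against plain java.util.regex, which never discards the character following a match. A throwaway sketch, not Lucene's MockTokenizer; the `tokens` helper is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CamelCaseSplit {
    // Collect every non-overlapping match of the token pattern; the first
    // character after a token stays available to begin the next one.
    static List<String> tokens(String input, String pattern) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // 'B' both terminates "Foo" (by failing [a-z]) and begins "Bar"
        System.out.println(tokens("FooBar", "[A-Z][a-z]+")); // prints [Foo, Bar]
    }
}
```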
[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
[ https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5278: Attachment: LUCENE-5278.patch This patch "fixes" the behaviour from my perspective but breaks a bunch of other tests.
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792974#comment-13792974 ] Nik Everett commented on LUCENE-5274: - Filed LUCENE-5278.
[jira] [Created] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token
Nik Everett created LUCENE-5278: --- Summary: MockTokenizer throws away the character right after a token even if it is a valid start to a new token Key: LUCENE-5278 URL: https://issues.apache.org/jira/browse/LUCENE-5278 Project: Lucene - Core Issue Type: Bug Reporter: Nik Everett Priority: Trivial MockTokenizer throws away the character right after a token even if it is a valid start to a new token. You won't see this unless you build a tokenizer that can recognize every character like with new RegExp(".") or RegExp("..."). Changing this behaviour seems to break a number of tests.
[jira] [Created] (SOLR-5340) Add support for named snapshots
Mike Schrag created SOLR-5340: - Summary: Add support for named snapshots Key: SOLR-5340 URL: https://issues.apache.org/jira/browse/SOLR-5340 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.5 Reporter: Mike Schrag It would be really nice if Solr supported named snapshots. Right now if you snapshot a SolrCloud cluster, every node potentially records a slightly different timestamp. Correlating those back together to effectively restore the entire cluster to a consistent snapshot is pretty tedious.
[jira] [Commented] (LUCENE-5266) Optimization of the direct PackedInts readers
[ https://issues.apache.org/jira/browse/LUCENE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792948#comment-13792948 ] Adrien Grand commented on LUCENE-5266: -- bq. The only caveat is the encoding would need to ensure there is always an extra 2 bytes at the end. There are some places (codecs) where I encode many short sequences consecutively so I care about not wasting extra bytes, but if this proves to help performance, I think it shouldn't be too hard to add the ability to have extra bytes at the end of the stream (I'm thinking about adding a new PackedInts.Format to the enum but there might be other options). > Optimization of the direct PackedInts readers > - > > Key: LUCENE-5266 > URL: https://issues.apache.org/jira/browse/LUCENE-5266 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Adrien Grand > Assignee: Adrien Grand > Priority: Minor > Attachments: LUCENE-5266.patch, LUCENE-5266.patch > > > Given that the initial focus for PackedInts readers was more on in-memory > readers (for storing stuff like the mapping from old to new doc IDs at > merging time), I never spent time trying to optimize the direct readers > although it could be beneficial now that they are used for disk-based doc > values.
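[Editorial illustration] The padding trade-off discussed above comes from branch-free direct reads: if the reader always loads a fixed-size window of bytes, a value near the end of the stream makes it read past the last meaningful byte, so the encoder must guarantee those extra bytes exist. A simplified sketch, not Lucene's PackedInts code, assuming big-endian bit packing and bitsPerValue <= 24:

```java
public class DirectPackedRead {
    // Read value `index` from a stream of `bitsPerValue`-bit values packed
    // big-endian into `packed`. The fixed 4-byte window avoids branching on
    // how many bytes the value actually spans, but may touch up to 3 bytes
    // past the last byte that holds real bits -- hence the required padding.
    static long get(byte[] packed, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int byteOff = (int) (bitPos >>> 3);   // first byte containing the value
        int bitOff = (int) (bitPos & 7);      // bit offset within that byte
        long window = ((packed[byteOff] & 0xFFL) << 24)
                    | ((packed[byteOff + 1] & 0xFFL) << 16)
                    | ((packed[byteOff + 2] & 0xFFL) << 8)
                    |  (packed[byteOff + 3] & 0xFFL);
        return (window >>> (32 - bitOff - bitsPerValue)) & ((1L << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        // 5-bit values {3, 17, 9}: bits 00011 10001 01001 -> bytes 0x1C 0x52,
        // plus 3 padding bytes so the 4-byte window never over-runs the array
        byte[] packed = {0x1C, 0x52, 0, 0, 0};
        System.out.println(get(packed, 5, 0) + " " + get(packed, 5, 1)
                + " " + get(packed, 5, 2)); // prints 3 17 9
    }
}
```

Without the padding bytes, reading the last value would throw ArrayIndexOutOfBoundsException, which is exactly why the encoder would have to guarantee the extra bytes at the end of the stream.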
[jira] [Updated] (LUCENE-5265) Make BlockPackedWriter constructor take an acceptable overhead ratio
[ https://issues.apache.org/jira/browse/LUCENE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-5265: - Attachment: LUCENE-5265.patch Here is a patch. > Make BlockPackedWriter constructor take an acceptable overhead ratio > > > Key: LUCENE-5265 > URL: https://issues.apache.org/jira/browse/LUCENE-5265 > Project: Lucene - Core > Issue Type: Improvement > Reporter: Adrien Grand > Priority: Minor > Attachments: LUCENE-5265.patch > > > Follow-up of http://search-lucene.com/m/SjmSW1CZYuZ1 > MemoryDocValuesFormat takes an acceptable overhead ratio but it is only used > when doing table compression. It should be used for all compression methods, > especially DELTA_COMPRESSED whose encoding is based on BlockPackedWriter.
[jira] [Resolved] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-5275. - Resolution: Fixed Fix Version/s: 5.0 4.6 > Fix AttributeSource.toString() > -- > > Key: LUCENE-5275 > URL: https://issues.apache.org/jira/browse/LUCENE-5275 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Fix For: 4.6, 5.0 > > Attachments: LUCENE-5275.patch, LUCENE-5275.patch > > > Its currently just Object.toString, e.g.: > org.apache.lucene.analysis.en.PorterStemFilter@8a32165c > But I think we should make it more useful, to end users trying to see what > their chain is doing, and to make SOPs easier when debugging: > {code} > EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT); > try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this > already!")) { > ts.reset(); > while (ts.incrementToken()) { > System.out.println(ts.toString()); > } > ts.end(); > } > {code} > Proposed output: > {noformat} > PorterStemFilter@8a32165c term=it,bytes=[69 > 74],startOffset=0,endOffset=3,positionIncrement=1,type=,keyword=false > PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 > 33],startOffset=4,endOffset=8,positionIncrement=1,type=,keyword=false > PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 > 74],startOffset=10,endOffset=15,positionIncrement=1,type=,keyword=false > PorterStemFilter@45cbde1b term=fix,bytes=[66 69 > 78],startOffset=16,endOffset=19,positionIncrement=1,type=,keyword=false > PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 > 69],startOffset=25,endOffset=32,positionIncrement=2,type=,keyword=false > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
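The proposed output format can be mimicked in plain Java. This is an illustrative stand-in, not Lucene's AttributeSource code: the class and its attribute map are hypothetical, standing in for the real attribute instances.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Illustrative sketch of the proposed toString format: class name,
// identity hash, then the current attribute state as key=value pairs.
// Not Lucene's implementation; the map is a stand-in for attributes.
public final class AttributeStateSketch {
  private final Map<String, Object> attrs = new LinkedHashMap<>();

  public void set(String key, Object value) { attrs.put(key, value); }

  @Override public String toString() {
    String state = attrs.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining(","));
    return getClass().getSimpleName() + "@"
        + Integer.toHexString(System.identityHashCode(this)) + " " + state;
  }

  public static void main(String[] args) {
    AttributeStateSketch ts = new AttributeStateSketch();
    ts.set("term", "alreadi");
    ts.set("startOffset", 25);
    ts.set("endOffset", 32);
    // e.g. AttributeStateSketch@1b6d3586 term=alreadi,startOffset=25,endOffset=32
    System.out.println(ts);
  }
}
```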
[jira] [Resolved] (SOLR-4073) Overseer will miss operations in some cases for OverseerCollectionProcessor
[ https://issues.apache.org/jira/browse/SOLR-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4073. --- Resolution: Duplicate Fix Version/s: (was: 4.6) > Overseer will miss operations in some cases for OverseerCollectionProcessor > > > Key: SOLR-4073 > URL: https://issues.apache.org/jira/browse/SOLR-4073 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2 > Environment: Solr cloud >Reporter: Raintung Li >Assignee: Mark Miller > Attachments: patch-4073 > > Original Estimate: 168h > Remaining Estimate: 168h > > One overseer disconnects from ZooKeeper, but its overseer thread is still handling > request (A) from the DistributedQueue. Example: the old overseer thread reconnects to > ZooKeeper and tries to remove the top request via "workQueue.remove();". Meanwhile, > another server has taken over the overseer role because the old overseer disconnected. > Its overseer thread handles request (A) again, removes (A) from the queue, and then > looks for the new top request (B), which it has not yet retrieved. At this moment the > old overseer's reconnected thread removes the top request from the queue. The top > request is now B, so the old overseer deletes it. The new overseer never processes > request (B), because it was already deleted by the old overseer, so request (B)'s > operations are lost. > A better approach: distributedQueue.peek could return the request's ID so that > workQueue.remove(ID) removes that specific request instead of whatever is at the top. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
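The fix suggested at the end of the report, peeking the head's ID and then removing exactly that ID, can be sketched with an in-memory queue. This is illustrative only, not the Solr patch; the Item and WorkQueueSketch names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of the peek-by-ID pattern: instead of processing the head and
// then calling a blind remove() (which can delete a *different* item if
// a stale overseer raced in between), remember the head's ID and remove
// exactly that ID when done. Illustrative; not the Solr patch.
public final class WorkQueueSketch {
  public static final class Item {
    public final String id;
    public final String payload;
    public Item(String id, String payload) { this.id = id; this.payload = payload; }
  }

  private final Deque<Item> queue = new ArrayDeque<>();

  public void offer(Item item) { queue.addLast(item); }
  public Item peek() { return queue.peekFirst(); }

  /** Remove the item with this ID if still present; a no-op if another
   *  owner already removed it. Returns true if something was removed. */
  public boolean remove(String id) {
    return queue.removeIf(item -> item.id.equals(id));
  }

  public static void main(String[] args) {
    WorkQueueSketch q = new WorkQueueSketch();
    q.offer(new Item("A", "create-collection"));
    q.offer(new Item("B", "delete-shard"));
    String headId = q.peek().id;     // remember which request we processed
    // ... process request A ...
    q.remove(headId);                // removes A specifically, never B
    System.out.println(q.peek().id); // B
  }
}
```

With this shape, a stale overseer that wakes up late and retries its remove("A") finds nothing to delete, so request B survives for the new overseer.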
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792921#comment-13792921 ] Robert Muir commented on LUCENE-5274: - if you suspect there is a bug in mocktokenizer, please open a separate issue for that. mocktokenizer is used by like, thousands of tests :) > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5274.patch > > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792913#comment-13792913 ] Nik Everett commented on LUCENE-5274: - Hey, forgot to mention that. MockTokenizer seems to throw away the character after the end of each token even if that character is the valid start to the next token. This comes up because I wanted to tokenize strings in a simplistic way to test that the highlighter can handle different tokenizers and it just wasn't working right. So I "fixed" MockTokenizer but I did it in a pretty brutal way. I'm happy to move the change to another bug and improve it but testing the highlighter change without it is a bit painful. > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5274.patch > > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
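The bug class described here (consuming the boundary character instead of pushing it back) can be reproduced with a toy tokenizer. This is illustrative, not MockTokenizer's actual code; tokens here are maximal runs of letters or of digits.

```java
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the described bug: when a character ends the current
// token, the tokenizer must push it back rather than consume it,
// because it may be the first character of the next token.
// Not MockTokenizer's code; the letter/digit rule is for illustration.
public final class PushbackTokenizerSketch {
  public static List<String> tokenize(String input, boolean pushBack) {
    List<String> tokens = new ArrayList<>();
    StringBuilder current = new StringBuilder();
    int i = 0;
    while (i < input.length()) {
      char c = input.charAt(i);
      if (current.length() == 0
          || Character.isLetter(c) == Character.isLetter(current.charAt(0))) {
        current.append(c);
        i++;
      } else {
        tokens.add(current.toString());
        current.setLength(0);
        if (!pushBack) {
          i++; // buggy: the boundary character is thrown away
        }      // fixed: leave i alone so c starts the next token
      }
    }
    if (current.length() > 0) tokens.add(current.toString());
    return tokens;
  }

  public static void main(String[] args) {
    System.out.println(tokenize("ab12cd", false)); // [ab, 2, d]: '1' and 'c' lost
    System.out.println(tokenize("ab12cd", true));  // [ab, 12, cd]
  }
}
```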
[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792911#comment-13792911 ] Jessica Cheng commented on SOLR-4816: - I think the latest patch: -if (request instanceof IsUpdateRequest && updatesToLeaders) { +if (request instanceof IsUpdateRequest) { removed the effect of the "updatesToLeaders" variable. Looking at http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_5/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java?view=markup it's not used anywhere to make a decision anymore. > Add document routing to CloudSolrServer > --- > > Key: SOLR-4816 > URL: https://issues.apache.org/jira/browse/SOLR-4816 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Joel Bernstein >Assignee: Mark Miller >Priority: Minor > Fix For: 4.5, 5.0 > > Attachments: RequestTask-removal.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, > SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch > > > This issue adds the following enhancements to CloudSolrServer's update logic: > 1) Document routing: Updates are routed directly to the correct shard leader > eliminating document routing at the server. > 2) Optional parallel update execution: Updates for each shard are executed in > a separate thread so parallel indexing can occur across the cluster. > These enhancements should allow for near linear scalability on indexing > throughput. 
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true);
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField(id, "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField(id, "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
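The two enhancements described in the issue, hash-based document routing and per-shard parallel sends, can be sketched as follows. This is a toy model, not CloudSolrServer's internals: the router and the "send" are stand-ins, and the hash function is an assumption for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy sketch (not CloudSolrServer's code): route each document to a
// shard by hashing its id, then send each shard's batch on its own
// thread so indexing proceeds in parallel across the cluster.
public final class RoutedUpdateSketch {
  public static Map<Integer, List<String>> route(List<String> docIds, int numShards) {
    Map<Integer, List<String>> batches = new HashMap<>();
    for (String id : docIds) {
      int shard = Math.floorMod(id.hashCode(), numShards); // stand-in for Solr's hash router
      batches.computeIfAbsent(shard, s -> new ArrayList<>()).add(id);
    }
    return batches;
  }

  public static void sendParallel(Map<Integer, List<String>> batches) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(batches.size());
    List<Future<?>> futures = new ArrayList<>();
    for (Map.Entry<Integer, List<String>> e : batches.entrySet()) {
      futures.add(pool.submit(() ->
          // stand-in for an HTTP update request to the shard leader
          System.out.println("shard " + e.getKey() + " <- " + e.getValue())));
    }
    for (Future<?> f : futures) f.get(); // wait for every shard's batch
    pool.shutdown();
  }

  public static void main(String[] args) throws Exception {
    sendParallel(route(java.util.Arrays.asList("0", "2", "hello1", "hello2"), 2));
  }
}
```

Routing on the client removes the server-side forwarding hop, and the per-shard threads are what give the near-linear indexing throughput the issue claims.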
[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792912#comment-13792912 ] ASF subversion and git services commented on LUCENE-5275: - Commit 1531381 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531381 ] LUCENE-5275: Change AttributeSource.toString to display the current state of attributes > Fix AttributeSource.toString() > -- > > Key: LUCENE-5275 > URL: https://issues.apache.org/jira/browse/LUCENE-5275 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Attachments: LUCENE-5275.patch, LUCENE-5275.patch > > > Its currently just Object.toString, e.g.: > org.apache.lucene.analysis.en.PorterStemFilter@8a32165c > But I think we should make it more useful, to end users trying to see what > their chain is doing, and to make SOPs easier when debugging: > {code} > EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT); > try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this > already!")) { > ts.reset(); > while (ts.incrementToken()) { > System.out.println(ts.toString()); > } > ts.end(); > } > {code} > Proposed output: > {noformat} > PorterStemFilter@8a32165c term=it,bytes=[69 > 74],startOffset=0,endOffset=3,positionIncrement=1,type=,keyword=false > PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 > 33],startOffset=4,endOffset=8,positionIncrement=1,type=,keyword=false > PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 > 74],startOffset=10,endOffset=15,positionIncrement=1,type=,keyword=false > PorterStemFilter@45cbde1b term=fix,bytes=[66 69 > 78],startOffset=16,endOffset=19,positionIncrement=1,type=,keyword=false > PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 > 69],startOffset=25,endOffset=32,positionIncrement=2,type=,keyword=false > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional 
commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792907#comment-13792907 ] Robert Muir commented on LUCENE-5274: - Why would a highlighter improvement require mocktokenizer changes? > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5274.patch > > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved SOLR-5323. --- Resolution: Fixed Applied to branch_4x, lucene_solr_4_5 and trunk. > Solr requires -Dsolr.clustering.enabled=false when pointing at example config > - > > Key: SOLR-5323 > URL: https://issues.apache.org/jira/browse/SOLR-5323 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.5 > Environment: vanilla mac >Reporter: John Berryman >Assignee: Dawid Weiss > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5323.patch > > > my typical use of Solr is something like this: > {code} > cd SOLR_HOME/example > cp -r solr /myProjectDir/solr_home > java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar > {code} > But in solr 4.5.0 this fails to start successfully. I get an error: > {code} > org.apache.solr.common.SolrException: Error loading class > 'solr.clustering.ClusteringComponent' > {code} > The reason is because solr.clustering.enabled defaults to true now. I don't > know why this might be the case. > you can get around it with > {code} > java -jar -Dsolr.solr.home=/myProjectDir/solr_home > -Dsolr.clustering.enabled=false start.jar > {code} > SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792902#comment-13792902 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531380 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) > Solr requires -Dsolr.clustering.enabled=false when pointing at example config > - > > Key: SOLR-5323 > URL: https://issues.apache.org/jira/browse/SOLR-5323 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.5 > Environment: vanilla mac >Reporter: John Berryman >Assignee: Dawid Weiss > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5323.patch > > > my typical use of Solr is something like this: > {code} > cd SOLR_HOME/example > cp -r solr /myProjectDir/solr_home > java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar > {code} > But in solr 4.5.0 this fails to start successfully. I get an error: > {code} > org.apache.solr.common.SolrException: Error loading class > 'solr.clustering.ClusteringComponent' > {code} > The reason is because solr.clustering.enabled defaults to true now. I don't > know why this might be the case. > you can get around it with > {code} > java -jar -Dsolr.solr.home=/myProjectDir/solr_home > -Dsolr.clustering.enabled=false start.jar > {code} > SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5274: Attachment: LUCENE-5274.patch New version of the patch. This one works a lot better with phrases and even works on fields that have the same source but different tokenizers. It still makes highlighting depend on the analysis module to pick up PerFieldAnalyzerWrapper. I think all the new code this adds to FieldPhraseList deserves a unit test on its own but I'm not in the frame of mind to write one at the moment so I'll have to come back to it later. > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5274.patch > > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792903#comment-13792903 ] ASF subversion and git services commented on SOLR-4708: --- Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531380 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) > Enable ClusteringComponent by default > - > > Key: SOLR-4708 > URL: https://issues.apache.org/jira/browse/SOLR-4708 > Project: Solr > Issue Type: Task >Reporter: Erik Hatcher >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.5, 5.0 > > Attachments: SOLR-4708.patch, SOLR-4708.patch > > > In the past, the ClusteringComponent used to rely on 3rd party JARs not > available from a Solr distro. This is no longer the case, but the /browse UI > and other references still had the clustering component disabled in the > example with an awkward system property way to enable it. Let's remove all > of that unnecessary stuff and just enable it as it works out of the box now. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5274: Attachment: (was: LUCENE-5274-4.patch) > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792898#comment-13792898 ] ASF subversion and git services commented on SOLR-4708: --- Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531378 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) > Enable ClusteringComponent by default > - > > Key: SOLR-4708 > URL: https://issues.apache.org/jira/browse/SOLR-4708 > Project: Solr > Issue Type: Task >Reporter: Erik Hatcher >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.5, 5.0 > > Attachments: SOLR-4708.patch, SOLR-4708.patch > > > In the past, the ClusteringComponent used to rely on 3rd party JARs not > available from a Solr distro. This is no longer the case, but the /browse UI > and other references still had the clustering component disabled in the > example with an awkward system property way to enable it. Let's remove all > of that unnecessary stuff and just enable it as it works out of the box now. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5274: Attachment: (was: LUCENE-5274.patch) > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792897#comment-13792897 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531378 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) > Solr requires -Dsolr.clustering.enabled=false when pointing at example config > - > > Key: SOLR-5323 > URL: https://issues.apache.org/jira/browse/SOLR-5323 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.5 > Environment: vanilla mac >Reporter: John Berryman >Assignee: Dawid Weiss > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5323.patch > > > my typical use of Solr is something like this: > {code} > cd SOLR_HOME/example > cp -r solr /myProjectDir/solr_home > java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar > {code} > But in solr 4.5.0 this fails to start successfully. I get an error: > {code} > org.apache.solr.common.SolrException: Error loading class > 'solr.clustering.ClusteringComponent' > {code} > The reason is because solr.clustering.enabled defaults to true now. I don't > know why this might be the case. > you can get around it with > {code} > java -jar -Dsolr.solr.home=/myProjectDir/solr_home > -Dsolr.clustering.enabled=false start.jar > {code} > SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default
[ https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792895#comment-13792895 ] ASF subversion and git services commented on SOLR-4708: --- Commit 1531377 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1531377 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) > Enable ClusteringComponent by default > - > > Key: SOLR-4708 > URL: https://issues.apache.org/jira/browse/SOLR-4708 > Project: Solr > Issue Type: Task >Reporter: Erik Hatcher >Assignee: Dawid Weiss >Priority: Minor > Fix For: 4.5, 5.0 > > Attachments: SOLR-4708.patch, SOLR-4708.patch > > > In the past, the ClusteringComponent used to rely on 3rd party JARs not > available from a Solr distro. This is no longer the case, but the /browse UI > and other references still had the clustering component disabled in the > example with an awkward system property way to enable it. Let's remove all > of that unnecessary stuff and just enable it as it works out of the box now. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792894#comment-13792894 ] ASF subversion and git services commented on SOLR-5323: --- Commit 1531377 from [~dawidweiss] in branch 'dev/trunk' [ https://svn.apache.org/r1531377 ] SOLR-5323: Disable ClusteringComponent by default in collection1 example. The solr.clustering.enabled system property needs to be set to 'true' to enable the clustering contrib (reverts SOLR-4708). (Dawid Weiss) > Solr requires -Dsolr.clustering.enabled=false when pointing at example config > - > > Key: SOLR-5323 > URL: https://issues.apache.org/jira/browse/SOLR-5323 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.5 > Environment: vanilla mac >Reporter: John Berryman >Assignee: Dawid Weiss > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5323.patch > > > my typical use of Solr is something like this: > {code} > cd SOLR_HOME/example > cp -r solr /myProjectDir/solr_home > java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar > {code} > But in solr 4.5.0 this fails to start successfully. I get an error: > {code} > org.apache.solr.common.SolrException: Error loading class > 'solr.clustering.ClusteringComponent' > {code} > The reason is because solr.clustering.enabled defaults to true now. I don't > know why this might be the case. > you can get around it with > {code} > java -jar -Dsolr.solr.home=/myProjectDir/solr_home > -Dsolr.clustering.enabled=false start.jar > {code} > SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-5323: -- Fix Version/s: 4.5.1 > Solr requires -Dsolr.clustering.enabled=false when pointing at example config > - > > Key: SOLR-5323 > URL: https://issues.apache.org/jira/browse/SOLR-5323 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.5 > Environment: vanilla mac >Reporter: John Berryman >Assignee: Dawid Weiss > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5323.patch > > > my typical use of Solr is something like this: > {code} > cd SOLR_HOME/example > cp -r solr /myProjectDir/solr_home > java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar > {code} > But in solr 4.5.0 this fails to start successfully. I get an error: > {code} > org.apache.solr.common.SolrException: Error loading class > 'solr.clustering.ClusteringComponent' > {code} > The reason is because solr.clustering.enabled defaults to true now. I don't > know why this might be the case. > you can get around it with > {code} > java -jar -Dsolr.solr.home=/myProjectDir/solr_home > -Dsolr.clustering.enabled=false start.jar > {code} > SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated SOLR-5323: -- Attachment: SOLR-5323.patch Patch reverting (portions) of SOLR-4708. > Solr requires -Dsolr.clustering.enabled=false when pointing at example config > - > > Key: SOLR-5323 > URL: https://issues.apache.org/jira/browse/SOLR-5323 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.5 > Environment: vanilla mac >Reporter: John Berryman >Assignee: Dawid Weiss > Fix For: 4.6, 5.0 > > Attachments: SOLR-5323.patch > > > my typical use of Solr is something like this: > {code} > cd SOLR_HOME/example > cp -r solr /myProjectDir/solr_home > java -jar -Dsolr.solr.home=/myProjectDir/solr_home start.jar > {code} > But in solr 4.5.0 this fails to start successfully. I get an error: > {code} > org.apache.solr.common.SolrException: Error loading class > 'solr.clustering.ClusteringComponent' > {code} > The reason is because solr.clustering.enabled defaults to true now. I don't > know why this might be the case. > you can get around it with > {code} > java -jar -Dsolr.solr.home=/myProjectDir/solr_home > -Dsolr.clustering.enabled=false start.jar > {code} > SOLR-4708 is when this became an issue. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()
[ https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792884#comment-13792884 ] ASF subversion and git services commented on LUCENE-5275: - Commit 1531376 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531376 ] LUCENE-5275: Change AttributeSource.toString to display the current state of attributes > Fix AttributeSource.toString() > -- > > Key: LUCENE-5275 > URL: https://issues.apache.org/jira/browse/LUCENE-5275 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Attachments: LUCENE-5275.patch, LUCENE-5275.patch > > > Its currently just Object.toString, e.g.: > org.apache.lucene.analysis.en.PorterStemFilter@8a32165c > But I think we should make it more useful, to end users trying to see what > their chain is doing, and to make SOPs easier when debugging: > {code} > EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT); > try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this > already!")) { > ts.reset(); > while (ts.incrementToken()) { > System.out.println(ts.toString()); > } > ts.end(); > } > {code} > Proposed output: > {noformat} > PorterStemFilter@8a32165c term=it,bytes=[69 > 74],startOffset=0,endOffset=3,positionIncrement=1,type=,keyword=false > PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 > 33],startOffset=4,endOffset=8,positionIncrement=1,type=,keyword=false > PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 > 74],startOffset=10,endOffset=15,positionIncrement=1,type=,keyword=false > PorterStemFilter@45cbde1b term=fix,bytes=[66 69 > 78],startOffset=16,endOffset=19,positionIncrement=1,type=,keyword=false > PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 > 69],startOffset=25,endOffset=32,positionIncrement=2,type=,keyword=false > {noformat} -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: 
dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792859#comment-13792859 ] Dawid Weiss commented on SOLR-5323: --- OK, I will revert the changes from SOLR-4708.
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792850#comment-13792850 ] Mark Miller commented on SOLR-5323: --- I just think anything with the relative paths is a separate issue. You can use any hierarchy - you just have to change those paths. I'm all for that being improved somehow, but the issue here seems to be: Solr contrib modules are lazily loaded so that if you don't use them, you can delete any of them from the dist package layout and things still work. Or you can leave them in place and, if you try to use them, things work. Clustering now violates that. It's not really clustering's fault; it seems to be more a limitation of the search component.
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792846#comment-13792846 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531369 from [~rcmuir] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531369 ] LUCENE-5269: satisfy the policeman > TestRandomChains failure > > > Key: LUCENE-5269 > URL: https://issues.apache.org/jira/browse/LUCENE-5269 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, > LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch > > > One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or > possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792845#comment-13792845 ] ASF subversion and git services commented on LUCENE-5269: - Commit 1531368 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1531368 ] LUCENE-5269: satisfy the policeman
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792834#comment-13792834 ] Dawid Weiss commented on SOLR-5323: --- I can revert to lazy loading, not a problem. But this doesn't solve the relative-paths issue at all. As I mentioned, there were several times when I had to hand somebody a preconfigured example Solr configuration -- this always required that person to put the contents of the example under a specific directory in the Solr distribution, otherwise things wouldn't work because of the relative paths. It was a pain to explain why this step was needed and to enforce it... I ended up just copying the required JARs into the example. This seems wrong somehow -- if it's a Solr distribution, shouldn't there be a way to reference contribs that lets people keep their stuff in any folder hierarchy? What do you think?
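One workaround for the relative-path pain described above is to point the `<lib/>` directives in solrconfig.xml at the distribution by absolute path instead of the default relative paths. This is only a sketch, not official guidance; the /opt/solr-dist location below is a made-up placeholder that must be adjusted to wherever the distribution is actually unpacked:

```xml
<!-- Example only: /opt/solr-dist is a placeholder for the real path to the
     unpacked Solr distribution. Absolute paths decouple the config from
     wherever the solr home directory happens to be copied. -->
<lib dir="/opt/solr-dist/contrib/clustering/lib" regex=".*\.jar" />
<lib dir="/opt/solr-dist/dist/" regex="solr-clustering-.*\.jar" />
```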
[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config
[ https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792824#comment-13792824 ] Mark Miller commented on SOLR-5323: --- I also think this was a mistake - I don't know that we need another solr.home type thing to address it, though. The root of the issue is that the clustering contrib is not "really" lazily loaded - and the current policy is to lazily load the contrib modules - and that is because of the component. I think Erik is on the right path with lazy SearchComponents. I think that if the only request handlers that refer to a search component are lazy, the component should probably also init lazily. I have not looked into how hard that is to do, but it seems like the correct fix to bring clustering in line with the other contribs. I also think the whole enabled flag we had is no good.
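The lazy-init direction discussed above can be illustrated with a generic holder that defers construction to first use, so a missing contrib jar would only fail when the component is actually requested rather than at core load. This is an illustrative sketch only; `LazyHolder` is a made-up name and not Solr's SearchComponent API:

```java
import java.util.function.Supplier;

// Sketch of lazy component initialization: the component is not constructed
// (and its class not resolved) until a request actually needs it, so class
// loading errors surface lazily instead of failing core startup.
final class LazyHolder<T> {
    private final Supplier<T> factory;
    private volatile T instance;

    LazyHolder(Supplier<T> factory) {
        this.factory = factory;
    }

    T get() {
        T result = instance;
        if (result == null) {
            synchronized (this) {
                if (instance == null) {
                    instance = factory.get(); // construction deferred to here
                }
                result = instance;
            }
        }
        return result;
    }

    boolean isInitialized() {
        return instance != null;
    }
}
```

A request handler holding a `LazyHolder` would simply call `get()` when the component is first used; handlers that never touch it never trigger the load.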
[jira] [Commented] (LUCENE-5273) Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions
[ https://issues.apache.org/jira/browse/LUCENE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792800#comment-13792800 ] ASF subversion and git services commented on LUCENE-5273: - Commit 1531354 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1531354 ] LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions. > Binary artifacts in Lucene and Solr convenience binary distributions > accompanying a release, including on Maven Central, should be identical > across all distributions > - > > Key: LUCENE-5273 > URL: https://issues.apache.org/jira/browse/LUCENE-5273 > Project: Lucene - Core > Issue Type: Bug > Components: general/build >Reporter: Steve Rowe >Assignee: Steve Rowe > Fix For: 4.6 > > Attachments: LUCENE-5273.patch > > > As mentioned in various issues (e.g. LUCENE-3655, LUCENE-3885, SOLR-4766), we > release multiple versions of the same artifact: binary Maven artifacts are > not identical to the ones in the Lucene and Solr binary distributions, and > the Lucene jars in the Solr binary distribution, including within the war, > are not identical to the ones in the Lucene binary distribution. This is bad. > It's (probably always?) not horribly bad, since the differences all appear to > be caused by the build re-creating manifests and re-building jars and the > Solr war from their constituents at various points in the release build > process; as a result, manifest timestamp attributes, as well as archive > metadata (at least constituent timestamps, maybe other things?), differ each > time a jar is rebuilt. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
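Since the differences Steve describes are confined to manifest timestamps and archive metadata, artifact identity can be checked by comparing jar contents while ignoring entry times. A rough sketch of such a check, under the assumption that name-plus-CRC equality is a good enough proxy for identical contents; this is not part of the actual Lucene/Solr build and the class name is made up:

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Compares two zip/jar files by entry name and CRC-32, deliberately ignoring
// entry timestamps and other archive metadata that re-packaging churns.
final class JarContentComparator {
    static boolean sameContents(String pathA, String pathB) throws IOException {
        return crcMap(pathA).equals(crcMap(pathB));
    }

    private static Map<String, Long> crcMap(String path) throws IOException {
        Map<String, Long> crcs = new TreeMap<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry e = entries.nextElement();
                crcs.put(e.getName(), e.getCrc()); // CRC read from the central directory
            }
        }
        return crcs;
    }
}
```

Note that two jars rebuilt from the same constituents at different times would compare equal under this check even though a byte-level diff reports them as different.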
[jira] [Closed] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jessica Cheng closed SOLR-5199. --- Resolution: Duplicate > Restarting zookeeper makes the overseer stop processing queue events > > > Key: SOLR-5199 > URL: https://issues.apache.org/jira/browse/SOLR-5199 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 4.4 >Reporter: Jessica Cheng >Assignee: Mark Miller > Labels: overseer, zookeeper > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: 5199-log > > > Taking the external zookeeper down (I'm just testing, so I only have one > external zookeeper instance running) and then bringing it back up seems to > have caused the overseer to stop processing queue event. > I tried to issue the delete collection command (curl > 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and > each time it just timed out. Looking at the zookeeper data, I see > ... > /overseer >collection-queue-work > qn-02 > qn-04 > qn-06 > ... > and the qn-xxx are not being processed. > Attached please find the log from the overseer (according to > /overseer_elect/leader). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792778#comment-13792778 ] Jessica Cheng commented on SOLR-5199: - Sorry, I only saw this once and I didn't have time to investigate, so I don't know what the cause is. SOLR-5325 definitely sounds similar so I'll close this issue now. Thanks!
[jira] [Commented] (SOLR-4824) Fuzzy / Faceting results are changed after ingestion of documents past a certain number
[ https://issues.apache.org/jira/browse/SOLR-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792742#comment-13792742 ] Lakshmi Venkataswamy commented on SOLR-4824: I have tested 4.5.0 version and the same behavior has been observed. So we are staying with 3.6 in production for now. > Fuzzy / Faceting results are changed after ingestion of documents past a > certain number > > > Key: SOLR-4824 > URL: https://issues.apache.org/jira/browse/SOLR-4824 > Project: Solr > Issue Type: Bug >Affects Versions: 4.2, 4.3 > Environment: Ubuntu 12.04 LTS 12.04.2 > jre1.7.0_17 > jboss-as-7.1.1.Final >Reporter: Lakshmi Venkataswamy > > In upgrading from SOLR 3.6 to 4.2/4.3 and comparing results on fuzzy queries, > I found that after a certain number of documents were ingested the fuzzy > query had drastically lower number of results. We have approximately 18,000 > documents per day and after ingesting approximately 40 days of documents, the > next incremental day of documents results in a lower number of results of a > fuzzy search. 
> The query:
> http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort
> produces the following result before the threshold is crossed: numFound="362803", with per-date facet counts of
> 2866 11372 11514 12015 11746 10853 11053 11815 11427 11475 11461 12058 11335 12039 12064 12234 12545 11766 12197 11414 11633 12863 12378 11947 11822 11882 10474 11051 11776 11957 11260 8511
> Once the 40-days-of-documents ingestion threshold is crossed, the results drop as shown below for the same query, with per-date facet counts of
> 0 41 21 24 19 9 11 17 14 24 43 14 52 57 25 17 34 11 16 121 33 26 59 27 10 9 6 16 11 15 21 109 11 7 10 8 13 75 77 31 35 22 18 11 68 40
> I have also tested this with different months of data and have seen the same issue around the number of documents.
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792695#comment-13792695 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531327 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531327 ] SOLR-5325: raise retry padding a bit > zk connection loss causes overseer leader loss > -- > > Key: SOLR-5325 > URL: https://issues.apache.org/jira/browse/SOLR-5325 > Project: Solr > Issue Type: Bug >Affects Versions: 4.3, 4.4, 4.5 >Reporter: Christine Poerschke >Assignee: Mark Miller > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch > > > The problem we saw was that when the solr overseer leader experienced > temporary zk connectivity problems it stopped processing overseer queue > events. > This first happened when quorum within the external zk ensemble was lost due > to too many zookeepers being stopped (similar to SOLR-5199). The second time > it happened when there was a sufficient number of zookeepers but they were > holding zookeeper leadership elections and thus refused connections (the > elections were taking several seconds, we were using the default > zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792694#comment-13792694 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531325 from [~markrmil...@gmail.com] in branch 'dev/branches/lucene_solr_4_5' [ https://svn.apache.org/r1531325 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands.
[jira] [Commented] (SOLR-5308) Split all documents of a route key into another collection
[ https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792692#comment-13792692 ] Shalin Shekhar Mangar commented on SOLR-5308: - For splitting a single source shard into a single target collection/shard by a route key such as:
{code}
/admin/collections?action=migrate&collection=collection1&split.key=A!&shard=shardX&target.collection=collection2&target.shard=shardY
{code}
A rough strategy could be to:
# Create new core X on source
# Create new core Y on target
# Ask target core to buffer updates
# Start forwarding updates for the route key received by the source shard to the target collection
# Split the source shard to the new core X
# Ask Y to replicate fully from X
# Core Admin merge Y into the target core
# Ask the target core to replay the buffered updates
> Split all documents of a route key into another collection > -- > > Key: SOLR-5308 > URL: https://issues.apache.org/jira/browse/SOLR-5308 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar > Fix For: 4.6, 5.0 > > > Enable SolrCloud users to split out a set of documents from a source collection into another collection. > This will be useful in multi-tenant environments. This feature will make it possible to split a tenant out of a collection and put them into their own collection which can be scaled separately.
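The proposed sequence can be captured as an orchestration skeleton that only records the ordering of the steps. Everything below is hypothetical scaffolding; the helper calls stand in for real SolrCloud operations and none of them are actual Solr APIs:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical orchestration of the migrate-by-route-key proposal above.
// Each step() call is a stand-in for a real cluster operation; only the
// ordering of the eight steps is meaningful here.
final class MigrateRouteKeySketch {
    final List<String> log = new ArrayList<>();

    void migrate(String routeKey, String sourceShard, String targetShard) {
        step("create temp core X on source shard " + sourceShard);
        step("create temp core Y on target shard " + targetShard);
        step("ask target core to buffer updates");
        step("forward updates for " + routeKey + " from source to target collection");
        step("split " + routeKey + " docs out of " + sourceShard + " into core X");
        step("replicate core Y fully from core X");
        step("merge core Y into target core via Core Admin");
        step("ask target core to replay buffered updates");
    }

    private void step(String description) {
        log.add(description); // a real implementation would call the cluster here
    }
}
```

The ordering matters: buffering on the target must start before updates are forwarded, and the buffered updates are replayed only after the merge, so no update for the route key is lost during the copy.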
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792689#comment-13792689 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531324 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531324 ] SOLR-5325: raise retry padding a bit
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792688#comment-13792688 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531323 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1531323 ] SOLR-5325: raise retry padding a bit
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792684#comment-13792684 ] Mark Miller commented on SOLR-5325: --- I'm still kind of surprised this would happen - we should be retrying on connection loss up to the session expiration, at which point we would no longer be the leader. Perhaps the retry window is a little short or something, and perhaps that is part of why it is more difficult for me to reproduce in a test.
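The retry-up-to-expiration behavior Mark describes can be sketched as a deadline-bounded loop: keep retrying on connection loss, but give up once the session timeout has elapsed, because by then the session (and any leadership tied to it) is gone anyway. This is only an illustration of the idea, not Solr's actual ZooKeeper client code; the Callable stands in for a real ZooKeeper operation:

```java
import java.util.concurrent.Callable;

// Deadline-bounded retry sketch: retry a ZooKeeper operation on connection
// loss until the session timeout has elapsed, then rethrow the last failure.
final class RetryUntilExpiration {
    static <T> T retry(Callable<T> op, long sessionTimeoutMs, long retryDelayMs)
            throws Exception {
        long deadline = System.currentTimeMillis() + sessionTimeoutMs;
        Exception last = null;
        while (System.currentTimeMillis() < deadline) {
            try {
                return op.call();
            } catch (Exception connectionLoss) { // stand-in for a connection-loss error
                last = connectionLoss;
                Thread.sleep(retryDelayMs);
            }
        }
        throw last != null ? last : new IllegalStateException("no attempt made");
    }
}
```

If the retry window is shorter than the session timeout (the "a little short" suspicion above), the loop gives up while the session is still alive, and the still-leader node silently stops processing work.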
[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events
[ https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792679#comment-13792679 ] Mark Miller commented on SOLR-5199: --- Hey Jessica - if we can confirm this is the same issue as SOLR-5325, we can close this as a duplicate.
[jira] [Comment Edited] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792671#comment-13792671 ] Mark Miller edited comment on SOLR-5325 at 10/11/13 2:50 PM: - Added some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. was (Author: markrmil...@gmail.com): Add some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix.
[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields
[ https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nik Everett updated LUCENE-5274: Attachment: LUCENE-5274-4.patch Reworked to remove dependency on query parser and most of the analyzer dependency and to fix errors with phrases. It'll need to lose the rest of the analyzer dependency and have more test cases in addition to any other concerns raised in the review. > Teach fast FastVectorHighlighter to highlight "child fields" with parent > fields > --- > > Key: LUCENE-5274 > URL: https://issues.apache.org/jira/browse/LUCENE-5274 > Project: Lucene - Core > Issue Type: Improvement > Components: core/other >Reporter: Nik Everett >Assignee: Adrien Grand >Priority: Minor > Attachments: LUCENE-5274-4.patch, LUCENE-5274.patch > > > I've been messing around with the FastVectorHighlighter and it looks like I > can teach it to highlight matches on "child fields". Like this query: > foo:scissors foo_exact:running > would highlight foo like this: > running with scissors > Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy > of foo a different analyzer and its own WITH_POSITIONS_OFFSETS. > This would make queries that perform weighted matches against different > analyzers much more convenient to highlight. > I have working code and test cases but they are hacked into Elasticsearch. > I'd love to Lucene-ify if you'll take them. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792671#comment-13792671 ] Mark Miller commented on SOLR-5325: --- Add some more testing that I thought would catch it, but it has not yet on my system. Still poking around a bit. Anyway, I've committed the fix. > zk connection loss causes overseer leader loss > -- > > Key: SOLR-5325 > URL: https://issues.apache.org/jira/browse/SOLR-5325 > Project: Solr > Issue Type: Bug >Affects Versions: 4.3, 4.4, 4.5 >Reporter: Christine Poerschke >Assignee: Mark Miller > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch > > > The problem we saw was that when the solr overseer leader experienced > temporary zk connectivity problems it stopped processing overseer queue > events. > This first happened when quorum within the external zk ensemble was lost due > to too many zookeepers being stopped (similar to SOLR-5199). The second time > it happened when there was a sufficient number of zookeepers but they were > holding zookeeper leadership elections and thus refused connections (the > elections were taking several seconds, we were using the default > zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator
[ https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792662#comment-13792662 ] Michael McCandless commented on LUCENE-5260: Thanks Areek, patch looks great! I like the hasPayloads() up-front introspection. In UnsortedTermFreqIteratorWrapper.payload(), why do we set currentOrd as a side effect? Shouldn't next() already do that? Maybe, we should instead assert currentOrd == ords[curPos]? Also, can we break that sneaky currentOrd assignment in next into its own line before? > Make older Suggesters more accepting of TermFreqPayloadIterator > --- > > Key: LUCENE-5260 > URL: https://issues.apache.org/jira/browse/LUCENE-5260 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Reporter: Areek Zillur > Attachments: LUCENE-5260.patch > > > As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would > be nice to make the older suggesters accepting of TermFreqPayloadIterator and > throw an exception if payload is found (if it cannot be used). > This will also allow us to nuke most of the other interfaces for > BytesRefIterator. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
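The refactor Michael suggests can be sketched with a toy stand-in (the class, fields, and array contents below are illustrative, not the actual UnsortedTermFreqIteratorWrapper code): next() owns the currentOrd assignment on its own line, and payload() only asserts the invariant instead of mutating state.

```java
// Hypothetical sketch of the suggested refactor: the "sneaky" currentOrd
// assignment lives on its own line in next(), and payload() asserts
// currentOrd == ords[curPos] rather than setting it as a side effect.
class UnsortedWrapperSketch {
    private final int[] ords = {2, 0, 1};               // shuffled order (illustrative)
    private final String[] payloads = {"p0", "p1", "p2"};
    private int curPos = -1;
    private int currentOrd = -1;

    String next() {
        if (++curPos >= ords.length) return null;
        // assignment broken out onto its own line, as suggested in the review:
        currentOrd = ords[curPos];
        return "term" + currentOrd;
    }

    String payload() {
        // payload() no longer mutates state; it only checks the invariant:
        assert currentOrd == ords[curPos];
        return payloads[currentOrd];
    }
}
```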
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792663#comment-13792663 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531315 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1531315 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. > zk connection loss causes overseer leader loss > -- > > Key: SOLR-5325 > URL: https://issues.apache.org/jira/browse/SOLR-5325 > Project: Solr > Issue Type: Bug >Affects Versions: 4.3, 4.4, 4.5 >Reporter: Christine Poerschke >Assignee: Mark Miller > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch > > > The problem we saw was that when the solr overseer leader experienced > temporary zk connectivity problems it stopped processing overseer queue > events. > This first happened when quorum within the external zk ensemble was lost due > to too many zookeepers being stopped (similar to SOLR-5199). The second time > it happened when there was a sufficient number of zookeepers but they were > holding zookeeper leadership elections and thus refused connections (the > elections were taking several seconds, we were using the default > zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss
[ https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792661#comment-13792661 ] ASF subversion and git services commented on SOLR-5325: --- Commit 1531313 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1531313 ] SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing commands. > zk connection loss causes overseer leader loss > -- > > Key: SOLR-5325 > URL: https://issues.apache.org/jira/browse/SOLR-5325 > Project: Solr > Issue Type: Bug >Affects Versions: 4.3, 4.4, 4.5 >Reporter: Christine Poerschke >Assignee: Mark Miller > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch > > > The problem we saw was that when the solr overseer leader experienced > temporary zk connectivity problems it stopped processing overseer queue > events. > This first happened when quorum within the external zk ensemble was lost due > to too many zookeepers being stopped (similar to SOLR-5199). The second time > it happened when there was a sufficient number of zookeepers but they were > holding zookeeper leadership elections and thus refused connections (the > elections were taking several seconds, we were using the default > zookeeper.cnxTimeout=5s value and it was hit for one ensemble member). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer
[ https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-5252: --- Attachment: LUCENE-5252_4x.patch Fix a bug regarding ignoreCase in the attached patch. > add NGramSynonymTokenizer > - > > Key: LUCENE-5252 > URL: https://issues.apache.org/jira/browse/LUCENE-5252 > Project: Lucene - Core > Issue Type: New Feature > Components: modules/analysis >Reporter: Koji Sekiguchi >Priority: Minor > Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, > LUCENE-5252_4x.patch > > > I'd like to propose that we have another n-gram tokenizer which can process > synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram > size is fixed, i.e. minGramSize = maxGramSize. > Today, I think we have the following problems when using SynonymFilter with > NGramTokenizer. > For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ > expand=true and N = 2 (2-gram). > # There is no consensus (I think :-) how we assign offsets to generated > synonym tokens DE, EF and FG when expanding source token AB and BC. > # If the query pattern looks like ABCY, it cannot be matched even if there is > a document "…ABCY…" in index when autoGeneratePhraseQueries set to true, > because there is no "CY" token (but "GY" is there) in the index. > NGramSynonymTokenizer can solve these problems by providing the following > methods. > * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't > tokenize registered words. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |ABC|AB/DE/BC/EF/FG|ABC/DEFG| > * The back and forth of the registered words, NGramSynonymTokenizer generates > *extra* tokens w/ posInc=0. e.g. > ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer|| > |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23| > In the above sample, "Z" and "1" are the extra tokens. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
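The NGramSynonymTokenizer comparison tables above can be reproduced with a minimal sketch (an illustration of the proposal, not the attached patch): registered words are kept whole and paired with their synonym, plain segments are 2-grammed, and single-character "extra" tokens are emitted at boundaries that touch a registered word. The gram size is fixed at n=2 here, matching the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class NGramSynonymSketch {
    // 2-gram tokenization that leaves registered words unsplit and emits the
    // "extra" boundary tokens shown in the tables above (n fixed to 2).
    static List<String> tokenize(String text, Map<String, String> synonyms) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        boolean afterWord = false;  // did a registered word end right before pos?
        while (pos < text.length()) {
            // find the next registered word at or after pos
            int best = -1; String word = null;
            for (String w : synonyms.keySet()) {
                int i = text.indexOf(w, pos);
                if (i >= 0 && (best < 0 || i < best)) { best = i; word = w; }
            }
            int end = (best < 0) ? text.length() : best;
            if (end > pos) emitSegment(text.substring(pos, end), afterWord, best >= 0, out);
            if (best < 0) break;
            out.add(word);                 // registered word, kept whole, e.g. ABC
            out.add(synonyms.get(word));   // its synonym at the same position, e.g. DEFG
            pos = best + word.length();
            afterWord = true;
        }
        return out;
    }

    // Emit 2-grams for a plain segment, plus the posInc=0 "extra" single-char
    // tokens at boundaries adjacent to a registered word.
    static void emitSegment(String seg, boolean afterWord, boolean beforeWord, List<String> out) {
        if (afterWord) out.add(seg.substring(0, 1));              // leading extra, e.g. "1"
        for (int i = 0; i + 2 <= seg.length(); i++) out.add(seg.substring(i, i + 2));
        if (beforeWord) out.add(seg.substring(seg.length() - 1)); // trailing extra, e.g. "Z"
    }
}
```

With the synonym setting "ABC, DEFG", this yields ABC/DEFG for "ABC" and XY/YZ/Z/ABC/DEFG/1/12/23 for "XYZABC123", matching the second column of the tables.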
[jira] [Commented] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792600#comment-13792600 ] Shalin Shekhar Mangar commented on SOLR-5338: - [~ysee...@gmail.com] - Would you mind reviewing the new CompositeIdRouter methods? > Split shards by a route key > --- > > Key: SOLR-5338 > URL: https://issues.apache.org/jira/browse/SOLR-5338 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar > Fix For: 4.6, 5.0 > > Attachments: SOLR-5338.patch > > > Provide a way to split a shard using a route key such that all documents of > the specified route key end up in a single dedicated sub-shard. > Example: > Assume that collection1, shard1 has hash range [0, 20]. Also that route key > 'A!' has hash range [12,15]. Then invoking: > {code} > /admin/collections?action=SPLIT&collection=collection1&split.key=A! > {code} > should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. > Specifying the source shard is not required here because the route key is > enough to figure it out. Route keys spanning more than one shards will not be > supported. > Note that the sub-shard with the hash range of the route key may also contain > documents for other route keys whose hashes collide. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
Hihi, FYI: I have a compilation unit here (non-Lucene) that also segfaults on JDK 7.0u25, if you don't do "ant clean" before. If there are already existing class files and only modified ones are recompiled it always segfaults. Reproducible, but I have no idea what causes this. :-) Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: simon.willna...@gmail.com [mailto:simon.willna...@gmail.com] On > Behalf Of Simon Willnauer > Sent: Friday, October 11, 2013 2:50 PM > Cc: dev@lucene.apache.org > Subject: Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 > - Failure! > > ok maybe updating the JDK would be a good idea :) > > > > On Fri, Oct 11, 2013 at 2:46 PM, wrote: > > Build: > > builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ > > > > No tests ran. > > > > Build Log: > > [...truncated 61 lines...] > > - > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional > commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
ok maybe updating the JDK would be a good idea :) On Fri, Oct 11, 2013 at 2:46 PM, wrote: > Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ > > No tests ran. > > Build Log: > [...truncated 61 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/ No tests ran. Build Log: [...truncated 61 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310.patch > Add a collection admin command to remove a replica > -- > > Key: SOLR-5310 > URL: https://issues.apache.org/jira/browse/SOLR-5310 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5310.patch, SOLR-5310.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > the only way a replica can removed is by unloading the core .There is no way > to remove a replica that is down . So, the clusterstate will have > unreferenced nodes if a few nodes go down over time > We need a cluster admin command to clean that up > e.g: > /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 > The system would first see if the replica is active. If yes , a core UNLOAD > command is fired , which would take care of deleting the replica from the > clusterstate as well > if the state is inactive, then the core or node may be down , in that case > the entry is removed from cluster state -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: (was: SOLR-5310-1.patch) > Add a collection admin command to remove a replica > -- > > Key: SOLR-5310 > URL: https://issues.apache.org/jira/browse/SOLR-5310 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5310.patch, SOLR-5310.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > the only way a replica can removed is by unloading the core .There is no way > to remove a replica that is down . So, the clusterstate will have > unreferenced nodes if a few nodes go down over time > We need a cluster admin command to clean that up > e.g: > /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 > The system would first see if the replica is active. If yes , a core UNLOAD > command is fired , which would take care of deleting the replica from the > clusterstate as well > if the state is inactive, then the core or node may be down , in that case > the entry is removed from cluster state -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5310: - Attachment: SOLR-5310-1.patch The test cases still fail occasionally > Add a collection admin command to remove a replica > -- > > Key: SOLR-5310 > URL: https://issues.apache.org/jira/browse/SOLR-5310 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Noble Paul >Assignee: Noble Paul > Attachments: SOLR-5310-1.patch, SOLR-5310.patch > > Original Estimate: 72h > Remaining Estimate: 72h > > the only way a replica can removed is by unloading the core .There is no way > to remove a replica that is down . So, the clusterstate will have > unreferenced nodes if a few nodes go down over time > We need a cluster admin command to clean that up > e.g: > /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3 > The system would first see if the replica is active. If yes , a core UNLOAD > command is fired , which would take care of deleting the replica from the > clusterstate as well > if the state is inactive, then the core or node may be down , in that case > the entry is removed from cluster state -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 407 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/407/ 1 tests failed. REGRESSION: org.apache.lucene.index.Test2BPostings.test Error Message: Java heap space Stack Trace: java.lang.OutOfMemoryError: Java heap space at __randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0) at org.apache.lucene.store.BufferedIndexOutput.(BufferedIndexOutput.java:50) at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:365) at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280) at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206) at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478) at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44) at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149) at org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:171) at org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:80) at org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4408) at org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535) at org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502) at org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506) at org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378) at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:470) at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1523) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1193) at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1174) at org.apache.lucene.index.Test2BPostings.test(Test2BPostings.java:76) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) Build Log: [...truncated 655 lines...] [junit4] Suite: org.apache.lucene.index.Test2BPostings [junit4] 2> NOTE: download the large Jenkins line-docs file by running 'ant get-jenkins-line-docs' in the lucene directory. 
[junit4] 2> NOTE: reproduce with: ant test -Dtestcase=Test2BPostings -Dtests.method=test -Dtests.seed=D8D3920C725BF71C -Dtests.multiplier=2 -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt -Dtests.locale=en_IN -Dtests.timezone=America/Puerto_Rico -Dtests.file.encoding=US-ASCII [junit4] ERROR408s J0 | Test2BPostings.test <<< [junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap space [junit4]>at __randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0) [junit4]>at org.apache.lucene.store.BufferedIndexOutput.(BufferedIndexOutput.java:50) [junit4]>at org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:365) [junit4]>at org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280) [junit4]>at org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206) [junit4]>at org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478) [junit4]>at org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44) [junit4]>at org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149) [junit4]>at org.apache.lucene.store.CompoundFileDirectory.close(C
[jira] [Updated] (SOLR-5308) Split all documents of a route key into another collection
[ https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5308: Attachment: (was: SOLR-5308.patch) > Split all documents of a route key into another collection > -- > > Key: SOLR-5308 > URL: https://issues.apache.org/jira/browse/SOLR-5308 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar > Fix For: 4.6, 5.0 > > > Enable SolrCloud users to split out a set of documents from a source > collection into another collection. > This will be useful in multi-tenant environments. This feature will make it > possible to split a tenant out of a collection and put them into their own > collection which can be scaled separately. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5320) Multi level compositeId router
[ https://issues.apache.org/jira/browse/SOLR-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792478#comment-13792478 ] Anshum Gupta commented on SOLR-5320: A 3 level composite id routing to begin with is what I think would be good. I'd use 8 bits each from the first 2 components of the key and 16 bits from the last component. Functionally, this should work on similar lines as the current 2-level composite id routing. > Multi level compositeId router > -- > > Key: SOLR-5320 > URL: https://issues.apache.org/jira/browse/SOLR-5320 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Anshum Gupta > Original Estimate: 336h > Remaining Estimate: 336h > > This would enable multi level routing as compared to the 2 level routing > available as of now. On the usage bit, here's an example: > Document Id: myapp!dummyuser!doc > myapp!dummyuser! can be used as the shardkey for searching content for > dummyuser. > myapp! can be used for searching across all users of myapp. > I am looking at either a 3 (or 4) level routing. The 32 bit hash would then > comprise of 8X4 components from each part (in case of 4 level). -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
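Anshum's proposed 8/8/16-bit split can be sketched as follows. This is a hypothetical illustration, not the actual CompositeIdRouter code; the per-component hash is plain String.hashCode() here for brevity (the real router uses a proper hash function over the full component).

```java
// Sketch of composing a 32-bit route hash from a 3-level key like
// "myapp!dummyuser!doc": 8 bits from each of the first two components
// and 16 bits from the last, so that key prefixes map to contiguous
// hash ranges.
class MultiLevelRouteSketch {
    static int routeHash(String id) {
        String[] parts = id.split("!", 3);
        int h1 = parts[0].hashCode();
        int h2 = parts.length > 1 ? parts[1].hashCode() : h1;
        int h3 = parts.length > 2 ? parts[2].hashCode() : h2;
        // top 8 bits from level 1 | next 8 bits from level 2 | bottom 16 bits from level 3
        return ((h1 & 0xFF) << 24) | ((h2 & 0xFF) << 16) | (h3 & 0xFFFF);
    }
}
```

All ids sharing the prefix "myapp!dummyuser!" then share the top 16 bits of the hash, and all ids sharing "myapp!" share the top 8 bits, which is what lets a shard key target progressively wider slices of the cluster.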
[jira] [Updated] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes
[ https://issues.apache.org/jira/browse/SOLR-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] dejie Chang updated SOLR-5339: -- Description: When I install SolrCloud on CentOS 5.6, it is strange that sometimes the IP displayed at http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows 202.106.199.36, but the actual IP is 192.168.10.54. On Windows it is right. I found the cause is hostaddress = InetAddress.getLocalHost().getHostAddress(); in ZkController.java. This method sometimes returns the wrong IP and should not be trusted, so I think we should not use it on Linux. (was: when I install the solr-cloud on the centos5.6 . t) > solr-core-4.4's ip is not right when the os is centos 5.6 sometimes > > > Key: SOLR-5339 > URL: https://issues.apache.org/jira/browse/SOLR-5339 > Project: Solr > Issue Type: Bug > Components: contrib - Clustering >Affects Versions: 4.4 > Environment: centos 5.6 >Reporter: dejie Chang >Priority: Critical > > When I install SolrCloud on CentOS 5.6, it is strange that sometimes the IP > displayed at http://192.168.10.54:8081/solr/#/~cloud is not correct: it shows > 202.106.199.36, but the actual IP is 192.168.10.54. On Windows it is right. > I found the cause is hostaddress = InetAddress.getLocalHost().getHostAddress(); > in ZkController.java. This method sometimes returns the wrong IP and should > not be trusted, so I think we should not use it on Linux. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
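The failure mode described here is that InetAddress.getLocalHost() just resolves the machine's hostname, so a stale or wrong /etc/hosts entry (common on CentOS) can yield a public IP like 202.106.199.36. One more robust approach — a sketch, not a proposed patch — is to scan candidate addresses (e.g. gathered from NetworkInterface.getNetworkInterfaces()) and prefer a site-local, non-loopback one; the helper below operates on a pre-collected candidate list so the selection logic itself is testable.

```java
import java.net.InetAddress;
import java.util.List;

class HostAddressSketch {
    // Prefer a site-local (RFC 1918) address such as 192.168.10.54; skip
    // loopback; fall back to the first remaining address otherwise.
    static InetAddress pick(List<InetAddress> candidates) {
        InetAddress fallback = null;
        for (InetAddress a : candidates) {
            if (a.isLoopbackAddress()) continue;
            if (a.isSiteLocalAddress()) return a;   // e.g. 192.168.10.54
            if (fallback == null) fallback = a;     // e.g. a public address
        }
        return fallback;
    }

    // Convenience for building candidates from IP literals (no DNS lookup
    // happens for literal addresses).
    static InetAddress ip(String literal) {
        try { return InetAddress.getByName(literal); }
        catch (Exception e) { throw new RuntimeException(e); }
    }
}
```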
[jira] [Updated] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-5338: Attachment: SOLR-5338.patch Changes: * Introduces two new methods in CompositeIdRouter {code} public List partitionRangeByKey(String key, Range range) {code} and {code} public Range routeKeyHashRange(String routeKey) {code} * The collection split action accepts a new parameter 'split.key' * The parent slice is found and its range is partitioned according to split.key * We re-use the logic introduced in SOLR-5300 to do the actual splitting. > Split shards by a route key > --- > > Key: SOLR-5338 > URL: https://issues.apache.org/jira/browse/SOLR-5338 > Project: Solr > Issue Type: Improvement > Components: SolrCloud >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar > Fix For: 4.6, 5.0 > > Attachments: SOLR-5338.patch > > > Provide a way to split a shard using a route key such that all documents of > the specified route key end up in a single dedicated sub-shard. > Example: > Assume that collection1, shard1 has hash range [0, 20]. Also that route key > 'A!' has hash range [12,15]. Then invoking: > {code} > /admin/collections?action=SPLIT&collection=collection1&split.key=A! > {code} > should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. > Specifying the source shard is not required here because the route key is > enough to figure it out. Route keys spanning more than one shards will not be > supported. > Note that the sub-shard with the hash range of the route key may also contain > documents for other route keys whose hashes collide. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
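The range arithmetic behind partitionRangeByKey can be sketched like this. Range here is a toy stand-in for Solr's DocRouter.Range, and the tiny integer ranges mirror the [0,20]/[12,15] example from the issue description; the real methods operate on 32-bit hash ranges.

```java
import java.util.ArrayList;
import java.util.List;

class SplitByKeySketch {
    // Minimal stand-in for DocRouter.Range.
    static class Range {
        final int min, max;
        Range(int min, int max) { this.min = min; this.max = max; }
        @Override public String toString() { return "[" + min + "," + max + "]"; }
    }

    // Sketch of what partitioning by a route key must do: carve the key's
    // hash range out of the parent shard's range, keeping the leftover
    // pieces on either side (clamped to the parent's bounds).
    static List<Range> partitionRangeByKey(Range keyRange, Range parent) {
        List<Range> out = new ArrayList<>();
        if (keyRange.min > parent.min) out.add(new Range(parent.min, keyRange.min - 1));
        out.add(new Range(Math.max(keyRange.min, parent.min),
                          Math.min(keyRange.max, parent.max)));
        if (keyRange.max < parent.max) out.add(new Range(keyRange.max + 1, parent.max));
        return out;
    }
}
```

For a parent range [0,20] and a route key range [12,15] this produces the three sub-ranges [0,11], [12,15], [16,20] described in the issue.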
[jira] [Created] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes
dejie Chang created SOLR-5339: - Summary: solr-core-4.4's ip is not right when the os is centos 5.6 sometimes Key: SOLR-5339 URL: https://issues.apache.org/jira/browse/SOLR-5339 Project: Solr Issue Type: Bug Components: contrib - Clustering Affects Versions: 4.4 Environment: centos 5.6 Reporter: dejie Chang Priority: Critical when I install the solr-cloud on the centos5.6 . t -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5290) Warming up using search logs.
[ https://issues.apache.org/jira/browse/SOLR-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792471#comment-13792471 ] Minoru Osuka commented on SOLR-5290: The patch includes test code. > Warming up using search logs. > - > > Key: SOLR-5290 > URL: https://issues.apache.org/jira/browse/SOLR-5290 > Project: Solr > Issue Type: Wish > Components: search >Affects Versions: 4.4 >Reporter: Minoru Osuka >Priority: Minor > Attachments: SOLR-5290.patch > > > It is possible to warm up of cache automatically in newSearcher event, but it > is impossible to warm up of cache automatically in firstSearcher event > because there isn't old searcher. > We describe queries in solrconfig.xml if we required to cache in > firstSearcher event like this: > {code:xml} > > > > static firstSearcher warming in solrconfig.xml > > > > {code} > This setting is very statically. I want to query dynamically in firstSearcher > event when restart solr. So I paid my attention to the past search log. I > think if there are past search logs, it is possible to warm up of cache > automatically in firstSearcher event like an autowarming of the cache in > newSearcher event. > I had created QueryLogSenderListener which extended QuerySenderListener. > Sample definition in solrconfig.xml: > - directory : Specify the Solr log directory. (Required) > - regex : Describe the regular expression of log. (Required) > - encoding : Specify the Solr log encoding. (Default : UTF-8) > - count : Specify the number of the log to process. (Default : 100) > - paths : Specify the request handler name to process. > - exclude_params : Specify the request parameter to except. > {code:xml} > > > > > static firstSearcher warming in solrconfig.xml > > > logs > UTF-8 >name="regex"> > > /select > > 100 > > indent > _ > > > {code} > I'd like to propose this feature. 
-- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
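The core idea of the SOLR-5290 proposal above — replaying queries scraped from past request logs on firstSearcher — can be sketched as below. The log line format, regex, and method names here are hypothetical; the actual listener makes all of these configurable through the directory/regex/paths/exclude_params settings listed in the description.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QueryLogSketch {
    // Hypothetical Solr request-log shape: "... path=/select params={q=...&...} ..."
    static final Pattern LINE = Pattern.compile("path=(/\\w+) params=\\{([^}]*)\\}");

    // Extract query strings for the given handler path, dropping excluded
    // parameters, so they can be replayed against a new searcher.
    static List<String> extract(List<String> logLines, String path, List<String> exclude) {
        List<String> queries = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (!m.find() || !m.group(1).equals(path)) continue;
            StringBuilder q = new StringBuilder();
            for (String kv : m.group(2).split("&")) {
                if (exclude.contains(kv.split("=", 2)[0])) continue;  // e.g. indent, _
                if (q.length() > 0) q.append('&');
                q.append(kv);
            }
            queries.add(q.toString());
        }
        return queries;
    }
}
```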
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792461#comment-13792461 ] Uwe Schindler commented on LUCENE-5269: --- bq. I didnt want new features mixed with bugfixes really I agree! But now we have the "new feature", so I just asked to add this as a separate entry in CHANGES.txt under "New features", just the new filter nothing more. > TestRandomChains failure > > > Key: LUCENE-5269 > URL: https://issues.apache.org/jira/browse/LUCENE-5269 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir > Fix For: 4.5.1, 4.6, 5.0 > > Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, > LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch > > > One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or > possibly only the combination of them conspiring together. -- This message was sent by Atlassian JIRA (v6.1#6144) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator
[ https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Areek Zillur updated LUCENE-5260:
---------------------------------

    Attachment: LUCENE-5260.patch

Uploaded patch:
- changed the input of lookup.build to take a TermFreqPayloadIterator instead of a TermFreqIterator
- made all suggesters compatible with TermFreqPayloadIterator (but error out if a payload is present and cannot be used)
- nuked all implementations of TermFreq and made them work with TermFreqPayload instead (except for SortedTermFreqIteratorWrapper)
- got rid of all references to termFreqIter

Still to do:
- actually nuke TermFreqIterator
- rename the implementations to reflect that they implement TermFreqPayloadIterator
- add tests to ensure that all the implementations work with payloads
- support payloads in SortedTermFreqIteratorWrapper

> Make older Suggesters more accepting of TermFreqPayloadIterator
> ---------------------------------------------------------------
>
>                 Key: LUCENE-5260
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5260
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/search
>            Reporter: Areek Zillur
>         Attachments: LUCENE-5260.patch
>
> As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would be nice to make the older suggesters accept TermFreqPayloadIterator and throw an exception if a payload is found (if it cannot be used).
> This will also allow us to nuke most of the other interfaces for BytesRefIterator.
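The "error if payload is present but cannot be used" behavior can be sketched with a minimal stand-in for the iterator. The interface and method names below are simplified assumptions for illustration, not Lucene's actual API:

```java
public class PayloadGuard {
    // Simplified stand-in for a term/weight/payload iterator.
    interface TermFreqPayloadIterator {
        String next();      // returns null when exhausted
        long weight();      // weight of the current term
        byte[] payload();   // null if the current entry carries no payload
    }

    /** Wrap an iterator so a suggester that cannot store payloads fails
     *  fast on the first payload instead of silently dropping it. */
    static TermFreqPayloadIterator rejectPayloads(TermFreqPayloadIterator in) {
        return new TermFreqPayloadIterator() {
            public String next() {
                String term = in.next();
                if (term != null && in.payload() != null) {
                    throw new IllegalArgumentException(
                        "this suggester doesn't support payloads");
                }
                return term;
            }
            public long weight() { return in.weight(); }
            public byte[] payload() { return null; }
        };
    }
}
```

Failing fast here is the design point of the issue: a single iterator type for all suggesters, with an explicit error where the capability is missing, rather than a parallel interface hierarchy.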
[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica
[ https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Noble Paul updated SOLR-5310:
-----------------------------

    Attachment: SOLR-5310.patch

> Add a collection admin command to remove a replica
> --------------------------------------------------
>
>                 Key: SOLR-5310
>                 URL: https://issues.apache.org/jira/browse/SOLR-5310
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Noble Paul
>            Assignee: Noble Paul
>         Attachments: SOLR-5310.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The only way a replica can be removed is by unloading its core. There is no way to remove a replica that is down, so the clusterstate accumulates unreferenced nodes as nodes go down over time.
> We need a cluster admin command to clean that up, e.g.:
> /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
> The system would first check whether the replica is active. If it is, a core UNLOAD command is fired, which takes care of deleting the replica from the clusterstate as well.
> If the state is inactive, the core or node may be down; in that case the entry is removed from the clusterstate directly.
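The two-branch decision described above can be sketched as follows. The enum and the action names are illustrative, not Solr's actual implementation:

```java
public class DeleteReplicaSketch {
    enum ReplicaState { ACTIVE, DOWN }

    /** Decide how DELETEREPLICA handles a replica. If it is live, a core
     *  UNLOAD also removes it from the clusterstate; if it is down, only
     *  the stale clusterstate entry can be removed. */
    static String actionFor(ReplicaState state) {
        return state == ReplicaState.ACTIVE
            ? "CORE_UNLOAD"
            : "REMOVE_FROM_CLUSTERSTATE";
    }

    public static void main(String[] args) {
        System.out.println(actionFor(ReplicaState.ACTIVE)); // CORE_UNLOAD
        System.out.println(actionFor(ReplicaState.DOWN));   // REMOVE_FROM_CLUSTERSTATE
    }
}
```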
[jira] [Updated] (SOLR-5338) Split shards by a route key
[ https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar updated SOLR-5338:
----------------------------------------

    Description:

Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard.

Example: Assume that collection1, shard1 has hash range [0, 20] and that route key 'A!' has hash range [12, 15]. Then invoking:

{code}
/admin/collections?action=SPLIT&collection=collection1&split.key=A!
{code}

should produce three sub-shards with hash ranges [0, 11], [12, 15] and [16, 20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported. Note that the sub-shard covering the route key's hash range may also contain documents for other route keys whose hashes collide.

  was:
Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard.

Example: Assume that collection1, shard1 has hash range [0, 20] and that route key 'A!' has hash range [12, 15]. Then invoking:

{code}
/admin/collections?action=SPLIT&collection=collection1&split.key=A!
{code}

should produce three sub-shards with hash ranges [0, 11], [12, 15] and [16, 20]. Then the sub-shard dedicated to documents for route key 'A!' can be scaled separately. Specifying the source shard is not required here because the route key is enough to figure it out.

> Split shards by a route key
> ---------------------------
>
>                 Key: SOLR-5338
>                 URL: https://issues.apache.org/jira/browse/SOLR-5338
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>             Fix For: 4.6, 5.0
>
> Provide a way to split a shard using a route key such that all documents of the specified route key end up in a single dedicated sub-shard.
> Example: Assume that collection1, shard1 has hash range [0, 20] and that route key 'A!' has hash range [12, 15]. Then invoking:
> {code}
> /admin/collections?action=SPLIT&collection=collection1&split.key=A!
> {code}
> should produce three sub-shards with hash ranges [0, 11], [12, 15] and [16, 20]. Specifying the source shard is not required here because the route key is enough to figure it out. Route keys spanning more than one shard will not be supported.
> Note that the sub-shard covering the route key's hash range may also contain documents for other route keys whose hashes collide.
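The range arithmetic behind the example can be sketched directly. The helper below is illustrative, assuming inclusive integer hash ranges with the route key's range contained in the shard's range:

```java
import java.util.*;

public class SplitByKey {
    /** Split an inclusive shard hash range around an inclusive route-key
     *  range contained within it, yielding up to three sub-ranges: the
     *  slice before the key, the key's own slice, and the slice after. */
    static List<int[]> split(int shardMin, int shardMax, int keyMin, int keyMax) {
        List<int[]> subs = new ArrayList<>();
        if (keyMin > shardMin) subs.add(new int[] {shardMin, keyMin - 1});
        subs.add(new int[] {keyMin, keyMax});
        if (keyMax < shardMax) subs.add(new int[] {keyMax + 1, shardMax});
        return subs;
    }

    public static void main(String[] args) {
        // shard1 = [0, 20], route key 'A!' = [12, 15]
        for (int[] r : split(0, 20, 12, 15)) {
            System.out.println(Arrays.toString(r));
        }
        // prints [0, 11] then [12, 15] then [16, 20]
    }
}
```

When the key's range touches either end of the shard's range, the corresponding outer slice is empty and only two sub-shards result, which is why the helper guards both boundary additions.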
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792429#comment-13792429 ]

Robert Muir commented on LUCENE-5269:
-------------------------------------

{quote}
This is so crazy! Why did we never hit this combination before?
{quote}

This combination is especially good at finding the bug. Here's why:

{code}
Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, 2, 94);
TokenStream stream = new ShingleFilter(tokenizer, 5);
stream = new NGramTokenFilter(TEST_VERSION_CURRENT, stream, 55, 83);
{code}

The edge-ngram tokenizer has min=2 and max=94, so it is basically brute-forcing every token size. The ShingleFilter then makes tons of tokens with positionIncrement=0. That makes it easy for the previously buggy NGramTokenFilter (paired with the wrong length filter) to misclassify tokens with its logic expecting code points, and emit an initial token with posInc=0:

{code}
if ((curPos + curGramSize) <= curCodePointCount) {
  ...
  posIncAtt.setPositionIncrement(curPosInc);
{code}

> TestRandomChains failure
> ------------------------
>
>                 Key: LUCENE-5269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5269
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>             Fix For: 4.5.1, 4.6, 5.0
>
>         Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.
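The "logic expecting codepoints" detail is the crux: a token's char length and its code point count diverge once supplementary characters appear, so a char-based length filter and code-point-based gram logic can disagree about whether a token qualifies. A minimal plain-Java illustration (not the Lucene code):

```java
public class CodePointCount {
    public static void main(String[] args) {
        String bmp = "ab";              // 2 chars, 2 code points
        String supp = "a\uD83D\uDE00";  // 'a' + one supplementary char (surrogate pair)

        // A filter keyed on String.length() counts UTF-16 chars: 3 here...
        System.out.println(supp.length());                          // 3
        // ...while gram logic counting code points sees only 2.
        System.out.println(supp.codePointCount(0, supp.length()));  // 2

        // For BMP-only text the two counts agree, which is why the
        // mismatch only surfaces on unusual random inputs.
        System.out.println(bmp.length()
            == bmp.codePointCount(0, bmp.length()));                // true
    }
}
```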
[jira] [Commented] (LUCENE-5269) TestRandomChains failure
[ https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792424#comment-13792424 ]

Robert Muir commented on LUCENE-5269:
-------------------------------------

I didn't want new features mixed with bugfixes, really :( But in my opinion this was the simplest way to solve the problem: just add a filter like this and have it be used instead of LengthFilter. I think it would be weird to see "new features" in a 4.5.1?

> TestRandomChains failure
> ------------------------
>
>                 Key: LUCENE-5269
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5269
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Robert Muir
>             Fix For: 4.5.1, 4.6, 5.0
>
>         Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or possibly only the combination of them conspiring together.