[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.6.0_45) - Build # 7753 - Failure!

2013-10-11 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/7753/
Java: 32bit/jdk1.6.0_45 -server -XX:+UseParallelGC

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads

Error Message:
MockDirectoryWrapper: cannot close: there are still open files: {_1e.cfs=9, _t.cfs=9, _1g.cfs=9, _1f.cfs=9, _1d.cfs=9}

Stack Trace:
java.lang.RuntimeException: MockDirectoryWrapper: cannot close: there are still open files: {_1e.cfs=9, _t.cfs=9, _1g.cfs=9, _1f.cfs=9, _1d.cfs=9}
    at __randomizedtesting.SeedInfo.seed([4E5816A23CC2ACB:1921A81D4039306D]:0)
    at org.apache.lucene.store.MockDirectoryWrapper.close(MockDirectoryWrapper.java:622)
    at org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads(TestIndexWriterWithThreads.java:240)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
    at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
    at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
    at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
    at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
    at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
    at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
    at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
    at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
    at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
    at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
    at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
    at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
    at java.lang.Thread.run(Thread.java:662)
Caused by: java.lang.RuntimeException: unclosed IndexInput: _1d.cfs
    at org.apache.lucene.store.MockDirectoryWrapper.addFileHandle(MockDirectoryWrapper.java:511)
    at org.apache.lucene.store.MockDirectoryWrapper$1.openSlice(MockDirectoryWrapper.java:930)
    at org.apache.lucene.store.Compound

[jira] [Updated] (LUCENE-5189) Numeric DocValues Updates

2013-10-11 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5189:
---

Attachment: LUCENE-5189-no-lost-updates.patch

Added field updates to TestIndexWriterDelete.testNoLostDeletesOrUpdates. I had 
to change the test to catch IOException and ignore it if it's a FakeIOE, or if 
ioe.getCause() is a FakeIOE. The reason is that if the exception happens during 
a merge (in mergeMiddle), IW registers the exception in mergeExceptions and 
later throws it as a wrapped IOE. This caused the test to fail falsely.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189-no-lost-updates.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189_process_events.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189-updates-order.patch, LUCENE-5189-updates-order.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793247#comment-13793247
 ] 

ASF subversion and git services commented on LUCENE-5278:
-

Commit 1531498 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531498 ]

LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works 
better with custom regular expressions

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Resolved] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5278.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.6

Thanks again Nik!

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793242#comment-13793242
 ] 

ASF subversion and git services commented on LUCENE-5189:
-

Commit 1531496 from [~shaie] in branch 'dev/trunk'
[ https://svn.apache.org/r1531496 ]

LUCENE-5189: test unsetting a document's value while the segment is merging

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189_process_events.patch, 
> LUCENE-5189_process_events.patch, LUCENE-5189-updates-order.patch, 
> LUCENE-5189-updates-order.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.






[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-10-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793224#comment-13793224
 ] 

Shai Erera commented on LUCENE-5248:


bq. I added a unit test which reproduces it, and the fix. Will commit on 
LUCENE-5189.

Sorry, it's a bug introduced in this patch, so I'll fix it here.

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, 
> LUCENE-5248.patch
>
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, and therefore it 
> needs to be efficient w.r.t. memory consumption.
> +Map>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.
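As an illustration of the first structure's requirements above (unordered writes, last-update-wins, sequential in-order reads), here is a minimal stdlib sketch; the class and field names are invented, it is not the actual ReaderAndLiveDocs code, and it deliberately ignores the memory-efficiency requirement.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class FieldUpdates {
    // field -> (docId -> updated value); TreeMap keeps docs sorted, giving
    // the sequential in-order read needed when writing through fieldsConsumer.
    final Map<String, TreeMap<Integer, Long>> updates = new HashMap<>();

    // Writes may arrive in any doc order; put() makes the last update win.
    void update(String field, int doc, long value) {
        updates.computeIfAbsent(field, f -> new TreeMap<>()).put(doc, value);
    }

    public static void main(String[] args) {
        FieldUpdates u = new FieldUpdates();
        u.update("f", 100, 1L); // termA affects doc 100 first
        u.update("f", 2, 2L);   // termB affects doc 2 afterwards
        u.update("f", 2, 3L);   // same doc updated again: last update wins
        System.out.println(u.updates.get("f")); // {2=3, 100=1}
    }
}
```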






[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-10-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793221#comment-13793221
 ] 

Shai Erera commented on LUCENE-5248:


bq. Do we have test coverage of updating with null (deleting the update from 
the document)?

We have TestNDVUpdates.testUnsetValue and testUnsetAllValues, though we don't 
have a test which unsets a value while a document is merging. We have tests 
that cover updating a value (not unsetting) while it is merging; I guess I can 
modify them to unset as well, but I will then need to improve the test to use 
docsWithField. I'll look into it.

bq. So if there are two terms in a row with the same field (which does not 
exist) won't we hit NPE?

Good catch! You're right; I had another {{if (termsEnum == null) continue}} but 
I removed it since I thought the above if took care of that. I added a unit 
test which reproduces it, and the fix. Will commit on LUCENE-5189.

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5248.patch, LUCENE-5248.patch, LUCENE-5248.patch, 
> LUCENE-5248.patch
>
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, and therefore it 
> needs to be efficient w.r.t. memory consumption.
> +Map>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.






[jira] [Commented] (LUCENE-5277) Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the new bitset

2013-10-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793217#comment-13793217
 ] 

Shai Erera commented on LUCENE-5277:


I thought of that ... it started in LUCENE-5248, where I want to keep a 
growable bitset alongside the docs/values arrays to mark whether a document has 
an updated value or not (following Rob's idea). When I implemented that using 
OpenBitSet, I discovered the bug and opened LUCENE-5272. As I worked on fixing 
the bug, I realized OBS has other issues as well and thought that perhaps I can 
use FixedBitSet and only grow it by copying its array. This is doable even 
without the ctor, since I can call getBits() and do it like this:

{code}
FixedBitSet newBits = new FixedBitSet(17); // new capacity
System.arraycopy(oldBits.getBits(), 0, newBits.getBits(), 0,
    oldBits.getBits().length);
{code}

I then noticed there is already a ctor in FixedBitSet which copies another FBS, 
so I thought just to improve it. It seems more intuitive to do that than to let 
users figure out they can grow a FixedBitSet like the above.
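The grow-by-copy idea can be shown with a minimal stdlib sketch over a long[]-backed bitset; `GrowableBits` is hypothetical and not the actual FixedBitSet API.

```java
import java.util.Arrays;

public class GrowableBits {
    long[] bits;
    int numBits;

    GrowableBits(int numBits) {
        this.numBits = numBits;
        this.bits = new long[(numBits + 63) >>> 6]; // one long per 64 bits
    }

    // Analogue of the proposed copy ctor: new capacity, old bits preserved.
    GrowableBits(GrowableBits other, int numBits) {
        this.numBits = numBits;
        this.bits = Arrays.copyOf(other.bits, (numBits + 63) >>> 6);
    }

    void set(int i)     { bits[i >>> 6] |= 1L << (i & 63); }
    boolean get(int i)  { return (bits[i >>> 6] & (1L << (i & 63))) != 0; }

    public static void main(String[] args) {
        GrowableBits small = new GrowableBits(10);
        small.set(3);
        GrowableBits grown = new GrowableBits(small, 1000); // grow, keep bits
        System.out.println(grown.get(3));   // true
        System.out.println(grown.get(999)); // false
    }
}
```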

> Modify FixedBitSet copy constructor to take numBits to allow grow/shrink the 
> new bitset
> ---
>
> Key: LUCENE-5277
> URL: https://issues.apache.org/jira/browse/LUCENE-5277
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5277.patch
>
>
> FixedBitSet copy constructor is redundant the way it is now -- one can call 
> FBS.clone() to achieve that (and indeed, no code in Lucene calls this ctor). 
> I think it will be useful to add a numBits parameter to that method to allow 
> growing/shrinking the new bitset, while copying all relevant bits from the 
> passed one.






[jira] [Comment Edited] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values

2013-10-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793213#comment-13793213
 ] 

Yonik Seeley edited comment on SOLR-5330 at 10/12/13 2:30 AM:
--

So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset
      + " val.length=" + val.length + " new.offset=" + seg.tempBR.offset
      + " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned BytesRef still pointing to the 
same byte[]), of course... but the thing is, it never generates an invalid 
result. Calling next() on the term enum never changes the bytes that were 
previously pointed to... it simply points to a different part of the same byte 
array. I can never detect a case where the original bytes are changed, thus 
invalidating the shallow copy.

Example output:
{code}
##SHARING DETECTED: val.offset=1 val.length=4 new.offset=6 new.length=4
{code}




> PerSegmentSingleValuedFaceting overwrites facet values
> --
>
> Key: SOLR-5330
> URL: https://issues.apache.org/jira/browse/SOLR-5330
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.2.1
>Reporter: Michael Froh
>Assignee: Yonik Seeley
> Attachments: solr-5330.patch
>
>
> I recently tried enabling facet.method=fcs for one of my indexes and found a 
> significant performance improvement (with a large index, many facet values, 
> and near-realtime updates). Unfortunately, the results were also wrong. 
> Specifically, some facet values were being partially overwritten by other 
> facet values. (That is, if I expected facet values like "abcdef" and "123", I 
> would get a value like "123def".)
> Debugging through the code, it looks like the problem was in 
> PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, 
> when BytesRef val is shallow-copied from the temporary per-segment BytesRef. 
> The byte array assigned to val is shared with the byte array for seg.tempBR, 
> and is overwritten a few lines down by the call to seg.tenum.next().
> I managed to fix it locally by replacing the shallow copy with a deep copy.
> While I encountered this problem on Solr 4.2.1, I see that the code is 
> identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I 
> believe this bug still exists.






[jira] [Commented] (SOLR-5330) PerSegmentSingleValuedFaceting overwrites facet values

2013-10-11 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793213#comment-13793213
 ] 

Yonik Seeley commented on SOLR-5330:


So I instrumented the faceting code like so:
{code}
seg.tempBR = seg.tenum.next();
if (seg.tempBR.bytes == val.bytes) {
  System.err.println("##SHARING DETECTED: val.offset=" + val.offset
      + " val.length=" + val.length + " new.offset=" + seg.tempBR.offset
      + " new.length=" + seg.tempBR.length);
  if (val.offset == seg.tempBR.offset) {
    System.err.println("!!SHARING USING SAME OFFSET");
  }
}
{code}

And it detects tons of sharing (the returned BytesRef still pointing to the 
same byte[]), of course... but the thing is, it never generates an invalid 
result. Calling next() on the term enum never changes the bytes that were 
previously pointed to... it simply points to a different part of the same byte 
array. I can never detect a case where the original bytes are changed, thus 
invalidating the shallow copy.


> PerSegmentSingleValuedFaceting overwrites facet values
> --
>
> Key: SOLR-5330
> URL: https://issues.apache.org/jira/browse/SOLR-5330
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.2.1
>Reporter: Michael Froh
>Assignee: Yonik Seeley
> Attachments: solr-5330.patch
>
>
> I recently tried enabling facet.method=fcs for one of my indexes and found a 
> significant performance improvement (with a large index, many facet values, 
> and near-realtime updates). Unfortunately, the results were also wrong. 
> Specifically, some facet values were being partially overwritten by other 
> facet values. (That is, if I expected facet values like "abcdef" and "123", I 
> would get a value like "123def".)
> Debugging through the code, it looks like the problem was in 
> PerSegmentSingleValuedFaceting, specifically in the getFacetCounts method, 
> when BytesRef val is shallow-copied from the temporary per-segment BytesRef. 
> The byte array assigned to val is shared with the byte array for seg.tempBR, 
> and is overwritten a few lines down by the call to seg.tenum.next().
> I managed to fix it locally by replacing the shallow copy with a deep copy.
> While I encountered this problem on Solr 4.2.1, I see that the code is 
> identical in 4.5. Unless the behavior of TermsEnum.next() has changed, I 
> believe this bug still exists.






help in getting sort to work on an indexed binary field

2013-10-11 Thread Jessica Cheng
Hi,

We added a custom field type to allow an indexed binary field that supports
search (exact match), prefix search, and sort by unsigned-bytes
lexicographical compare. For sort, BytesRef's UTF8SortedAsUnicodeComparator
accomplishes what we want; even though the name of the comparator mentions
UTF8, it doesn't actually assume so and just does byte-level operations, so
it's good. However, when we do this across different nodes, we run into an
issue in QueryComponent.doFieldSortValues:

  // Must do the same conversion when sorting by a
  // String field in Lucene, which returns the terms
  // data as BytesRef:
  if (val instanceof BytesRef) {
UnicodeUtil.UTF8toUTF16((BytesRef)val, spare);
field.setStringValue(spare.toString());
val = ft.toObject(field);
  }

UnicodeUtil.UTF8toUTF16 is called on our byte array, which isn't actually
UTF-8. I did a hack where I specified our own field comparator to be
ByteBuffer based to get around that instanceof check, but then the field
value gets transformed into BYTEARR in JavaBinCodec, and when it's
unmarshalled, it gets turned into byte[]. Then, in QueryComponent.mergeIds,
a ShardFieldSortedHitQueue is constructed with
ShardDoc.getCachedComparator, which decides to give me comparatorNatural in
the else of the TODO for CUSTOM, which barfs because byte[] is not
Comparable...
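For reference, the unsigned-bytes lexicographic compare such a sort needs can be sketched as follows; this is illustrative, not Solr's actual comparator.

```java
public class UnsignedCompare {
    // Lexicographic compare treating each byte as unsigned (0..255).
    static int compareUnsignedLex(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int diff = (a[i] & 0xFF) - (b[i] & 0xFF); // mask to unsigned
            if (diff != 0) return diff;
        }
        return a.length - b.length; // shorter array sorts first on a tie
    }

    public static void main(String[] args) {
        // Signed compare would order (byte) 0x80 (-128) before 0x7F (127);
        // unsigned compare orders 0x7F before 0x80, as required.
        byte[] x = { (byte) 0x7F };
        byte[] y = { (byte) 0x80 };
        System.out.println(compareUnsignedLex(x, y) < 0); // true
    }
}
```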

Any advice is appreciated!

Thanks,
Jessica


[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793206#comment-13793206
 ] 

Robert Muir commented on LUCENE-5278:
-

I committed this to trunk: I did a lot of testing locally but I want to let 
Jenkins have its way with it for a few hours before backporting to branch_4x.

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793205#comment-13793205
 ] 

ASF subversion and git services commented on LUCENE-5278:
-

Commit 1531479 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531479 ]

LUCENE-5278: remove CharTokenizer brain-damage from MockTokenizer so it works 
better with custom regular expressions

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793204#comment-13793204
 ] 

Robert Muir commented on LUCENE-5274:
-

Yeah, I guess for me it's not a caveat at all, but a feature :)

We need to iterate a sorted union for stuff in the index like terms and fields, 
so they appear as if they exist only once.
The Guava one isn't doing a "union" operation but simply maintaining 
compareTo() order...
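The "sorted union, each element once" iteration can be illustrated over plain sorted arrays; real Lucene merges TermsEnums rather than materializing a set, so this is only a sketch of the semantics.

```java
import java.util.Arrays;
import java.util.TreeSet;

public class SortedUnion {
    // Sorted, deduplicated union of two term lists, as if from one field.
    static TreeSet<String> union(String[] a, String[] b) {
        TreeSet<String> out = new TreeSet<>(Arrays.asList(a));
        out.addAll(Arrays.asList(b)); // duplicates collapse automatically
        return out;
    }

    public static void main(String[] args) {
        String[] foo = { "running", "scissors" };
        String[] fooExact = { "running", "with" };
        // Each term appears once, in sorted order.
        System.out.println(union(foo, fooExact)); // [running, scissors, with]
    }
}
```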


> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5278:


Attachment: LUCENE-5278.patch

added a few more tests to TestMockAnalyzer so all these crazy corner cases are 
found there rather than while debugging other tests :)

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Attachments: LUCENE-5278.patch, LUCENE-5278.patch, LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5278:


Attachment: LUCENE-5278.patch

Nice patch Nik!

I think this is ready: I tweaked variable names and rearranged stuff (e.g. I 
use -1 instead of Integer so we aren't boxing, and a few other things).

I also added some unit tests.

The main reasons tests were failing with your original patch:
* reset() needed to clear the buffer variables.
* the state machine needed an extra check when emitting a token: e.g. if you 
make a regex of "..", but you send it "abcde", the tokens should be "ab" and 
"cd", but not "e". So when we end on a partial match, we have to check that 
we are in an accept state.
* term-limit-exceeded is a special case (versus the last character being in a 
reject state)
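The accept-state check can be sketched outside Lucene with a tiny greedy tokenizer. This is a hedged illustration using java.util.regex, not MockTokenizer's actual DFA code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TokenizeSketch {
    // Greedy left-to-right tokenization: emit a token only when the pattern
    // fully matches (i.e. the state machine ended in an accept state).
    // A trailing partial match is dropped rather than emitted.
    static List<String> tokenize(String pattern, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (m.lookingAt()) {       // reached an accept state: emit token
                tokens.add(m.group());
                pos = m.end();
            } else {
                pos++;                 // no token starts here; skip one char
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // pattern ".." over "abcde" -> "ab", "cd"; the trailing "e" never
        // reaches an accept state, so it is not emitted as a token.
        System.out.println(tokenize("..", "abcde"));
    }
}
```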

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Attachments: LUCENE-5278.patch, LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Updated] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches

2013-10-11 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5279:
---

Attachment: LUCENE-5279.patch

Patch.

However, it seems to be slower, testing on full Wikipedia en:

{noformat}
Report after iter 10:
                Task    QPS base  StdDev    QPS comp  StdDev    Pct diff
           OrHighLow       14.44  (7.7%)       12.48  (4.7%)  -13.6% ( -24% -   -1%)
          OrHighHigh        5.56  (6.2%)        4.86  (4.4%)  -12.6% ( -21% -   -2%)
           OrHighMed       18.62  (6.7%)       16.29  (4.4%)  -12.5% ( -22% -   -1%)
          AndHighLow      398.09  (1.6%)      390.34  (2.3%)   -1.9% (  -5% -    1%)
        OrNotHighLow      374.60  (1.7%)      369.61  (1.7%)   -1.3% (  -4% -    2%)
              Fuzzy1       67.10  (2.1%)       66.41  (2.2%)   -1.0% (  -5% -    3%)
        OrNotHighMed       51.68  (1.7%)       51.37  (1.5%)   -0.6% (  -3% -    2%)
              Fuzzy2       46.73  (2.8%)       46.45  (2.6%)   -0.6% (  -5% -    4%)
        OrHighNotLow       20.05  (3.5%)       19.96  (5.0%)   -0.5% (  -8% -    8%)
        OrHighNotMed       27.15  (3.2%)       27.05  (4.8%)   -0.3% (  -8% -    7%)
       OrNotHighHigh        7.72  (3.2%)        7.70  (4.7%)   -0.3% (  -7% -    7%)
       OrHighNotHigh        9.81  (3.0%)        9.79  (4.5%)   -0.1% (  -7% -    7%)
     LowSloppyPhrase       43.83  (1.9%)       43.89  (2.1%)    0.2% (  -3% -    4%)
              IntNRQ        3.49  (4.5%)        3.50  (4.1%)    0.2% (  -8% -    9%)
             Prefix3       70.74  (2.7%)       71.01  (2.4%)    0.4% (  -4% -    5%)
            HighTerm       65.33  (3.0%)       65.62 (13.5%)    0.4% ( -15% -   17%)
     MedSloppyPhrase        3.47  (3.5%)        3.49  (4.7%)    0.6% (  -7% -    9%)
           LowPhrase       13.06  (1.5%)       13.14  (2.0%)    0.6% (  -2% -    4%)
            Wildcard       16.71  (2.9%)       16.82  (2.2%)    0.7% (  -4% -    5%)
             MedTerm      100.90  (2.5%)      101.71 (10.4%)    0.8% ( -11% -   14%)
             LowTerm      311.85  (1.4%)      314.53  (6.4%)    0.9% (  -6% -    8%)
        HighSpanNear        8.06  (5.1%)        8.13  (5.9%)    0.9% (  -9% -   12%)
             Respell       48.00  (2.3%)       48.45  (2.8%)    0.9% (  -4% -    6%)
    HighSloppyPhrase        3.40  (4.1%)        3.43  (6.6%)    1.0% (  -9% -   12%)
          AndHighMed       34.14  (1.6%)       34.52  (1.7%)    1.1% (  -2% -    4%)
         AndHighHigh       28.15  (1.7%)       28.48  (1.7%)    1.2% (  -2% -    4%)
         MedSpanNear       30.62  (2.8%)       31.07  (3.2%)    1.5% (  -4% -    7%)
         LowSpanNear       10.30  (2.6%)       10.48  (2.9%)    1.7% (  -3% -    7%)
           MedPhrase      195.60  (5.1%)      201.44  (6.6%)    3.0% (  -8% -   15%)
          HighPhrase        4.17  (5.6%)        4.34  (6.9%)    4.0% (  -8% -   17%)
{noformat}

So ... I don't plan on pursuing it any further, but wanted to open the issue in 
case anybody wants to try ...

> Don't use recursion in DisjunctionSumScorer.countMatches
> 
>
> Key: LUCENE-5279
> URL: https://issues.apache.org/jira/browse/LUCENE-5279
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Attachments: LUCENE-5279.patch
>
>
> I noticed the TODO in there, to not use recursion, so I fixed it to just use 
> a private queue ...






[jira] [Created] (LUCENE-5279) Don't use recursion in DisjunctionSumScorer.countMatches

2013-10-11 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5279:
--

 Summary: Don't use recursion in DisjunctionSumScorer.countMatches
 Key: LUCENE-5279
 URL: https://issues.apache.org/jira/browse/LUCENE-5279
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless


I noticed the TODO in there, to not use recursion, so I fixed it to just use a 
private queue ...
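The recursion-to-queue rewrite can be sketched generically. Node and the counting logic below are illustrative stand-ins for nested sub-scorers, not Lucene's actual DisjunctionSumScorer internals:

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class CountMatches {
    // Illustrative tree node; a stand-in for nested sub-scorers.
    static class Node {
        final int matches;
        final Node[] children;
        Node(int matches, Node... children) {
            this.matches = matches;
            this.children = children;
        }
    }

    // Recursive form: the call stack does the bookkeeping.
    static int countRecursive(Node n) {
        int total = n.matches;
        for (Node c : n.children) total += countRecursive(c);
        return total;
    }

    // Iterative form: an explicit private stack replaces recursion,
    // which is the idea behind this issue.
    static int countIterative(Node root) {
        int total = 0;
        Deque<Node> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            total += n.matches;
            for (Node c : n.children) stack.push(c);
        }
        return total;
    }
}
```

Both forms visit every node once; the iterative one just trades call frames for heap-allocated deque entries, which (per the benchmark in this thread) is not automatically faster.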






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793038#comment-13793038
 ] 

Nik Everett commented on LUCENE-5274:
-

{quote}
There is no lucene dependency on guava. I don't think we should introduce one, 
and it wouldn't solve the issues I mentioned anyway (e.g. comparable 
inconsistent with equals and stuff). It would only add 2.1MB of bloated 
unnecessary syntactic sugar (sorry, that's just my opinion on it, I think it's 
useless).

We should keep our third party dependencies minimal and necessary so that any 
app using lucene can choose for itself what version of this stuff (if any) it 
wants to use. If we rely upon unnecessary stuff it hurts the end user by 
forcing them to use compatible versions.
{quote}
I figured that was the reasoning and I don't intend to argue with it.  In this 
case it would provide a method to merge sorted iterators just like 
MergedIterator, only without the caveats around duplication, but I'm happy to 
work around it.  Guava certainly wouldn't fix my forgetting equals and hashCode.

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (SOLR-5027) Field Collapsing PostFilter

2013-10-11 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793035#comment-13793035
 ] 

Joel Bernstein commented on SOLR-5027:
--

Patch that passes precommit for trunk

> Field Collapsing PostFilter
> ---
>
> Key: SOLR-5027
> URL: https://issues.apache.org/jira/browse/SOLR-5027
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch
>
>
> This ticket introduces the *CollapsingQParserPlugin* 
> The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
> This is a high performance alternative to standard Solr field collapsing 
> (with *ngroups*) when the number of distinct groups in the result set is high.
> For example in one performance test, a search with 10 million full results 
> and 1 million collapsed groups:
> Standard grouping with ngroups : 17 seconds.
> CollapsingQParserPlugin: 300 milli-seconds.
> Sample syntax:
> Collapse based on the highest scoring document:
> {code}
> fq={!collapse field=}
> {code}
> Collapse based on the min value of a numeric field:
> {code}
> fq={!collapse field= min=}
> {code}
> Collapse based on the max value of a numeric field:
> {code}
> fq={!collapse field= max=}
> {code}
> Collapse with a null policy:
> {code}
> fq={!collapse field= nullPolicy=}
> {code}
> There are three null policies:
> ignore : removes docs with a null value in the collapse field (default).
> expand : treats each doc with a null value in the collapse field as a 
> separate group.
> collapse : collapses all docs with a null value into a single group using 
> either highest score, or min/max.
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent
> *Note:*  The July 16 patch also includes an ExpandComponent that expands the 
> collapsed groups for the current search result page. This functionality will 
> be moved to its own ticket.
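As a concrete illustration of the syntax above (the field and parameter names below are hypothetical; the issue text deliberately elides the values):

```text
# Collapse on the highest-scoring doc per group (hypothetical field name):
fq={!collapse field=storeId}

# Collapse keeping the doc with the minimum value of a numeric field:
fq={!collapse field=storeId min=price_f}

# Treat docs with a null collapse field as separate groups:
fq={!collapse field=storeId nullPolicy=expand}
```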






[jira] [Commented] (LUCENE-5212) java 7u40 causes sigsegv and corrupt term vectors

2013-10-11 Thread Bill Bell (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793037#comment-13793037
 ] 

Bill Bell commented on LUCENE-5212:
---

It appears this happens on 7u40 64-bit too. See 
https://bugs.openjdk.java.net/browse/JDK-8024830

Am I reading this wrong?

Start failing around hs24-b21:

   [junit4] # SIGSEGV (0xb) at pc=0xfd7ff91d9f7d, pid=23810, tid=343
   [junit4] #
   [junit4] # JRE version: Java(TM) SE Runtime Environment (8.0-b54)
   [junit4] # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.0-b21 mixed mode 
solaris-amd64 )
   [junit4] # Problematic frame:
   [junit4] # J 
org.apache.lucene.codecs.compressing.CompressingTermVectorsReader.get(I)Lorg/apache/lucene/index/Fields;
   [junit4] #

Note, first 7u40 build b01 has hs24-b24.

Next, I will try to find changeset.



> java 7u40 causes sigsegv and corrupt term vectors
> -
>
> Key: LUCENE-5212
> URL: https://issues.apache.org/jira/browse/LUCENE-5212
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: crashFaster2.0.patch, crashFaster.patch, 
> hs_err_pid32714.log, jenkins.txt
>
>







[jira] [Updated] (SOLR-5027) Field Collapsing PostFilter

2013-10-11 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-5027:
-

Attachment: SOLR-5027.patch

> Field Collapsing PostFilter
> ---
>
> Key: SOLR-5027
> URL: https://issues.apache.org/jira/browse/SOLR-5027
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 5.0
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, SOLR-5027.patch, 
> SOLR-5027.patch, SOLR-5027.patch
>
>
> This ticket introduces the *CollapsingQParserPlugin* 
> The *CollapsingQParserPlugin* is a PostFilter that performs field collapsing. 
> This is a high performance alternative to standard Solr field collapsing 
> (with *ngroups*) when the number of distinct groups in the result set is high.
> For example in one performance test, a search with 10 million full results 
> and 1 million collapsed groups:
> Standard grouping with ngroups : 17 seconds.
> CollapsingQParserPlugin: 300 milli-seconds.
> Sample syntax:
> Collapse based on the highest scoring document:
> {code}
> fq={!collapse field=}
> {code}
> Collapse based on the min value of a numeric field:
> {code}
> fq={!collapse field= min=}
> {code}
> Collapse based on the max value of a numeric field:
> {code}
> fq={!collapse field= max=}
> {code}
> Collapse with a null policy:
> {code}
> fq={!collapse field= nullPolicy=}
> {code}
> There are three null policies:
> ignore : removes docs with a null value in the collapse field (default).
> expand : treats each doc with a null value in the collapse field as a 
> separate group.
> collapse : collapses all docs with a null value into a single group using 
> either highest score, or min/max.
> The CollapsingQParserPlugin also fully supports the QueryElevationComponent
> *Note:*  The July 16 patch also includes an ExpandComponent that expands the 
> collapsed groups for the current search result page. This functionality will 
> be moved to its own ticket.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793029#comment-13793029
 ] 

Robert Muir commented on LUCENE-5274:
-

{quote}
Sure! I'm more used to Guava's tools so I think I was lulled into a false 
sense of recognition. No chance of updating to a modern version of Guava?
{quote}

There is no lucene dependency on guava. I don't think we should introduce one, 
and it wouldn't solve the issues I mentioned anyway (e.g. comparable 
inconsistent with equals and stuff). It would only add 2.1MB of bloated 
unnecessary syntactic sugar (sorry, that's just my opinion on it, I think it's 
useless).

We should keep our third party dependencies minimal and necessary so that any 
app using lucene can choose for itself what version of this stuff (if any) it 
wants to use. If we rely upon unnecessary stuff it hurts the end user by 
forcing them to use compatible versions.


> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793018#comment-13793018
 ] 

Nik Everett commented on LUCENE-5274:
-

{quote}
I can see the possible use case here, but I think it deserves some discussion 
first (versus just making it public).
{quote}
Sure!  I'm more used to Guava's tools so I think I was lulled into a false 
sense of recognition.  No chance of updating to a modern version of Guava? :)

{quote}
This thing has limitations (it's currently only used by IndexWriter for 
buffered deletes; it's basically like a MultiTerms over an Iterator). For example 
each iterator it consumes should not have duplicate values according to its 
compareTo(): it's not clear to me this WeightedPhraseInfo behaves this way
{quote}
Yikes!  I didn't catch that but now that you point it out it is right there in 
the docs and I should have.  WeightedPhraseInfo doesn't behave that way and 

{quote}
Furthermore the class in question (WeightedPhraseInfo) is public, and adding 
Comparable to it looks like it will create a situation where it's inconsistent 
with equals()... I think this is a little dangerous.
{quote}
I agree on the inconsistency with equals.  I can either fix 
that or use a Comparator for sorting both WeightedPhraseInfo and Toffs.  That'd 
require a MergeSorter that can take one but 

{quote}
If it turns out we can reuse it: great! But I think rather than just slapping 
public on it, we should move it to .util, ensure it has good javadocs and unit 
tests, and investigate what exactly happens when these contracts are violated: 
e.g. can we make an exception happen rather than just broken behavior, in a way 
that won't hurt performance, and so on?
{quote}
Makes sense to me.

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Assigned] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned LUCENE-5278:
---

Assignee: Robert Muir

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Assignee: Robert Muir
>Priority: Trivial
> Attachments: LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13793000#comment-13793000
 ] 

Robert Muir commented on LUCENE-5274:
-

Thanks Nik: I can help with that one!

Another question: about the MergedIterator :)

I can see the possible use case here, but I think it deserves some discussion 
first (versus just making it public).
This thing has limitations (it's currently only used by IndexWriter for 
buffered deletes; it's basically like a MultiTerms over an Iterator). For example 
each iterator it consumes should not have duplicate values according to its 
compareTo(): it's not clear to me this WeightedPhraseInfo behaves this way:
* what if you have a synonym of "dog" sitting on top of "cat" with the same 
boost factor... it's a duplicate according to that compareTo, but the text is 
different.
* what if the synonym is just "dog" with posinc=0 stacked on top of itself 
(which is totally valid to do)...

Perhaps highlighting can make use of it, but it's unclear to me that it's really 
following the contract. Furthermore the class in question (WeightedPhraseInfo) 
is public, and adding Comparable to it looks like it will create a situation 
where it's inconsistent with equals()... I think this is a little dangerous.

If it turns out we can reuse it: great! But I think rather than just slapping 
public on it, we should move it to .util, ensure it has good javadocs and unit 
tests, and investigate what exactly happens when these contracts are violated: 
e.g. can we make an exception happen rather than just broken behavior, in a way 
that won't hurt performance, and so on?
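The "ordering inconsistent with equals" hazard is easy to demonstrate in isolation. The class below is a hypothetical stand-in (not WeightedPhraseInfo) whose compareTo() looks only at a boost while equals()/hashCode() come from Object:

```java
import java.util.TreeSet;

public class CompareToVsEquals {
    // Hypothetical stand-in: ordering considers only the boost, so the
    // ordering is inconsistent with equals() -- the hazard discussed above.
    static class Info implements Comparable<Info> {
        final String text;
        final float boost;
        Info(String text, float boost) { this.text = text; this.boost = boost; }
        @Override public int compareTo(Info other) {
            return Float.compare(boost, other.boost);
        }
    }

    static int sortedSetSize() {
        TreeSet<Info> set = new TreeSet<>();
        set.add(new Info("dog", 2.0f));
        // "cat" is silently dropped: compareTo() says it duplicates "dog",
        // even though the two objects are not equals().
        set.add(new Info("cat", 2.0f));
        return set.size();
    }
}
```

A TreeSet (or any dedup based on compareTo()) treats the two distinct infos as one, which is exactly the broken-behavior-without-an-exception scenario mentioned in the comment.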



> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792993#comment-13792993
 ] 

Robert Muir commented on LUCENE-5278:
-

I think I understand what you want: it makes sense. The only reason it's the way 
it is today is because this thing historically came from CharTokenizer (see the 
isTokenChar?).

But it would be better if you could e.g. make a pattern like ([A-Z][a-z]+) and 
have it actually break FooBar into Foo, Bar rather than throwing out "Bar" 
altogether.

I'll dig into this!
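The desired behavior can be sketched with java.util.regex (an illustration of the goal, not MockTokenizer's implementation): the character right after a token ("B" after "Foo") should start the next token instead of being discarded.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CamelSplit {
    // Find successive [A-Z][a-z]+ tokens; each token may begin at the
    // character immediately following the previous one.
    static List<String> split(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("[A-Z][a-z]+").matcher(input);
        while (m.find()) tokens.add(m.group());
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(split("FooBar")); // "Foo" then "Bar"
    }
}
```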

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Priority: Trivial
> Attachments: LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Updated] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5278:


Attachment: LUCENE-5278.patch

This patch "fixes" the behaviour from my perspective but breaks a bunch of 
other tests.

> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token
> --
>
> Key: LUCENE-5278
> URL: https://issues.apache.org/jira/browse/LUCENE-5278
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Nik Everett
>Priority: Trivial
> Attachments: LUCENE-5278.patch
>
>
> MockTokenizer throws away the character right after a token even if it is a 
> valid start to a new token.  You won't see this unless you build a tokenizer 
> that can recognize every character like with new RegExp(".") or RegExp("...").
> Changing this behaviour seems to break a number of tests.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792974#comment-13792974
 ] 

Nik Everett commented on LUCENE-5274:
-

Filed LUCENE-5278.

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Created] (LUCENE-5278) MockTokenizer throws away the character right after a token even if it is a valid start to a new token

2013-10-11 Thread Nik Everett (JIRA)
Nik Everett created LUCENE-5278:
---

 Summary: MockTokenizer throws away the character right after a 
token even if it is a valid start to a new token
 Key: LUCENE-5278
 URL: https://issues.apache.org/jira/browse/LUCENE-5278
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Nik Everett
Priority: Trivial


MockTokenizer throws away the character right after a token even if it is a 
valid start to a new token.  You won't see this unless you build a tokenizer 
that can recognize every character like with new RegExp(".") or RegExp("...").

Changing this behaviour seems to break a number of tests.






[jira] [Created] (SOLR-5340) Add support for named snapshots

2013-10-11 Thread Mike Schrag (JIRA)
Mike Schrag created SOLR-5340:
-

 Summary: Add support for named snapshots
 Key: SOLR-5340
 URL: https://issues.apache.org/jira/browse/SOLR-5340
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Affects Versions: 4.5
Reporter: Mike Schrag


It would be really nice if Solr supported named snapshots. Right now if you 
snapshot a SolrCloud cluster, every node potentially records a slightly 
different timestamp. Correlating those back together to effectively restore the 
entire cluster to a consistent snapshot is pretty tedious.






[jira] [Commented] (LUCENE-5266) Optimization of the direct PackedInts readers

2013-10-11 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792948#comment-13792948
 ] 

Adrien Grand commented on LUCENE-5266:
--

bq. The only caveat is the encoding would need to ensure there is always an 
extra 2 bytes at the end.

There are some places (codecs) where I encode many short sequences 
consecutively, so I care about not wasting extra bytes, but if this proves to 
help performance, I think it shouldn't be too hard to add the ability to 
have extra bytes at the end of the stream (I'm thinking about adding a new 
PackedInts.Format to the enum, but there might be other options).
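To illustrate the trade-off being discussed (a sketch, not Lucene's PackedInts code; the class, method, and big-endian packing layout here are assumptions): a branch-free direct reader can always load a fixed number of bytes around each value and shift, which can read up to 2 bytes past the last value and therefore needs padding at the end of the stream:

```java
public class PaddedPackedSketch {
    // Read the index-th big-endian bitsPerValue-bit value (bitsPerValue <= 16)
    // by always loading 3 consecutive bytes and shifting, with no branching
    // on how many bytes the value actually spans. Reading the last value can
    // touch up to 2 bytes past it, so the array must carry 2 padding bytes.
    static int get(byte[] packed, int bitsPerValue, int index) {
        long bitPos = (long) index * bitsPerValue;
        int bytePos = (int) (bitPos >>> 3);
        int offsetInWord = (int) (bitPos & 7);
        int word = ((packed[bytePos] & 0xFF) << 16)
                 | ((packed[bytePos + 1] & 0xFF) << 8)
                 |  (packed[bytePos + 2] & 0xFF);
        return (word >>> (24 - offsetInWord - bitsPerValue))
             & ((1 << bitsPerValue) - 1);
    }

    public static void main(String[] args) {
        // Two 12-bit values, 0xABC and 0x123, packed as 0xAB 0xC1 0x23,
        // followed by the 2 padding bytes the branch-free read relies on.
        byte[] packed = { (byte) 0xAB, (byte) 0xC1, 0x23, 0, 0 };
        System.out.println(Integer.toHexString(get(packed, 12, 0))); // abc
        System.out.println(Integer.toHexString(get(packed, 12, 1))); // 123
    }
}
```

Reading the second value already touches the first padding byte; without the padding, the same read would throw ArrayIndexOutOfBoundsException.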

> Optimization of the direct PackedInts readers
> -
>
> Key: LUCENE-5266
> URL: https://issues.apache.org/jira/browse/LUCENE-5266
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5266.patch, LUCENE-5266.patch
>
>
> Given that the initial focus for PackedInts readers was more on in-memory 
> readers (for storing stuff like the mapping from old to new doc IDs at 
> merging time), I never spent time trying to optimize the direct readers 
> although it could be beneficial now that they are used for disk-based doc 
> values.






[jira] [Updated] (LUCENE-5265) Make BlockPackedWriter constructor take an acceptable overhead ratio

2013-10-11 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5265:
-

Attachment: LUCENE-5265.patch

Here is a patch.

> Make BlockPackedWriter constructor take an acceptable overhead ratio
> 
>
> Key: LUCENE-5265
> URL: https://issues.apache.org/jira/browse/LUCENE-5265
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5265.patch
>
>
> Follow-up of http://search-lucene.com/m/SjmSW1CZYuZ1
> MemoryDocValuesFormat takes an acceptable overhead ratio but it is only used 
> when doing table compression. It should be used for all compression methods, 
> especially DELTA_COMPRESSED whose encoding is based on BlockPackedWriter.






[jira] [Resolved] (LUCENE-5275) Fix AttributeSource.toString()

2013-10-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5275.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.6

> Fix AttributeSource.toString()
> --
>
> Key: LUCENE-5275
> URL: https://issues.apache.org/jira/browse/LUCENE-5275
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.6, 5.0
>
> Attachments: LUCENE-5275.patch, LUCENE-5275.patch
>
>
> It's currently just Object.toString, e.g.:
> org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
> But I think we should make it more useful, to end users trying to see what 
> their chain is doing, and to make SOPs easier when debugging:
> {code}
> EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
> try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this 
> already!")) {
>   ts.reset();
>   while (ts.incrementToken()) {
> System.out.println(ts.toString());
>   }
>   ts.end();
> }
> {code}
> Proposed output:
> {noformat}
> PorterStemFilter@8a32165c term=it,bytes=[69 
> 74],startOffset=0,endOffset=3,positionIncrement=1,type=,keyword=false
> PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 
> 33],startOffset=4,endOffset=8,positionIncrement=1,type=,keyword=false
> PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 
> 74],startOffset=10,endOffset=15,positionIncrement=1,type=,keyword=false
> PorterStemFilter@45cbde1b term=fix,bytes=[66 69 
> 78],startOffset=16,endOffset=19,positionIncrement=1,type=,keyword=false
> PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 
> 69],startOffset=25,endOffset=32,positionIncrement=2,type=,keyword=false
> {noformat}






[jira] [Resolved] (SOLR-4073) Overseer will miss operations in some cases for OverseerCollectionProcessor

2013-10-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4073.
---

   Resolution: Duplicate
Fix Version/s: (was: 4.6)

> Overseer will miss  operations in some cases for OverseerCollectionProcessor
> 
>
> Key: SOLR-4073
> URL: https://issues.apache.org/jira/browse/SOLR-4073
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0, 4.1, 4.2
> Environment: Solr cloud
>Reporter: Raintung Li
>Assignee: Mark Miller
> Attachments: patch-4073
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> An overseer can disconnect from ZooKeeper while its overseer thread is still 
> handling request (A) from the DistributedQueue. Example: the old overseer 
> thread reconnects to ZooKeeper and tries to remove the top request via 
> "workQueue.remove();". Meanwhile another server has taken over the overseer 
> role because the old overseer disconnected: its overseer thread handles 
> request (A) again, removes (A) from the queue, and goes to fetch the next top 
> request (B). At that moment the old overseer's reconnected thread removes the 
> top request from the queue, which is now (B), so (B) is deleted by the old 
> overseer and the new overseer never processes it. The operations for request 
> (B) are lost.
> A better approach: DistributedQueue.peek could return the ID of the request 
> being worked on, so the caller can remove exactly that entry with 
> workQueue.remove(ID) instead of blindly removing the top of the queue.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792921#comment-13792921
 ] 

Robert Muir commented on LUCENE-5274:
-

If you suspect there is a bug in MockTokenizer, please open a separate issue 
for that. MockTokenizer is used by, like, thousands of tests :)

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases, but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify them if you'll take them.






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792913#comment-13792913
 ] 

Nik Everett commented on LUCENE-5274:
-

Hey, forgot to mention that.  MockTokenizer seems to throw away the character 
after the end of each token even if that character is the valid start to the 
next token.  This comes up because I wanted to tokenize strings in a simplistic 
way to test that the highlighter can handle different tokenizers and it just 
wasn't working right.  So I "fixed" MockTokenizer but I did it in a pretty 
brutal way.  I'm happy to move the change to another bug and improve it but 
testing the highlighter change without it is a bit painful.

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases, but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify them if you'll take them.






[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-10-11 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792911#comment-13792911
 ] 

Jessica Cheng commented on SOLR-4816:
-

I think the latest patch:

-if (request instanceof IsUpdateRequest && updatesToLeaders) {
+if (request instanceof IsUpdateRequest) {

removed the effect of the "updatesToLeaders" variable. Looking at 
http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_4_5/solr/solrj/src/java/org/apache/solr/client/solrj/impl/CloudSolrServer.java?view=markup
 it's not used anywhere to make a decision anymore.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;
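The client-side routing in point 1 amounts to hashing each document's route key to pick a shard before any request is sent. A simplified sketch (illustrative only; Solr's actual compositeId router murmur-hashes the route key and matches it against each shard's assigned hash range, and the class and method below are hypothetical):

```java
public class RoutingSketch {
    // Pick a shard for a document ID by hashing it into numShards buckets.
    // Illustrative only: Solr's compositeId router hashes the route key and
    // compares it against per-shard hash ranges rather than using a modulus.
    static int shardFor(String docId, int numShards) {
        return Math.floorMod(docId.hashCode(), numShards);
    }

    public static void main(String[] args) {
        int shard = shardFor("0", 4);
        // Every update for this document goes straight to that shard's leader,
        // so the server no longer has to forward it.
        System.out.println("doc 0 -> shard " + shard);
    }
}
```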






[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792912#comment-13792912
 ] 

ASF subversion and git services commented on LUCENE-5275:
-

Commit 1531381 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531381 ]

LUCENE-5275: Change AttributeSource.toString to display the current state of 
attributes

> Fix AttributeSource.toString()
> --
>
> Key: LUCENE-5275
> URL: https://issues.apache.org/jira/browse/LUCENE-5275
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5275.patch, LUCENE-5275.patch
>
>
> It's currently just Object.toString, e.g.:
> org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
> But I think we should make it more useful, to end users trying to see what 
> their chain is doing, and to make SOPs easier when debugging:
> {code}
> EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
> try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this 
> already!")) {
>   ts.reset();
>   while (ts.incrementToken()) {
> System.out.println(ts.toString());
>   }
>   ts.end();
> }
> {code}
> Proposed output:
> {noformat}
> PorterStemFilter@8a32165c term=it,bytes=[69 
> 74],startOffset=0,endOffset=3,positionIncrement=1,type=,keyword=false
> PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 
> 33],startOffset=4,endOffset=8,positionIncrement=1,type=,keyword=false
> PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 
> 74],startOffset=10,endOffset=15,positionIncrement=1,type=,keyword=false
> PorterStemFilter@45cbde1b term=fix,bytes=[66 69 
> 78],startOffset=16,endOffset=19,positionIncrement=1,type=,keyword=false
> PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 
> 69],startOffset=25,endOffset=32,positionIncrement=2,type=,keyword=false
> {noformat}






[jira] [Commented] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792907#comment-13792907
 ] 

Robert Muir commented on LUCENE-5274:
-

Why would a highlighter improvement require MockTokenizer changes?

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases, but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify them if you'll take them.






[jira] [Resolved] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss resolved SOLR-5323.
---

Resolution: Fixed

Applied to branch_4x, lucene_solr_4_5 and trunk.

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5323.patch
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is that solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> You can get around it with:
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792902#comment-13792902
 ] 

ASF subversion and git services commented on SOLR-5323:
---

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531380 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5323.patch
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is that solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> You can get around it with:
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: LUCENE-5274.patch

New version of the patch.  This one works a lot better with phrases and even 
works on fields that have the same source but different tokenizers.

It still makes highlighting depend on the analysis module to pick up 
PerFieldAnalyzerWrapper.

I think all the new code this adds to FieldPhraseList deserves a unit test on 
its own but I'm not in the frame of mind to write one at the moment so I'll 
have to come back to it later.

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases, but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify them if you'll take them.






[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792903#comment-13792903
 ] 

ASF subversion and git services commented on SOLR-4708:
---

Commit 1531380 from [~dawidweiss] in branch 'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531380 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

> Enable ClusteringComponent by default
> -
>
> Key: SOLR-4708
> URL: https://issues.apache.org/jira/browse/SOLR-4708
> Project: Solr
>  Issue Type: Task
>Reporter: Erik Hatcher
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4708.patch, SOLR-4708.patch
>
>
> In the past, the ClusteringComponent used to rely on 3rd party JARs not 
> available from a Solr distro.  This is no longer the case, but the /browse UI 
> and other references still had the clustering component disabled in the 
> example with an awkward system property way to enable it.  Let's remove all 
> of that unnecessary stuff and just enable it as it works out of the box now.






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: (was: LUCENE-5274-4.patch)

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases, but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify them if you'll take them.






[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792898#comment-13792898
 ] 

ASF subversion and git services commented on SOLR-4708:
---

Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531378 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

> Enable ClusteringComponent by default
> -
>
> Key: SOLR-4708
> URL: https://issues.apache.org/jira/browse/SOLR-4708
> Project: Solr
>  Issue Type: Task
>Reporter: Erik Hatcher
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4708.patch, SOLR-4708.patch
>
>
> In the past, the ClusteringComponent used to rely on 3rd party JARs not 
> available from a Solr distro.  This is no longer the case, but the /browse UI 
> and other references still had the clustering component disabled in the 
> example with an awkward system property way to enable it.  Let's remove all 
> of that unnecessary stuff and just enable it as it works out of the box now.






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: (was: LUCENE-5274.patch)

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases, but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify them if you'll take them.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792897#comment-13792897
 ] 

ASF subversion and git services commented on SOLR-5323:
---

Commit 1531378 from [~dawidweiss] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531378 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5323.patch
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is that solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> You can get around it with:
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-4708) Enable ClusteringComponent by default

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792895#comment-13792895
 ] 

ASF subversion and git services commented on SOLR-4708:
---

Commit 1531377 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1531377 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

> Enable ClusteringComponent by default
> -
>
> Key: SOLR-4708
> URL: https://issues.apache.org/jira/browse/SOLR-4708
> Project: Solr
>  Issue Type: Task
>Reporter: Erik Hatcher
>Assignee: Dawid Weiss
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4708.patch, SOLR-4708.patch
>
>
> In the past, the ClusteringComponent used to rely on 3rd party JARs not 
> available from a Solr distro.  This is no longer the case, but the /browse UI 
> and other references still had the clustering component disabled in the 
> example with an awkward system property way to enable it.  Let's remove all 
> of that unnecessary stuff and just enable it as it works out of the box now.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792894#comment-13792894
 ] 

ASF subversion and git services commented on SOLR-5323:
---

Commit 1531377 from [~dawidweiss] in branch 'dev/trunk'
[ https://svn.apache.org/r1531377 ]

SOLR-5323: Disable ClusteringComponent by default in collection1 example. The 
solr.clustering.enabled system property needs to be set to 'true' to enable the 
clustering contrib (reverts SOLR-4708). (Dawid Weiss)

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5323.patch
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated SOLR-5323:
--

Fix Version/s: 4.5.1

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5323.patch
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Updated] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated SOLR-5323:
--

Attachment: SOLR-5323.patch

Patch reverting portions of SOLR-4708.

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5323.patch
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (LUCENE-5275) Fix AttributeSource.toString()

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792884#comment-13792884
 ] 

ASF subversion and git services commented on LUCENE-5275:
-

Commit 1531376 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531376 ]

LUCENE-5275: Change AttributeSource.toString to display the current state of 
attributes

> Fix AttributeSource.toString()
> --
>
> Key: LUCENE-5275
> URL: https://issues.apache.org/jira/browse/LUCENE-5275
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5275.patch, LUCENE-5275.patch
>
>
> Its currently just Object.toString, e.g.:
> org.apache.lucene.analysis.en.PorterStemFilter@8a32165c
> But I think we should make it more useful, to end users trying to see what 
> their chain is doing, and to make SOPs easier when debugging:
> {code}
> EnglishAnalyzer analyzer = new EnglishAnalyzer(TEST_VERSION_CURRENT);
> try (TokenStream ts = analyzer.tokenStream("body", "Its 2013, let's fix this 
> already!")) {
>   ts.reset();
>   while (ts.incrementToken()) {
> System.out.println(ts.toString());
>   }
>   ts.end();
> }
> {code}
> Proposed output:
> {noformat}
> PorterStemFilter@8a32165c term=it,bytes=[69 
> 74],startOffset=0,endOffset=3,positionIncrement=1,type=,keyword=false
> PorterStemFilter@987b9eea term=2013,bytes=[32 30 31 
> 33],startOffset=4,endOffset=8,positionIncrement=1,type=,keyword=false
> PorterStemFilter@6b5dbd1f term=let,bytes=[6c 65 
> 74],startOffset=10,endOffset=15,positionIncrement=1,type=,keyword=false
> PorterStemFilter@45cbde1b term=fix,bytes=[66 69 
> 78],startOffset=16,endOffset=19,positionIncrement=1,type=,keyword=false
> PorterStemFilter@bcd8f627 term=alreadi,bytes=[61 6c 72 65 61 64 
> 69],startOffset=25,endOffset=32,positionIncrement=2,type=,keyword=false
> {noformat}






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792859#comment-13792859
 ] 

Dawid Weiss commented on SOLR-5323:
---

Ok, I will revert the changes from SOLR-4708.

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.6, 5.0
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792850#comment-13792850
 ] 

Mark Miller commented on SOLR-5323:
---

I just think anything with the relative paths is a separate issue.

You can use any hierarchy - you just have to change those paths. I'm all for 
that being improved somehow, but the issue here seems to be:

Solr contrib modules are lazily loaded so that if you don't use them, you can 
delete any of them from the dist package layout and things still work. Or you 
can leave them in place, and if you try to use them, things work. Clustering 
now violates that. It's not really clustering's fault; it seems to be a 
limitation of the search component.
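For context, the toggle being discussed is the enable attribute on the clustering declarations in the example solrconfig.xml; roughly (a sketch -- the exact attributes and nested engine configuration in the shipped config may differ):

```xml
<!-- Sketch of the enable-flag pattern; SOLR-4708 changed the default
     from false to true, so a config copied out of the distribution
     tries to load the clustering classes eagerly on startup. -->
<searchComponent name="clustering"
                 enable="${solr.clustering.enabled:true}"
                 class="solr.clustering.ClusteringComponent">
  <!-- engine configuration elided -->
</searchComponent>
```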


> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.6, 5.0
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792846#comment-13792846
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531369 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531369 ]

LUCENE-5269: satisfy the policeman

> TestRandomChains failure
> 
>
> Key: LUCENE-5269
> URL: https://issues.apache.org/jira/browse/LUCENE-5269
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
> LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
> possibly only the combination of them conspiring together.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792845#comment-13792845
 ] 

ASF subversion and git services commented on LUCENE-5269:
-

Commit 1531368 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1531368 ]

LUCENE-5269: satisfy the policeman

> TestRandomChains failure
> 
>
> Key: LUCENE-5269
> URL: https://issues.apache.org/jira/browse/LUCENE-5269
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
> LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
> possibly only the combination of them conspiring together.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792834#comment-13792834
 ] 

Dawid Weiss commented on SOLR-5323:
---

I can revert to lazy-loading, not a problem. But this doesn't solve the 
relative-paths issue at all. As I mentioned, there were several times when I 
had to pass a preconfigured example Solr configuration to somebody -- this 
always required that person to put the contents of the example under a 
specific directory of the Solr distribution, otherwise things wouldn't work 
because of the relative paths. It was a pain to explain why this step is 
needed, and to enforce it... I ended up just copying the required JARs into 
the example. This seems wrong somehow -- if it's a Solr distribution, there 
should be a way to reference contribs that lets people keep their stuff in 
any folder hierarchy.

What do you think?
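The relative paths in question are the <lib> directives in the example solrconfig.xml, which resolve against the core's instance directory. Roughly (a sketch; the actual dir and regex values in the shipped config may differ):

```xml
<!-- Sketch: contrib JARs referenced relative to the core's instanceDir.
     These resolve only while the config lives under example/solr inside
     the distribution tree; copy it elsewhere and the JARs are not found. -->
<lib dir="../../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../../dist/" regex="solr-clustering-\d.*\.jar" />
```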

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.6, 5.0
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (SOLR-5323) Solr requires -Dsolr.clustering.enabled=false when pointing at example config

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792824#comment-13792824
 ] 

Mark Miller commented on SOLR-5323:
---

I also think this was a mistake - I don't know that we need another solr.home 
type thing to address it though.

The root of the issue is that the clustering contrib is not "really" lazily 
loaded - the current policy is to lazy-load the contrib modules - and that is 
because of its search component. I think Erik is on the right path with lazy 
SearchComponents: if the only request handlers that refer to a search 
component are themselves lazy, the component should probably also init 
lazily. I have not looked into how hard that is to do, but it seems like the 
correct fix to bring clustering in line with the other contribs. I also think 
the whole enabled flag we had is no good.
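The existing lazy mechanism for request handlers looks roughly like this (a sketch; a lazy SearchComponent would presumably need an analogous deferral, which does not exist today):

```xml
<!-- Sketch of startup="lazy" on a request handler: the handler is not
     instantiated until the first request that uses it. A search
     component referenced only from such handlers could, per the
     suggestion above, defer its own init the same way. -->
<requestHandler name="/clustering" class="solr.SearchHandler" startup="lazy">
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```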

> Solr requires -Dsolr.clustering.enabled=false when pointing at example config
> -
>
> Key: SOLR-5323
> URL: https://issues.apache.org/jira/browse/SOLR-5323
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.5
> Environment: vanilla mac
>Reporter: John Berryman
>Assignee: Dawid Weiss
> Fix For: 4.6, 5.0
>
>
> my typical use of Solr is something like this: 
> {code}
> cd SOLR_HOME/example
> cp -r solr /myProjectDir/solr_home
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home  start.jar
> {code}
> But in solr 4.5.0 this fails to start successfully. I get an error:
> {code}
> org.apache.solr.common.SolrException: Error loading class 
> 'solr.clustering.ClusteringComponent'
> {code}
> The reason is because solr.clustering.enabled defaults to true now. I don't 
> know why this might be the case.
> you can get around it with 
> {code}
> java -jar -Dsolr.solr.home=/myProjectDir/solr_home 
> -Dsolr.clustering.enabled=false start.jar
> {code}
> SOLR-4708 is when this became an issue.






[jira] [Commented] (LUCENE-5273) Binary artifacts in Lucene and Solr convenience binary distributions accompanying a release, including on Maven Central, should be identical across all distributions

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792800#comment-13792800
 ] 

ASF subversion and git services commented on LUCENE-5273:
-

Commit 1531354 from [~steve_rowe] in branch 'dev/trunk'
[ https://svn.apache.org/r1531354 ]

LUCENE-5273: Binary artifacts in Lucene and Solr convenience binary 
distributions accompanying a release, including on Maven Central, should be 
identical across all distributions.

> Binary artifacts in Lucene and Solr convenience binary distributions 
> accompanying a release, including on Maven Central, should be identical 
> across all distributions
> -
>
> Key: LUCENE-5273
> URL: https://issues.apache.org/jira/browse/LUCENE-5273
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/build
>Reporter: Steve Rowe
>Assignee: Steve Rowe
> Fix For: 4.6
>
> Attachments: LUCENE-5273.patch
>
>
> As mentioned in various issues (e.g. LUCENE-3655, LUCENE-3885, SOLR-4766), we 
> release multiple versions of the same artifact: binary Maven artifacts are 
> not identical to the ones in the Lucene and Solr binary distributions, and 
> the Lucene jars in the Solr binary distribution, including within the war, 
> are not identical to the ones in the Lucene binary distribution.  This is bad.
> It's (probably always?) not horribly bad, since the differences all appear to 
> be caused by the build re-creating manifests and re-building jars and the 
> Solr war from their constituents at various points in the release build 
> process; as a result, manifest timestamp attributes, as well as archive 
> metadata (at least constituent timestamps, maybe other things?), differ each 
> time a jar is rebuilt.
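One way to check whether two jars differ only in the rebuilt metadata described above is to compare member names and CRC-32 checksums while skipping the manifest and ignoring per-member timestamps (a sketch, not part of the release tooling; file names are hypothetical):

```python
# Compare two zip/jar archives by member names and CRC-32 checksums,
# ignoring per-member timestamps and META-INF/MANIFEST.MF (which the
# build regenerates, as described above).
import zipfile

def jar_fingerprint(path):
    """Map of member name -> CRC-32, excluding the manifest."""
    with zipfile.ZipFile(path) as zf:
        return {
            info.filename: info.CRC
            for info in zf.infolist()
            if info.filename != "META-INF/MANIFEST.MF"
        }

def same_contents(jar_a, jar_b):
    """True when the archives hold identical members, metadata aside."""
    return jar_fingerprint(jar_a) == jar_fingerprint(jar_b)
```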






[jira] [Closed] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-10-11 Thread Jessica Cheng (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jessica Cheng closed SOLR-5199.
---

Resolution: Duplicate

> Restarting zookeeper makes the overseer stop processing queue events
> 
>
> Key: SOLR-5199
> URL: https://issues.apache.org/jira/browse/SOLR-5199
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Jessica Cheng
>Assignee: Mark Miller
>  Labels: overseer, zookeeper
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: 5199-log
>
>
> Taking the external zookeeper down (I'm just testing, so I only have one 
> external zookeeper instance running) and then bringing it back up seems to 
> have caused the overseer to stop processing queue event.
> I tried to issue the delete collection command (curl 
> 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
> each time it just timed out. Looking at the zookeeper data, I see
> ... 
> /overseer
>collection-queue-work
>  qn-02
>  qn-04
>  qn-06
> ...
> and the qn-xxx are not being processed.
> Attached please find the log from the overseer (according to 
> /overseer_elect/leader).






[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-10-11 Thread Jessica Cheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792778#comment-13792778
 ] 

Jessica Cheng commented on SOLR-5199:
-

Sorry, I only saw this once and I didn't have time to investigate, so I don't 
know what the cause is. SOLR-5325 definitely sounds similar so I'll close this 
issue now. Thanks!

> Restarting zookeeper makes the overseer stop processing queue events
> 
>
> Key: SOLR-5199
> URL: https://issues.apache.org/jira/browse/SOLR-5199
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Jessica Cheng
>Assignee: Mark Miller
>  Labels: overseer, zookeeper
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: 5199-log
>
>
> Taking the external zookeeper down (I'm just testing, so I only have one 
> external zookeeper instance running) and then bringing it back up seems to 
> have caused the overseer to stop processing queue event.
> I tried to issue the delete collection command (curl 
> 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
> each time it just timed out. Looking at the zookeeper data, I see
> ... 
> /overseer
>collection-queue-work
>  qn-02
>  qn-04
>  qn-06
> ...
> and the qn-xxx are not being processed.
> Attached please find the log from the overseer (according to 
> /overseer_elect/leader).






[jira] [Commented] (SOLR-4824) Fuzzy / Faceting results are changed after ingestion of documents past a certain number

2013-10-11 Thread Lakshmi Venkataswamy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792742#comment-13792742
 ] 

Lakshmi Venkataswamy commented on SOLR-4824:


I have tested version 4.5.0 and observed the same behavior, so we are 
staying with 3.6 in production for now.

> Fuzzy / Faceting results are changed after ingestion of documents past a 
> certain number 
> 
>
> Key: SOLR-4824
> URL: https://issues.apache.org/jira/browse/SOLR-4824
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.2, 4.3
> Environment: Ubuntu 12.04 LTS 12.04.2 
> jre1.7.0_17
> jboss-as-7.1.1.Final
>Reporter: Lakshmi Venkataswamy
>
> In upgrading from SOLR 3.6 to 4.2/4.3 and comparing results on fuzzy queries, 
> I found that after a certain number of documents were ingested the fuzzy 
> query had drastically lower number of results.  We have approximately 18,000 
> documents per day and after ingesting approximately 40 days of documents, the 
> next incremental day of documents results in a lower number of results of a 
> fuzzy search.
> The query :  
> http://10.100.1.xx:8080/solr/corex/select?q=cc:worde~1&facet=on&facet.field=date&fl=date&facet.sort
> produces the following result before the threshold is crossed
> [response markup garbled in the archive; recovered figures:]
> {noformat}
> numFound="362803" start="0"
> facet.field=date, counts per day: 2866, 11372, 11514, 12015, 11746,
> 10853, 11053, 11815, 11427, 11475, 11461, 12058, 11335, 12039, 12064,
> 12234, 12545, 11766, 12197, 11414, 11633, 12863, 12378, 11947, 11822,
> 11882, 10474, 11051, 11776, 11957, 11260, 8511
> {noformat}
> Once the 40 days of documents ingested threshold is crossed the results drop 
> as show below for the same query
> [response markup garbled in the archive; recovered figures:]
> {noformat}
> facet.field=date, counts per day: 0, 41, 21, 24, 19, 9, 11, 17, 14, 24,
> 43, 14, 52, 57, 25, 17, 34, 11, 16, 121, 33, 26, 59, 27, 10, 9, 6, 16,
> 11, 15, 21, 109, 11, 7, 10, 8, 13, 75, 77, 31, 35, 22, 18, 11, 68, 40
> {noformat}
> I have also tested this with different months of data and have seen the same 
> issue  around the number of documents.






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792695#comment-13792695
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531327 from [~markrmil...@gmail.com] in branch 
'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531327 ]

SOLR-5325: raise retry padding a bit

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792694#comment-13792694
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531325 from [~markrmil...@gmail.com] in branch 
'dev/branches/lucene_solr_4_5'
[ https://svn.apache.org/r1531325 ]

SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing 
commands.

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5308) Split all documents of a route key into another collection

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792692#comment-13792692
 ] 

Shalin Shekhar Mangar commented on SOLR-5308:
-

For splitting a single source shard into a single target collection/shard by a 
route key such as:
{code}
/admin/collections?action=migrate&collection=collection1&split.key=A!&shard=shardX&target.collection=collection2&target.shard=shardY
{code}
A rough strategy could be to:
# Create a new core X on the source
# Create a new core Y on the target
# Ask the target core to buffer updates
# Start forwarding updates for the route key, as received by the source 
shard, to the target collection
# Split the source shard's documents for the route key into core X
# Ask Y to replicate fully from X
# Merge Y into the target core via Core Admin
# Ask the target core to replay the buffered updates


> Split all documents of a route key into another collection
> --
>
> Key: SOLR-5308
> URL: https://issues.apache.org/jira/browse/SOLR-5308
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.6, 5.0
>
>
> Enable SolrCloud users to split out a set of documents from a source 
> collection into another collection.
> This will be useful in multi-tenant environments. This feature will make it 
> possible to split a tenant out of a collection and put them into their own 
> collection which can be scaled separately.






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792689#comment-13792689
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531324 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531324 ]

SOLR-5325: raise retry padding a bit

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792688#comment-13792688
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531323 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1531323 ]

SOLR-5325: raise retry padding a bit

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792684#comment-13792684
 ] 

Mark Miller commented on SOLR-5325:
---

I'm still kind of surprised this would happen - we should be retrying on 
connection loss right up until session expiration, at which point we would no 
longer be the leader. Perhaps the retry window is a little short. And perhaps 
that is part of why it is more difficult for me to reproduce in a test.
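A retry loop of the kind described could look roughly like this. This is a 
hedged sketch only: ConnectionLossError, the pause length, and the deadline 
handling are placeholders, not Solr's actual Overseer code:

```python
import time

class ConnectionLossError(Exception):
    """Stand-in for ZooKeeper's connection-loss exception."""

def retry_until_session_expires(op, session_timeout,
                                clock=time.monotonic, sleep=time.sleep,
                                pause=0.1):
    """Retry op on connection loss until the ZK session would have expired;
    past that point we can no longer assume we are the leader, so re-raise."""
    deadline = clock() + session_timeout
    while True:
        try:
            return op()
        except ConnectionLossError:
            if clock() >= deadline:
                raise  # session expired: give up leadership
            sleep(pause)
```
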

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5199) Restarting zookeeper makes the overseer stop processing queue events

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792679#comment-13792679
 ] 

Mark Miller commented on SOLR-5199:
---

Hey Jessica - if we can confirm this is the same issue as SOLR-5325, we can 
close this as a duplicate.

> Restarting zookeeper makes the overseer stop processing queue events
> 
>
> Key: SOLR-5199
> URL: https://issues.apache.org/jira/browse/SOLR-5199
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.4
>Reporter: Jessica Cheng
>Assignee: Mark Miller
>  Labels: overseer, zookeeper
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: 5199-log
>
>
> Taking the external zookeeper down (I'm just testing, so I only have one 
> external zookeeper instance running) and then bringing it back up seems to 
> have caused the overseer to stop processing queue event.
> I tried to issue the delete collection command (curl 
> 'http://localhost:7574/solr/admin/collections?action=DELETE&name=c1') and 
> each time it just timed out. Looking at the zookeeper data, I see
> ... 
> /overseer
>collection-queue-work
>  qn-02
>  qn-04
>  qn-06
> ...
> and the qn-xxx are not being processed.
> Attached please find the log from the overseer (according to 
> /overseer_elect/leader).






[jira] [Comment Edited] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792671#comment-13792671
 ] 

Mark Miller edited comment on SOLR-5325 at 10/11/13 2:50 PM:
-

Added some more testing that I thought would catch it, but it has not yet on my 
system. Still poking around a bit.

Anyway, I've committed the fix.


was (Author: markrmil...@gmail.com):
Add some more testing that I thought would catch it, but it has not yet on my 
system. Still poking around a bit.

Anyway, I've committed the fix.

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Updated] (LUCENE-5274) Teach fast FastVectorHighlighter to highlight "child fields" with parent fields

2013-10-11 Thread Nik Everett (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nik Everett updated LUCENE-5274:


Attachment: LUCENE-5274-4.patch

Reworked to remove the dependency on the query parser and most of the analyzer 
dependency, and to fix errors with phrases.  It'll still need to lose the rest 
of the analyzer dependency and gain more test cases, in addition to addressing 
any other concerns raised in the review.

> Teach fast FastVectorHighlighter to highlight "child fields" with parent 
> fields
> ---
>
> Key: LUCENE-5274
> URL: https://issues.apache.org/jira/browse/LUCENE-5274
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/other
>Reporter: Nik Everett
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5274-4.patch, LUCENE-5274.patch
>
>
> I've been messing around with the FastVectorHighlighter and it looks like I 
> can teach it to highlight matches on "child fields".  Like this query:
> foo:scissors foo_exact:running
> would highlight foo like this:
> running with scissors
> Where foo is stored WITH_POSITIONS_OFFSETS and foo_plain is an unstored copy 
> of foo with a different analyzer and its own WITH_POSITIONS_OFFSETS.
> This would make queries that perform weighted matches against different 
> analyzers much more convenient to highlight.
> I have working code and test cases but they are hacked into Elasticsearch.  
> I'd love to Lucene-ify if you'll take them.






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792671#comment-13792671
 ] 

Mark Miller commented on SOLR-5325:
---

Added some more testing that I thought would catch it, but it has not yet on my 
system. Still poking around a bit.

Anyway, I've committed the fix.

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator

2013-10-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792662#comment-13792662
 ] 

Michael McCandless commented on LUCENE-5260:


Thanks Areek, patch looks great!  I like the hasPayloads() up-front
introspection.

In UnsortedTermFreqIteratorWrapper.payload(), why do we set currentOrd
as a side effect?  Shouldn't next() already do that?  Maybe, we should
instead assert currentOrd == ords[curPos]?  Also, can we break that
sneaky currentOrd assignment in next into its own line before?


> Make older Suggesters more accepting of TermFreqPayloadIterator
> ---
>
> Key: LUCENE-5260
> URL: https://issues.apache.org/jira/browse/LUCENE-5260
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Areek Zillur
> Attachments: LUCENE-5260.patch
>
>
> As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would 
> be nice to make the older suggesters accepting of TermFreqPayloadIterator and 
> throw an exception if payload is found (if it cannot be used). 
> This will also allow us to nuke most of the other interfaces for 
> BytesRefIterator. 






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792663#comment-13792663
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531315 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1531315 ]

SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing 
commands.

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Commented] (SOLR-5325) zk connection loss causes overseer leader loss

2013-10-11 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792661#comment-13792661
 ] 

ASF subversion and git services commented on SOLR-5325:
---

Commit 1531313 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1531313 ]

SOLR-5325: ZooKeeper connection loss can cause the Overseer to stop processing 
commands.

> zk connection loss causes overseer leader loss
> --
>
> Key: SOLR-5325
> URL: https://issues.apache.org/jira/browse/SOLR-5325
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.3, 4.4, 4.5
>Reporter: Christine Poerschke
>Assignee: Mark Miller
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: SOLR-5325.patch, SOLR-5325.patch, SOLR-5325.patch
>
>
> The problem we saw was that when the solr overseer leader experienced 
> temporary zk connectivity problems it stopped processing overseer queue 
> events.
> This first happened when quorum within the external zk ensemble was lost due 
> to too many zookeepers being stopped (similar to SOLR-5199). The second time 
> it happened when there was a sufficient number of zookeepers but they were 
> holding zookeeper leadership elections and thus refused connections (the 
> elections were taking several seconds, we were using the default 
> zookeeper.cnxTimeout=5s value and it was hit for one ensemble member).






[jira] [Updated] (LUCENE-5252) add NGramSynonymTokenizer

2013-10-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-5252:
---

Attachment: LUCENE-5252_4x.patch

Fixed a bug regarding ignoreCase in the attached patch.

> add NGramSynonymTokenizer
> -
>
> Key: LUCENE-5252
> URL: https://issues.apache.org/jira/browse/LUCENE-5252
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Koji Sekiguchi
>Priority: Minor
> Attachments: LUCENE-5252_4x.patch, LUCENE-5252_4x.patch, 
> LUCENE-5252_4x.patch
>
>
> I'd like to propose that we have another n-gram tokenizer which can process 
> synonyms. That is NGramSynonymTokenizer. Note that in this ticket, the gram 
> size is fixed, i.e. minGramSize = maxGramSize.
> Today, I think we have the following problems when using SynonymFilter with 
> NGramTokenizer. 
> For purpose of illustration, we have a synonym setting "ABC, DEFG" w/ 
> expand=true and N = 2 (2-gram).
> # There is no consensus (I think :-) how we assign offsets to generated 
> synonym tokens DE, EF and FG when expanding source token AB and BC.
> # If the query pattern looks like ABCY, it cannot be matched even if there is 
> a document "…ABCY…" in the index when autoGeneratePhraseQueries is set to 
> true, because there is no "CY" token in the index (but "GY" is there).
> NGramSynonymTokenizer can solve these problems by behaving as follows.
> * NGramSynonymTokenizer reads synonym settings (synonyms.txt) and it doesn't 
> tokenize registered words. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |ABC|AB/DE/BC/EF/FG|ABC/DEFG|
> * Immediately before and after registered words, NGramSynonymTokenizer 
> generates *extra* tokens w/ posInc=0. e.g.
> ||source text||NGramTokenizer+SynonymFilter||NGramSynonymTokenizer||
> |XYZABC123|XY/YZ/ZA/AB/DE/BC/EF/C1/FG/12/23|XY/YZ/Z/ABC/DEFG/1/12/23|
> In the above sample, "Z" and "1" are the extra tokens.
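The table rows above can be simulated with a toy tokenizer. This is only an 
illustration of the described token stream (position increments are ignored and 
longest-match scanning of registered words is assumed), not the proposed Lucene 
implementation:

```python
import re

def ngram_synonym_tokens(text, synonyms, n=2):
    """Toy model of the proposed NGramSynonymTokenizer: registered words are
    emitted whole (with their expansions), other runs are n-grammed, and
    short 'extra' grams are emitted next to registered-word boundaries."""
    pattern = re.compile("|".join(
        re.escape(w) for w in sorted(synonyms, key=len, reverse=True)))
    # Split the text into (segment, is_registered_word) parts.
    parts, pos = [], 0
    for m in pattern.finditer(text):
        if m.start() > pos:
            parts.append((text[pos:m.start()], False))
        parts.append((m.group(), True))
        pos = m.end()
    if pos < len(text):
        parts.append((text[pos:], False))
    tokens = []
    for i, (seg, is_word) in enumerate(parts):
        if is_word:
            tokens.append(seg)
            tokens.extend(synonyms[seg])      # expansions (posInc=0 in Lucene)
            continue
        if i > 0:                             # extra leading grams after a word
            tokens += [seg[:k] for k in range(1, n) if k <= len(seg)]
        tokens += [seg[j:j + n] for j in range(len(seg) - n + 1)]
        if i < len(parts) - 1:                # extra trailing grams before a word
            tokens += [seg[-k:] for k in range(n - 1, 0, -1) if k <= len(seg)]
    return tokens
```
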






[jira] [Commented] (SOLR-5338) Split shards by a route key

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792600#comment-13792600
 ] 

Shalin Shekhar Mangar commented on SOLR-5338:
-

[~ysee...@gmail.com] - Would you mind reviewing the new CompositeIdRouter 
methods?

> Split shards by a route key
> ---
>
> Key: SOLR-5338
> URL: https://issues.apache.org/jira/browse/SOLR-5338
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5338.patch
>
>
> Provide a way to split a shard using a route key such that all documents of 
> the specified route key end up in a single dedicated sub-shard.
> Example:
> Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
> 'A!' has hash range [12,15]. Then invoking:
> {code}
> /admin/collections?action=SPLIT&collection=collection1&split.key=A!
> {code}
> should produce three sub-shards with hash range [0,11], [12,15] and [16,20].
> Specifying the source shard is not required here because the route key is 
> enough to figure it out. Route keys spanning more than one shards will not be 
> supported.
> Note that the sub-shard with the hash range of the route key may also contain 
> documents for other route keys whose hashes collide.
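The example arithmetic above can be sketched as a pure function over inclusive 
hash ranges. This is a simplification of what the router would have to do (real 
ranges are 32-bit hash ranges), shown only to make the split rule concrete:

```python
def split_by_route_key(shard_range, key_range):
    """Split an inclusive shard hash range into up to three contiguous
    sub-ranges so that key_range gets its own dedicated sub-shard."""
    lo, hi = shard_range
    klo, khi = key_range
    assert lo <= klo <= khi <= hi, "route key must fall inside the shard"
    subs = []
    if klo > lo:
        subs.append((lo, klo - 1))        # range below the route key
    subs.append((klo, khi))               # dedicated sub-shard for the key
    if khi < hi:
        subs.append((khi + 1, hi))        # range above the route key
    return subs
```
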






RE: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!

2013-10-11 Thread Uwe Schindler
Hihi,

FYI: I have a compilation unit here (non-Lucene) that also segfaults on JDK 
7.0u25 if you don't do "ant clean" first. If class files already exist and only 
the modified ones are recompiled, it always segfaults. Reproducible, but I have 
no idea what causes it. :-)

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


> -Original Message-
> From: simon.willna...@gmail.com [mailto:simon.willna...@gmail.com] On
> Behalf Of Simon Willnauer
> Sent: Friday, October 11, 2013 2:50 PM
> Cc: dev@lucene.apache.org
> Subject: Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737
> - Failure!
> 
> ok maybe updating the JDK would be a good idea :)
> 
> 
> 
> On Fri, Oct 11, 2013 at 2:46 PM,   wrote:
> > Build:
> > builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/
> >
> > No tests ran.
> >
> > Build Log:
> > [...truncated 61 lines...]
> 





Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!

2013-10-11 Thread Simon Willnauer
ok maybe updating the JDK would be a good idea :)



On Fri, Oct 11, 2013 at 2:46 PM,   wrote:
> Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/
>
> No tests ran.
>
> Build Log:
> [...truncated 61 lines...]




[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 62737 - Failure!

2013-10-11 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/62737/

No tests ran.

Build Log:
[...truncated 61 lines...]



[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: SOLR-5310.patch

> Add a collection admin command to remove a replica
> --
>
> Key: SOLR-5310
> URL: https://issues.apache.org/jira/browse/SOLR-5310
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5310.patch, SOLR-5310.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The only way a replica can be removed is by unloading the core. There is no 
> way to remove a replica that is down, so the clusterstate will accumulate 
> unreferenced nodes if a few nodes go down over time.
> We need a cluster admin command to clean that up, e.g.:
> /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
> The system would first check whether the replica is active. If yes, a core 
> UNLOAD command is fired, which also takes care of deleting the replica from 
> the clusterstate.
> If the state is inactive, the core or node may be down; in that case the 
> entry is removed from the clusterstate directly.
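The decision described above can be sketched as follows. The state names and 
return values are illustrative placeholders, not the actual Solr code:

```python
def delete_replica_plan(replica_state):
    """Proposed DELETEREPLICA behavior: an active replica is unloaded
    (which also removes it from clusterstate); a down replica's entry
    is dropped from clusterstate directly."""
    if replica_state == "active":
        return "unload-core"
    return "remove-from-clusterstate"
```
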






[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: (was: SOLR-5310-1.patch)

> Add a collection admin command to remove a replica
> --
>
> Key: SOLR-5310
> URL: https://issues.apache.org/jira/browse/SOLR-5310
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5310.patch, SOLR-5310.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The only way a replica can be removed is by unloading the core. There is no 
> way to remove a replica that is down, so the clusterstate will accumulate 
> unreferenced nodes if a few nodes go down over time.
> We need a cluster admin command to clean that up, e.g.:
> /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
> The system would first check whether the replica is active. If yes, a core 
> UNLOAD command is fired, which also takes care of deleting the replica from 
> the clusterstate.
> If the state is inactive, the core or node may be down; in that case the 
> entry is removed from the clusterstate directly.






[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: SOLR-5310-1.patch

The test cases still fail occasionally.

> Add a collection admin command to remove a replica
> --
>
> Key: SOLR-5310
> URL: https://issues.apache.org/jira/browse/SOLR-5310
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5310-1.patch, SOLR-5310.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The only way a replica can be removed is by unloading the core. There is no 
> way to remove a replica that is down, so the clusterstate will accumulate 
> unreferenced nodes if a few nodes go down over time.
> We need a cluster admin command to clean that up, e.g.:
> /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
> The system would first check whether the replica is active. If yes, a core 
> UNLOAD command is fired, which also takes care of deleting the replica from 
> the clusterstate.
> If the state is inactive, the core or node may be down; in that case the 
> entry is removed from the clusterstate directly.






[JENKINS] Lucene-Solr-NightlyTests-trunk - Build # 407 - Failure

2013-10-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-trunk/407/

1 tests failed.
REGRESSION:  org.apache.lucene.index.Test2BPostings.test

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
at 
__randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0)
at 
org.apache.lucene.store.BufferedIndexOutput.(BufferedIndexOutput.java:50)
at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:365)
at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280)
at 
org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206)
at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478)
at 
org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
at 
org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149)
at 
org.apache.lucene.store.CompoundFileDirectory.close(CompoundFileDirectory.java:171)
at 
org.apache.lucene.util.IOUtils.closeWhileHandlingException(IOUtils.java:80)
at 
org.apache.lucene.index.IndexWriter.createCompoundFile(IndexWriter.java:4408)
at 
org.apache.lucene.index.DocumentsWriterPerThread.sealFlushedSegment(DocumentsWriterPerThread.java:535)
at 
org.apache.lucene.index.DocumentsWriterPerThread.flush(DocumentsWriterPerThread.java:502)
at 
org.apache.lucene.index.DocumentsWriter.doFlush(DocumentsWriter.java:506)
at 
org.apache.lucene.index.DocumentsWriter.postUpdate(DocumentsWriter.java:378)
at 
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:470)
at 
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1523)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1193)
at 
org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1174)
at org.apache.lucene.index.Test2BPostings.test(Test2BPostings.java:76)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)




Build Log:
[...truncated 655 lines...]
   [junit4] Suite: org.apache.lucene.index.Test2BPostings
   [junit4]   2> NOTE: download the large Jenkins line-docs file by running 
'ant get-jenkins-line-docs' in the lucene directory.
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=Test2BPostings 
-Dtests.method=test -Dtests.seed=D8D3920C725BF71C -Dtests.multiplier=2 
-Dtests.nightly=true -Dtests.slow=true 
-Dtests.linedocsfile=/home/hudson/lucene-data/enwiki.random.lines.txt 
-Dtests.locale=en_IN -Dtests.timezone=America/Puerto_Rico 
-Dtests.file.encoding=US-ASCII
   [junit4] ERROR408s J0 | Test2BPostings.test <<<
   [junit4]> Throwable #1: java.lang.OutOfMemoryError: Java heap space
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([D8D3920C725BF71C:5087ADD6DCA79AE4]:0)
   [junit4]>at 
org.apache.lucene.store.BufferedIndexOutput.(BufferedIndexOutput.java:50)
   [junit4]>at 
org.apache.lucene.store.FSDirectory$FSIndexOutput.(FSDirectory.java:365)
   [junit4]>at 
org.apache.lucene.store.FSDirectory.createOutput(FSDirectory.java:280)
   [junit4]>at 
org.apache.lucene.store.NRTCachingDirectory.createOutput(NRTCachingDirectory.java:206)
   [junit4]>at 
org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:478)
   [junit4]>at 
org.apache.lucene.store.TrackingDirectoryWrapper.createOutput(TrackingDirectoryWrapper.java:44)
   [junit4]>at 
org.apache.lucene.store.CompoundFileWriter.close(CompoundFileWriter.java:149)
   [junit4]>at 
org.apache.lucene.store.CompoundFileDirectory.close(C

[jira] [Updated] (SOLR-5308) Split all documents of a route key into another collection

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5308:


Attachment: (was: SOLR-5308.patch)

> Split all documents of a route key into another collection
> --
>
> Key: SOLR-5308
> URL: https://issues.apache.org/jira/browse/SOLR-5308
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.6, 5.0
>
>
> Enable SolrCloud users to split out a set of documents from a source 
> collection into another collection.
> This will be useful in multi-tenant environments. This feature will make it 
> possible to split a tenant out of a collection and put them into their own 
> collection which can be scaled separately.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5320) Multi level compositeId router

2013-10-11 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792478#comment-13792478
 ] 

Anshum Gupta commented on SOLR-5320:


I think a 3-level composite id routing would be a good starting point.
I'd use 8 bits each from the first two components of the key and 16 bits from 
the last component.
Functionally, this should work along the same lines as the current 2-level 
composite id routing.
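
As a sketch of that bit allocation, the snippet below composes a 32-bit route hash from three key components using 8 + 8 + 16 bits. The class name is hypothetical and String.hashCode() is only an illustrative stand-in for the MurmurHash that Solr's CompositeIdRouter actually uses.

```java
// Illustrative sketch only: compose a 32-bit route hash from three key
// components using 8 + 8 + 16 bits. String.hashCode() stands in for the
// MurmurHash that Solr's CompositeIdRouter really uses.
public class ThreeLevelHashSketch {

    /** 8 bits from the app part, 8 from the user part, 16 from the doc part. */
    public static int routeHash(String app, String user, String doc) {
        int h1 = app.hashCode()  & 0xFF;   // low 8 bits of the component hash
        int h2 = user.hashCode() & 0xFF;   // low 8 bits
        int h3 = doc.hashCode()  & 0xFFFF; // low 16 bits
        return (h1 << 24) | (h2 << 16) | h3;
    }

    public static void main(String[] args) {
        // Documents sharing the "myapp!dummyuser!" prefix share the top 16 bits,
        // so a hash-range query on those bits covers exactly that tenant.
        int a = routeHash("myapp", "dummyuser", "doc1");
        int b = routeHash("myapp", "dummyuser", "doc2");
        System.out.println((a >>> 16) == (b >>> 16)); // true
    }
}
```

Because the doc part only occupies the low 16 bits, any two ids with the same "app!user!" prefix land in the same contiguous hash range, which is what makes per-tenant routing and searching work.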

> Multi level compositeId router
> --
>
> Key: SOLR-5320
> URL: https://issues.apache.org/jira/browse/SOLR-5320
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Anshum Gupta
>   Original Estimate: 336h
>  Remaining Estimate: 336h
>
> This would enable multi-level routing, as compared to the 2-level routing 
> available now. As an example of the usage:
> Document Id: myapp!dummyuser!doc
> myapp!dummyuser! can be used as the shard key for searching content for 
> dummyuser.
> myapp! can be used for searching across all users of myapp.
> I am looking at either 3- or 4-level routing. The 32-bit hash would then 
> comprise 8 bits from each part (in the 4-level case).






[jira] [Updated] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes

2013-10-11 Thread dejie Chang (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dejie Chang updated SOLR-5339:
--

Description: When I install SolrCloud on CentOS 5.6, the IP displayed at 
http://192.168.10.54:8081/solr/#/~cloud is sometimes wrong: it shows 
202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is correct. 
I found the cause is hostaddress = InetAddress.getLocalHost().getHostAddress(); 
in ZkController.java. This method sometimes returns the wrong IP and should 
not be trusted, so I think we should not use it on Linux.  (was: when I 
install the solr-cloud on the centos5.6 . t)

> solr-core-4.4's ip is not right when the os is centos 5.6 sometimes 
> 
>
> Key: SOLR-5339
> URL: https://issues.apache.org/jira/browse/SOLR-5339
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Clustering
>Affects Versions: 4.4
> Environment: centos 5.6
>Reporter: dejie Chang
>Priority: Critical
>
> When I install SolrCloud on CentOS 5.6, the IP displayed at 
> http://192.168.10.54:8081/solr/#/~cloud is sometimes wrong: it shows 
> 202.106.199.36, but my actual IP is 192.168.10.54. On Windows it is correct. 
> I found the cause is hostaddress = InetAddress.getLocalHost().getHostAddress(); 
> in ZkController.java. This method sometimes returns the wrong IP and should 
> not be trusted, so I think we should not use it on Linux.






[jira] [Updated] (SOLR-5338) Split shards by a route key

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5338:


Attachment: SOLR-5338.patch

Changes:
* Introduces two new methods in CompositeIdRouter 
{code}
public List<Range> partitionRangeByKey(String key, Range range)
{code}
and
{code}
public Range routeKeyHashRange(String routeKey)
{code}
* The collection split action accepts a new parameter 'split.key'
* The parent slice is found and its range is partitioned according to split.key
* We re-use the logic introduced in SOLR-5300 to do the actual splitting. 
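
The partitioning itself is simple interval arithmetic. Below is a minimal standalone sketch (a hypothetical helper, not the patch's actual code) using the ticket's own example: shard range [0, 20] and route-key range [12, 15].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Minimal sketch (not the patch's actual code) of partitioning a shard's
// hash range around a route key's hash range, in the spirit of
// partitionRangeByKey.
public class SplitByRouteKeySketch {

    /** Split inclusive [shardMin, shardMax] around inclusive [keyMin, keyMax]. */
    public static List<int[]> partition(int shardMin, int shardMax, int keyMin, int keyMax) {
        List<int[]> parts = new ArrayList<>();
        if (keyMin > shardMin) parts.add(new int[]{shardMin, keyMin - 1}); // below the key
        parts.add(new int[]{keyMin, keyMax});                              // the key's dedicated range
        if (keyMax < shardMax) parts.add(new int[]{keyMax + 1, shardMax}); // above the key
        return parts;
    }

    public static void main(String[] args) {
        // shard1 has hash range [0, 20]; route key 'A!' hashes to [12, 15]
        for (int[] p : partition(0, 20, 12, 15)) {
            System.out.println(Arrays.toString(p)); // [0, 11] then [12, 15] then [16, 20]
        }
    }
}
```

This reproduces the three sub-shard ranges described in the issue; the boundary cases (key range touching either end of the shard range) collapse to two sub-shards instead of three.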

> Split shards by a route key
> ---
>
> Key: SOLR-5338
> URL: https://issues.apache.org/jira/browse/SOLR-5338
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.6, 5.0
>
> Attachments: SOLR-5338.patch
>
>
> Provide a way to split a shard using a route key such that all documents of 
> the specified route key end up in a single dedicated sub-shard.
> Example:
> Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
> 'A!' has hash range [12,15]. Then invoking:
> {code}
> /admin/collections?action=SPLIT&collection=collection1&split.key=A!
> {code}
> should produce three sub-shards with hash range [0,11], [12,15] and [16,20].
> Specifying the source shard is not required here because the route key is 
> enough to figure it out. Route keys spanning more than one shards will not be 
> supported.
> Note that the sub-shard with the hash range of the route key may also contain 
> documents for other route keys whose hashes collide.






[jira] [Created] (SOLR-5339) solr-core-4.4's ip is not right when the os is centos 5.6 sometimes

2013-10-11 Thread dejie Chang (JIRA)
dejie Chang created SOLR-5339:
-

 Summary: solr-core-4.4's ip is not right when the os is centos 5.6 
sometimes 
 Key: SOLR-5339
 URL: https://issues.apache.org/jira/browse/SOLR-5339
 Project: Solr
  Issue Type: Bug
  Components: contrib - Clustering
Affects Versions: 4.4
 Environment: centos 5.6
Reporter: dejie Chang
Priority: Critical


when I install the solr-cloud on the centos5.6 . t






[jira] [Commented] (SOLR-5290) Warming up using search logs.

2013-10-11 Thread Minoru Osuka (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792471#comment-13792471
 ] 

Minoru Osuka commented on SOLR-5290:


The patch includes test code.

> Warming up using search logs.
> -
>
> Key: SOLR-5290
> URL: https://issues.apache.org/jira/browse/SOLR-5290
> Project: Solr
>  Issue Type: Wish
>  Components: search
>Affects Versions: 4.4
>Reporter: Minoru Osuka
>Priority: Minor
> Attachments: SOLR-5290.patch
>
>
> It is possible to warm up of cache automatically in newSearcher event, but it 
> is impossible to warm up of cache automatically in firstSearcher event 
> because there isn't old searcher.
> We describe queries in solrconfig.xml if we required to cache in 
> firstSearcher event like this:
> {code:xml}
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">static firstSearcher warming in solrconfig.xml</str>
>     </lst>
>   </arr>
> </listener>
> {code}
> This setting is very static. I want to query dynamically in the firstSearcher 
> event when Solr restarts, so I paid attention to past search logs. If past 
> search logs exist, I think it is possible to warm up the cache automatically 
> in the firstSearcher event, like the autowarming of the cache in the 
> newSearcher event.
> I had created QueryLogSenderListener which extended QuerySenderListener.
> Sample definition in solrconfig.xml:
>  - directory : Specify the Solr log directory. (Required)
>  - regex : Describe the regular expression of log. (Required)
>  - encoding : Specify the Solr log encoding. (Default : UTF-8)
>  - count : Specify the number of the log to process. (Default : 100)
>  - paths : Specify the request handler name to process.
>  - exclude_params : Specify the request parameter to except.
> {code:xml}
> <listener event="firstSearcher" class="solr.QueryLogSenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">static firstSearcher warming in solrconfig.xml</str>
>     </lst>
>   </arr>
>   <str name="directory">logs</str>
>   <str name="encoding">UTF-8</str>
>   <str name="regex">...</str>
>   <arr name="paths">
>     <str>/select</str>
>   </arr>
>   <str name="count">100</str>
>   <arr name="exclude_params">
>     <str>indent</str>
>     <str>_</str>
>   </arr>
> </listener>
> {code}
> I'd like to propose this feature.
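
The replay idea above can be sketched as: scan past request-log lines, keep only the configured handler paths, drop the excluded parameters, and stop after `count` queries. The class name, log format, and regex below are assumptions for illustration, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough sketch of replaying queries from a request log. The log line shape
// ("... path=/select params={q=foo&indent=true} ...") and the pattern are
// assumptions, not the patch's actual code.
public class QueryLogReplaySketch {

    private static final Pattern LINE = Pattern.compile("path=(\\S+)\\s+params=\\{([^}]*)\\}");

    /** Extract up to 'count' query strings for the given paths, dropping excluded params. */
    public static List<String> extractQueries(List<String> logLines, Set<String> paths,
                                              Set<String> excludeParams, int count) {
        List<String> queries = new ArrayList<>();
        for (String line : logLines) {
            if (queries.size() >= count) break;
            Matcher m = LINE.matcher(line);
            if (!m.find() || !paths.contains(m.group(1))) continue;
            StringBuilder q = new StringBuilder();
            for (String kv : m.group(2).split("&")) {
                if (excludeParams.contains(kv.split("=", 2)[0])) continue; // e.g. indent, _
                if (q.length() > 0) q.append('&');
                q.append(kv);
            }
            queries.add(q.toString());
        }
        return queries;
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList(
            "INFO: [collection1] webapp=/solr path=/select params={q=solr&indent=true} hits=1",
            "INFO: [collection1] webapp=/solr path=/update params={commit=true} status=0");
        Set<String> paths = new HashSet<>(Arrays.asList("/select"));
        Set<String> exclude = new HashSet<>(Arrays.asList("indent", "_"));
        System.out.println(extractQueries(log, paths, exclude, 100)); // [q=solr]
    }
}
```

Each extracted query string could then be submitted against the new searcher, exactly as QuerySenderListener does for the statically configured queries.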






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792461#comment-13792461
 ] 

Uwe Schindler commented on LUCENE-5269:
---

bq. I didnt want new features mixed with bugfixes really 

I agree! But now we have the "new feature", so I just asked to add this as a 
separate entry in CHANGES.txt under "New features": just the new filter, 
nothing more.

> TestRandomChains failure
> 
>
> Key: LUCENE-5269
> URL: https://issues.apache.org/jira/browse/LUCENE-5269
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
> LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
> possibly only the combination of them conspiring together.






[jira] [Updated] (LUCENE-5260) Make older Suggesters more accepting of TermFreqPayloadIterator

2013-10-11 Thread Areek Zillur (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Areek Zillur updated LUCENE-5260:
-

Attachment: LUCENE-5260.patch

Uploaded Patch:
  - changed the input of lookup.build to take TermFreqPayloadIterator instead 
of TermFreqIterator 
  - made all suggesters compatible with TermFreqPayloadIterator (but they error 
if a payload is present but cannot be used)
  - nuked all implementations of TermFreq and made them work with 
TermFreqPayload instead (except for SortedTermFreqIteratorWrapper) 
  - got rid of all the references to termFreqIter

Still todo:
  - actually nuke TermFreqIterator
  - change the names of the implementations to reflect that they are 
implementations of TermFreqPayloadIterator
  - add tests to ensure that all the implementations work with payload
  - support payloads in SortedTermFreqIteratorWrapper

> Make older Suggesters more accepting of TermFreqPayloadIterator
> ---
>
> Key: LUCENE-5260
> URL: https://issues.apache.org/jira/browse/LUCENE-5260
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Areek Zillur
> Attachments: LUCENE-5260.patch
>
>
> As discussed in https://issues.apache.org/jira/browse/LUCENE-5251, it would 
> be nice to make the older suggesters accepting of TermFreqPayloadIterator and 
> throw an exception if payload is found (if it cannot be used). 
> This will also allow us to nuke most of the other interfaces for 
> BytesRefIterator. 






[jira] [Updated] (SOLR-5310) Add a collection admin command to remove a replica

2013-10-11 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5310:
-

Attachment: SOLR-5310.patch

> Add a collection admin command to remove a replica
> --
>
> Key: SOLR-5310
> URL: https://issues.apache.org/jira/browse/SOLR-5310
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Attachments: SOLR-5310.patch
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> The only way a replica can be removed is by unloading the core. There is no 
> way to remove a replica that is down, so the clusterstate will accumulate 
> unreferenced nodes if a few nodes go down over time.
> We need a cluster admin command to clean that up, e.g.: 
> /admin/collections?action=DELETEREPLICA&collection=coll1&shard=shard1&replica=core_node3
> The system would first check whether the replica is active. If yes, a core 
> UNLOAD command is fired, which takes care of deleting the replica from the 
> clusterstate as well.
> If the state is inactive, then the core or node may be down; in that case 
> the entry is removed from the cluster state.






[jira] [Updated] (SOLR-5338) Split shards by a route key

2013-10-11 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-5338:


Description: 
Provide a way to split a shard using a route key such that all documents of the 
specified route key end up in a single dedicated sub-shard.

Example:
Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
'A!' has hash range [12,15]. Then invoking:
{code}
/admin/collections?action=SPLIT&collection=collection1&split.key=A!
{code}
should produce three sub-shards with hash range [0,11], [12,15] and [16,20].

Specifying the source shard is not required here because the route key is 
enough to figure it out. Route keys spanning more than one shards will not be 
supported.

Note that the sub-shard with the hash range of the route key may also contain 
documents for other route keys whose hashes collide.



  was:
Provide a way to split a shard using a route key such that all documents of the 
specified route key end up in a single dedicated sub-shard.

Example:
Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
'A!' has hash range [12,15]. Then invoking:
{code}
/admin/collections?action=SPLIT&collection=collection1&split.key=A!
{code}
should produce three sub-shards with hash range [0,11], [12,15] and [16,20]. 
Then the sub-shard dedicated to documents for route key 'A!' can be scaled 
separately.

Specifying the source shard is not required here because the route key is 
enough to figure it out.




> Split shards by a route key
> ---
>
> Key: SOLR-5338
> URL: https://issues.apache.org/jira/browse/SOLR-5338
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
> Fix For: 4.6, 5.0
>
>
> Provide a way to split a shard using a route key such that all documents of 
> the specified route key end up in a single dedicated sub-shard.
> Example:
> Assume that collection1, shard1 has hash range [0, 20]. Also that route key 
> 'A!' has hash range [12,15]. Then invoking:
> {code}
> /admin/collections?action=SPLIT&collection=collection1&split.key=A!
> {code}
> should produce three sub-shards with hash range [0,11], [12,15] and [16,20].
> Specifying the source shard is not required here because the route key is 
> enough to figure it out. Route keys spanning more than one shards will not be 
> supported.
> Note that the sub-shard with the hash range of the route key may also contain 
> documents for other route keys whose hashes collide.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792429#comment-13792429
 ] 

Robert Muir commented on LUCENE-5269:
-

{quote}
This is so crazy! Why did we never hit this combination before?
{quote}

This combination is especially good at finding the bug, here's why:
{code}
Tokenizer tokenizer = new EdgeNGramTokenizer(TEST_VERSION_CURRENT, reader, 2, 
94);
TokenStream stream = new ShingleFilter(tokenizer, 5);
stream = new NGramTokenFilter(TEST_VERSION_CURRENT, stream, 55, 83);
{code}

The edge-ngram has min=2, max=94, so it's basically brute-forcing every token 
size. Then the shingles make tons of tokens with positionIncrement=0, which 
makes it easy for the (previously buggy) NGramTokenFilter, with its length 
logic expecting code points, to misclassify tokens and emit an initial token 
with posinc=0:

{code}
if ((curPos + curGramSize) <= curCodePointCount) {
...
  posIncAtt.setPositionIncrement(curPosInc);
{code}
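
The char-vs-code-point mismatch behind that misclassification can be shown in isolation (standalone demo, not Lucene code): a supplementary character is one code point but two UTF-16 chars, so a char-based length check and curCodePointCount can disagree on the same token.

```java
// Standalone demo (not Lucene code): a supplementary character is one code
// point but two UTF-16 chars, so String.length() and codePointCount() give
// different answers for the same token.
public class CodePointLengthDemo {
    public static void main(String[] args) {
        String token = "a\uD834\uDD1Eb"; // 'a', U+1D11E MUSICAL SYMBOL G CLEF, 'b'
        System.out.println(token.length());                          // 4 UTF-16 chars
        System.out.println(token.codePointCount(0, token.length())); // 3 code points
    }
}
```

A length filter counting chars would reject this token at max=3 while a code-point-based gram-size check accepts it, which is exactly the kind of disagreement the random chain exposed.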


> TestRandomChains failure
> 
>
> Key: LUCENE-5269
> URL: https://issues.apache.org/jira/browse/LUCENE-5269
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
> LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
> possibly only the combination of them conspiring together.






[jira] [Commented] (LUCENE-5269) TestRandomChains failure

2013-10-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792424#comment-13792424
 ] 

Robert Muir commented on LUCENE-5269:
-

I didn't want new features mixed with bugfixes really :(

But in my opinion this was the simplest way to solve the problem: to just add a 
filter like this and for it to use that instead of LengthFilter.

I think it would be weird to see "new features" in a 4.5.1?

> TestRandomChains failure
> 
>
> Key: LUCENE-5269
> URL: https://issues.apache.org/jira/browse/LUCENE-5269
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.5.1, 4.6, 5.0
>
> Attachments: LUCENE-5269.patch, LUCENE-5269.patch, LUCENE-5269.patch, 
> LUCENE-5269_test.patch, LUCENE-5269_test.patch, LUCENE-5269_test.patch
>
>
> One of EdgeNGramTokenizer, ShingleFilter, NGramTokenFilter is buggy, or 
> possibly only the combination of them conspiring together.


