[jira] [Updated] (SOLR-5111) Change SpellCheckComponent default analyzer when queryAnalyzerFieldType is not defined

2014-07-23 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5111:


Attachment: SOLR-5111.patch

- Created a custom analyzer that keeps the whitespace tokenizer but adds a 
lowercase filter, so users who rely on the WhitespaceAnalyzer behaviour still 
get the same results (a rough sketch follows below).
- Added a test for the same.
- Fixed some indentation in testThresholdTokenFrequency.
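
For illustration, a minimal sketch of such an analyzer (this is not the attached 
patch; on older 4.x releases the constructors also take a Version argument, 
omitted here):

{code:java}
import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;

// Whitespace tokenization plus lowercasing, so "Foo" and "foo" produce the same term.
public final class WhitespaceLowercaseAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new WhitespaceTokenizer(reader);
    TokenStream sink = new LowerCaseFilter(source);
    return new TokenStreamComponents(source, sink);
  }
}
{code}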

[~jdyer] - This should work right?

> Change SpellCheckComponent default analyzer when queryAnalyzerFieldType is 
> not defined
> --
>
> Key: SOLR-5111
> URL: https://issues.apache.org/jira/browse/SOLR-5111
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5111.patch, SOLR-5111.patch
>
>
> In the collection1 example, the SpellCheckComponent uses the query analyzer 
> of "text_general" FieldType. If "queryAnalyzerFieldType" is removed from the 
> configuration a WhitespaceAnalyzer is used by default.
> I suggest we change the default to SimpleAnalyzer so that "foo" and 
> "Foo" give the same results, and log that the analyzer is missing.
> Also, are there more places in solrconfig which have dependencies on the 
> schema like this?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-6270) MultiThreadedOCPTest failures in jenkins

2014-07-23 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-6270:
---

Assignee: Shalin Shekhar Mangar

> MultiThreadedOCPTest failures in jenkins
> 
>
> Key: SOLR-6270
> URL: https://issues.apache.org/jira/browse/SOLR-6270
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, Tests
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6270.patch
>
>
> Latest failure from jenkins:
> https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/
> {code}
> FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch
> Error Message:
> Task 3002 did not complete, final state: running
> Stack Trace:
> java.lang.AssertionError: Task 3002 did not complete, final state: running
> at 
> __randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6270) MultiThreadedOCPTest failures in jenkins

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072825#comment-14072825
 ] 

ASF subversion and git services commented on SOLR-6270:
---

Commit 1613000 from sha...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1613000 ]

SOLR-6270: Increased timeouts for MultiThreadedOCPTest

> MultiThreadedOCPTest failures in jenkins
> 
>
> Key: SOLR-6270
> URL: https://issues.apache.org/jira/browse/SOLR-6270
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, Tests
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6270.patch
>
>
> Latest failure from jenkins:
> https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/
> {code}
> FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch
> Error Message:
> Task 3002 did not complete, final state: running
> Stack Trace:
> java.lang.AssertionError: Task 3002 did not complete, final state: running
> at 
> __randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6270) MultiThreadedOCPTest failures in jenkins

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072824#comment-14072824
 ] 

ASF subversion and git services commented on SOLR-6270:
---

Commit 1612999 from sha...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1612999 ]

SOLR-6270: Increased timeouts for MultiThreadedOCPTest

> MultiThreadedOCPTest failures in jenkins
> 
>
> Key: SOLR-6270
> URL: https://issues.apache.org/jira/browse/SOLR-6270
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, Tests
>Reporter: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6270.patch
>
>
> Latest failure from jenkins:
> https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/
> {code}
> FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch
> Error Message:
> Task 3002 did not complete, final state: running
> Stack Trace:
> java.lang.AssertionError: Task 3002 did not complete, final state: running
> at 
> __randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6270) MultiThreadedOCPTest failures in jenkins

2014-07-23 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-6270:


Attachment: SOLR-6270.patch

Keep polling until we succeed but abort if we've been waiting for more than 5 
minutes.
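
A rough sketch of that polling loop (illustrative only; the helper name and task 
id are assumed from the test, not quoted from the attached patch):

{code:java}
// Poll the task status until it completes, but bail out after 5 minutes.
final long deadlineNanos = System.nanoTime() + TimeUnit.MINUTES.toNanos(5);
while (!"completed".equals(getRequestState("3002", client))) {  // getRequestState(): assumed test helper
  if (System.nanoTime() > deadlineNanos) {
    fail("Task 3002 did not complete within 5 minutes");
  }
  Thread.sleep(100);
}
{code}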

> MultiThreadedOCPTest failures in jenkins
> 
>
> Key: SOLR-6270
> URL: https://issues.apache.org/jira/browse/SOLR-6270
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, Tests
>Reporter: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6270.patch
>
>
> Latest failure from jenkins:
> https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/
> {code}
> FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch
> Error Message:
> Task 3002 did not complete, final state: running
> Stack Trace:
> java.lang.AssertionError: Task 3002 did not complete, final state: running
> at 
> __randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6270) MultiThreadedOCPTest failures in jenkins

2014-07-23 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072793#comment-14072793
 ] 

Shalin Shekhar Mangar commented on SOLR-6270:
-

These are spurious failures. We should just wait for as long as required for 
the tasks to succeed.

> MultiThreadedOCPTest failures in jenkins
> 
>
> Key: SOLR-6270
> URL: https://issues.apache.org/jira/browse/SOLR-6270
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud, Tests
>Reporter: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 5.0, 4.10
>
>
> Latest failure from jenkins:
> https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/
> {code}
> FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch
> Error Message:
> Task 3002 did not complete, final state: running
> Stack Trace:
> java.lang.AssertionError: Task 3002 did not complete, final state: running
> at 
> __randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at org.junit.Assert.assertTrue(Assert.java:43)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
> at 
> org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6270) MultiThreadedOCPTest failures in jenkins

2014-07-23 Thread Shalin Shekhar Mangar (JIRA)
Shalin Shekhar Mangar created SOLR-6270:
---

 Summary: MultiThreadedOCPTest failures in jenkins
 Key: SOLR-6270
 URL: https://issues.apache.org/jira/browse/SOLR-6270
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud, Tests
Reporter: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 5.0, 4.10


Latest failure from jenkins:
https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/

{code}
FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch

Error Message:
Task 3002 did not complete, final state: running

Stack Trace:
java.lang.AssertionError: Task 3002 did not complete, final state: running
at 
__randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)
{code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5852) Add CloudSolrServer helper method to connect to a ZK ensemble

2014-07-23 Thread Varun Thacker (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Thacker updated SOLR-5852:


Attachment: SOLR-5852.patch

- Updated [~elyograg]'s patch to trunk.
- Modified CloudSolrServerMultiConstructorTest to randomize the tests.
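
For context, a hypothetical usage sketch of the existing comma-delimited 
constructor next to the varargs constructor proposed in this issue (illustrative 
only, not code from the patch):

{code:java}
// Existing: a single zkHost string with a comma-delimited ensemble.
CloudSolrServer fromString = new CloudSolrServer("zk1:2181,zk2:2181,zk3:2181");

// Proposed in this issue: pass the ensemble members as varargs.
CloudSolrServer fromVarargs = new CloudSolrServer("zk1:2181", "zk2:2181", "zk3:2181");
{code}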

> Add CloudSolrServer helper method to connect to a ZK ensemble
> -
>
> Key: SOLR-5852
> URL: https://issues.apache.org/jira/browse/SOLR-5852
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
> Attachments: SOLR-5852-SH.patch, SOLR-5852-SH.patch, SOLR-5852.patch, 
> SOLR-5852.patch, SOLR-5852.patch, SOLR-5852.patch, SOLR-5852_FK.patch, 
> SOLR-5852_FK.patch
>
>
> We should have a CloudSolrServer constructor which takes a list of ZK servers 
> to connect to.
> Something like:
> {noformat}
> public CloudSolrServer(String... zkHost);
> {noformat}
> - Document the current constructor better to mention that to connect to a ZK 
> ensemble you can pass a comma-delimited list of ZK servers like 
> zk1:2181,zk2:2181,zk3:2181
> - Thirdly, should getLbServer() and getZKStatereader() be public?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-07-23 Thread Da Huang (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Da Huang updated LUCENE-4396:
-

Attachment: LUCENE-4396.patch

This is a first attempt at merging scorers, so that we can get better 
performance for boolean retrieval.

I created a new class named "BooleanMixedScorerDecider" to choose the best 
scorer.
The rules for choosing remain to be improved. I have been working on finding an 
elegant way to define them.
{code}
                Task   QPS baseline  StdDev   QPS my_version  StdDev        Pct diff
   HighAndSomeLowNot          11.53  (7.3%)            10.75 (10.1%)   -6.8% ( -22% -   11%)
   HighAndTonsLowNot           4.87  (4.0%)             4.64  (6.0%)   -4.9% ( -14% -    5%)
     LowAndSomeLowOr         306.20  (2.2%)           299.06  (2.8%)   -2.3% (  -7% -    2%)
    HighAndSomeLowOr          13.67  (9.4%)            13.38  (2.7%)   -2.1% ( -13% -   11%)
    HighAndTonsLowOr           4.04  (6.4%)             3.96  (1.9%)   -1.9% (  -9% -    6%)
    LowAndSomeLowNot         215.18  (1.9%)           211.14  (2.2%)   -1.9% (  -5% -    2%)
            PKLookup          96.26  (2.3%)            94.56  (2.8%)   -1.8% (  -6% -    3%)
  HighAndTonsHighNot           0.06  (2.3%)             0.06  (2.6%)   -1.0% (  -5% -    4%)
   HighAndTonsHighOr           0.06  (0.6%)             0.06  (1.3%)    0.9% (   0% -    2%)
  HighAndSomeHighNot           1.59  (2.2%)             1.62  (2.9%)    1.7% (  -3% -    6%)
   LowAndSomeHighNot          66.33  (2.1%)            68.77  (2.1%)    3.7% (   0% -    8%)
    LowAndSomeHighOr          53.75  (1.6%)            56.86  (2.1%)    5.8% (   1% -    9%)
    LowAndTonsLowNot          14.00  (1.7%)            14.84  (1.5%)    6.1% (   2% -    9%)
   HighAndSomeHighOr           2.39  (2.2%)             2.68  (3.5%)   12.4% (   6% -   18%)
     LowAndTonsLowOr          17.69  (0.9%)            21.64  (1.7%)   22.3% (  19% -   25%)
    LowAndTonsHighOr           1.83  (1.3%)             2.33  (2.4%)   27.2% (  23% -   31%)
   LowAndTonsHighNot           1.15  (1.5%)             1.51  (3.1%)   30.9% (  25% -   36%)
{code}

> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2894) Implement distributed pivot faceting

2014-07-23 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man updated SOLR-2894:
---

Attachment: SOLR-2894.patch


hey guys, stoked to see all these tests passing!

I've been slowly working my way through Andrew's latest patch, reviewing all 
the code and making some tweaks/improvements as I go.  Here's a checkpointed 
update...

{panel}
Patch updates in attachment:

* fix FacetComponent to mirror refactoring done in SOLR-6216 
* fixed up the String.format calls in various classes so they specify 
Locale.ROOT
** removed some useless "toString()" calls in these format calls as well, 
particularly since it looked like they could cause NPEs
* PivotFacetField
** javadocs:
*** createFromListOfNamedLists
*** convertToListOfNamedLists
** eliminate call to {{PivotFacetFieldValueCollection.contains(...)}} (see 
below)
* PivotFacetValue...
** javadocs:
*** class
*** createFromNamedList
*** shardHasContributed
*** convertToNamedList
* PivotFacetFieldValueCollection...
** javadocs:
*** class
*** refinableCollection
*** refinableSubList
*** refinableSize
*** size
*** get
*** add
** remove unused methods
*** isEmpty()
*** getValue(Comparable)
*** contains(Comparable)
 (this was used, but only in a case where it was immediately followed by a 
call to {{get(Comparable)}}, so I just optimized it away and replaced it with 
a null check.)
** rename: "isSorted" -> "dirty"
** rename: "nullValue" -> "missingValue"
*** it was really confusing because "nullValue" could be null, or it could be a 
PivotFacetValue whose value was null
** fix {{add(PivotFacetValue)}} to set "dirty" directly
** lock down some stuff...
*** methods for accessing some vars so they don't need to be public
*** make some things specified in constructor final
*** make {{refinableCollection}} and {{refinableSubList}} return immutable lists
{panel}


Some things I'm either confused by and/or debating in my head ... 
comments/opinions from others would be appreciated:

* refinement and facet offset
** I haven't looked into this closely, but I noticed the refinement code seems 
to only refine things starting at the "facetFieldOffset" of the current 
collection
** don't we need to refine all the values, starting from the beginning of the 
list?
** if the offset is "1" and the first value X has a count of "100" and the 
second value Y has an initial count of "50" but a post-refinement count of 
"150", pushing itself prior to the offset and putting X into the window, then 
doesn't X miss out on refinement?

* {{refinableCollection()}}
** I think we probably want to rename {{refinableCollection()}} (and 
{{refinableSize()}}) to something more like {{getExplicitValuesList()}} 
(compared to the {{getMissingValue()}} method I just added) to make it more 
clear what you are really getting from this method ... I recognize that this 
name comes from the fact that we don't ever really need to refine the count for 
the missing value, but that seems like an implementation detail that doesn't 
affect a lot of the places this method is called (and particularly since the 
childPivots of the missing value _do_ still need to be refined, so even when it 
is relevant, it's still misleading from a recursion standpoint.)

* trim
** from what I can understand of the {{trim}} methods - these are typically 
destructive operations that: 
*** should only be called after all refinement is completed
*** prune things that are no longer needed based on the limit/offset params, 
making the objects unusable for any future modifications/refinement (after 
that they are only good for generating the response)
*** should be called just prior to asking for the final NamedList response 
structure
** if my understanding is correct, then it seems like it might be safer & more 
straightforward to instead just refactor this functionality directly into the 
corresponding methods for converting to a NamedList, and clearly document those 
methods as destructive?
*** or at the very least add a "trimmed" boolean and sprinkle around some 
asserts in the various methods related to whether the object has/has not 
already been trimmed



> Implement distributed pivot faceting
> 
>
> Key: SOLR-2894
> URL: https://issues.apache.org/jira/browse/SOLR-2894
> Project: Solr
>  Issue Type: Improvement
>Reporter: Erik Hatcher
>Assignee: Hoss Man
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2894-mincount-minification.patch, 
> SOLR-2894-reworked.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 
> SOLR-2894.patch, SOLR-2894.patch, SOLR-2894.patch, 

[jira] [Commented] (SOLR-3881) frequent OOM in LanguageIdentifierUpdateProcessor

2014-07-23 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072529#comment-14072529
 ] 

Steve Rowe commented on SOLR-3881:
--

Vitaliy, those changes look good.

About moving {{concatFields()}} to the tika language identifier: I think the 
way to go is to just move the whole method there, then change the 
{{detectLanguage()}} method to take the {{SolrInputDocument}} instead of a 
String.  You don't need to carry over the {{field[]}} parameter from 
{{concatFields()}}, since data member {{inputFields}} will be accessible 
everywhere it's needed. 

I should have mentioned previously: I don't like the {{maxAppendSize}} and 
{{maxTotalAppendSize}} names - "size" is ambiguous (could refer to bytes, 
chars, whatever), and "append" refers to an internal operation... I'd like to 
see "append"=>"field value" and "size"=>"chars": {{maxFieldValueChars}}, and 
{{maxTotalChars}} (since appending doesn't need to be mentioned for the global 
limit).  The same thing goes for the default constants and the test method 
names.

Some minor issues I found with your patch:

# As I said previously: "We should also set default maxima for both per-value 
and total chars, rather than MAX_INT, as in the current patch."
# The total chars default should be its own setting; I was thinking we could 
make it double the per-value default?
# It's better not to reorder import statements unless you're already making 
significant changes to them; it distracts from the meat of the change.  (You 
reordered them in {{LangDetectLanguageIdentifierUpdateProcessor}} and 
{{LanguageIdentifierUpdateProcessorFactoryTestCase}})
# In {{LanguageIdentifierUpdateProcessor.concatFields()}}, when you trim the 
concatenated text to {{maxTotalAppendSize}}, I think 
{{StringBuilder.setLength(maxTotalAppendSize);}} would be more efficient than 
{{StringBuilder.delete(maxTotalAppendSize, sb.length() - 1);}} (see the sketch 
after this list).
# In addition to the test you added for the global limit, we should also test 
using both the per-value and global limits at the same time.
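
Regarding point 4 above, a rough sketch of the truncation idea, using the 
proposed {{maxFieldValueChars}}/{{maxTotalChars}} names (illustrative only, not 
the attached patch):

{code:java}
StringBuilder sb = new StringBuilder();
for (String fieldName : inputFields) {              // inputFields: existing data member
  Object value = doc.getFieldValue(fieldName);      // doc: the SolrInputDocument being processed
  if (value instanceof String) {
    String text = (String) value;
    // Per-value cap: append at most maxFieldValueChars characters of each value.
    sb.append(text, 0, Math.min(text.length(), maxFieldValueChars));
    sb.append(" ");
  }
}
// Global cap: setLength() is cheaper than delete() for trimming the tail.
if (sb.length() > maxTotalChars) {
  sb.setLength(maxTotalChars);
}
return sb.toString();
{code}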
 



> frequent OOM in LanguageIdentifierUpdateProcessor
> -
>
> Key: SOLR-3881
> URL: https://issues.apache.org/jira/browse/SOLR-3881
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: CentOS 6.x, JDK 1.6, (java -server -Xms2G -Xmx2G 
> -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=)
>Reporter: Rob Tulloh
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-3881.patch, SOLR-3881.patch, SOLR-3881.patch
>
>
> We are seeing frequent failures from Solr causing it to OOM. Here is the 
> stack trace we observe when this happens:
> {noformat}
> Caused by: java.lang.OutOfMemoryError: Java heap space
> at java.util.Arrays.copyOf(Arrays.java:2882)
> at 
> java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
> at 
> java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:390)
> at java.lang.StringBuffer.append(StringBuffer.java:224)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.concatFields(LanguageIdentifierUpdateProcessor.java:286)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.process(LanguageIdentifierUpdateProcessor.java:189)
> at 
> org.apache.solr.update.processor.LanguageIdentifierUpdateProcessor.processAdd(LanguageIdentifierUpdateProcessor.java:171)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$2.update(BinaryUpdateRequestHandler.java:90)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:140)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:120)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:105)
> at 
> org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
> at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
> at 
> org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:147)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.parseAndLoadDocs(BinaryUpdateRequestHandler.java:100)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler.access$000(BinaryUpdateRequestHandler.java:47)
> at 
> org.apache.solr.handler.BinaryUpdateRequestHandler$1.load(BinaryUpdateRequestHandler.java:58)
> at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandle

[jira] [Assigned] (SOLR-6267) Let user override Interval Faceting key with LocalParams

2014-07-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson reassigned SOLR-6267:


Assignee: Erick Erickson

> Let user override Interval Faceting key with LocalParams
> 
>
> Key: SOLR-6267
> URL: https://issues.apache.org/jira/browse/SOLR-6267
> Project: Solr
>  Issue Type: Improvement
>Reporter: Tomás Fernández Löbbe
>Assignee: Erick Erickson
> Attachments: SOLR-6267.patch
>
>
> This issue is related to Interval Faceting, being worked at SOLR-6216. Right 
> now they key of each interval is the string of the interval as entered in the 
> request. For example:
> {noformat}
> [*,20)
> [20,40)
> [40,*]
> {noformat}
> would output something like
> {noformat}
> "facet_intervals":{
>   "size":{
> "[*,20)":3,
> "[20,40)":4,
> "[40,*]":9}}
> {noformat}
> It would be good to be able to override the "key" per interval using local 
> params, for example:
> {noformat}
> {!key='small'}[,20)
> {!key='medium'}[20,40)
> {!key='large'}[40,]
> {noformat}
> Would output:
> {noformat}
> "facet_intervals":{
>   "size":{
> "small":3,
> "medium":4,
> "large":9}}
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6216) Better faceting for multiple intervals on DV fields

2014-07-23 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-6216.
--

   Resolution: Fixed
Fix Version/s: 4.10

Thanks Tomás!

> Better faceting for multiple intervals on DV fields
> ---
>
> Key: SOLR-6216
> URL: https://issues.apache.org/jira/browse/SOLR-6216
> Project: Solr
>  Issue Type: Improvement
>Reporter: Tomás Fernández Löbbe
>Assignee: Erick Erickson
> Fix For: 4.10
>
> Attachments: SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch
>
>
> There are two ways to have faceting on values ranges in Solr right now: 
> “Range Faceting” and “Query Faceting” (doing range queries). They both end up 
> doing something similar:
> {code:java}
> searcher.numDocs(rangeQ , docs)
> {code}
> The good thing about this implementation is that it can benefit from caching. 
> The bad thing is that it may be slow with cold caches, and that there will be 
> a query for each of the ranges.
> A different implementation would be one that works similar to regular field 
> faceting, using doc values and validating ranges for each value of the 
> matching documents. This implementation would sometimes be faster than Range 
> Faceting / Query Faceting, specially on cases where caches are not very 
> effective, like on a high update rate, or where ranges change frequently.
> Functionally, the result should be exactly the same as the one obtained by 
> doing a facet query for every interval



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6216) Better faceting for multiple intervals on DV fields

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072442#comment-14072442
 ] 

ASF subversion and git services commented on SOLR-6216:
---

Commit 1612958 from [~erickoerickson] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1612958 ]

SOLR-6216: Better faceting for multiple intervals on DV fields. Thanks Tomas

> Better faceting for multiple intervals on DV fields
> ---
>
> Key: SOLR-6216
> URL: https://issues.apache.org/jira/browse/SOLR-6216
> Project: Solr
>  Issue Type: Improvement
>Reporter: Tomás Fernández Löbbe
>Assignee: Erick Erickson
> Attachments: SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch
>
>
> There are two ways to have faceting on values ranges in Solr right now: 
> “Range Faceting” and “Query Faceting” (doing range queries). They both end up 
> doing something similar:
> {code:java}
> searcher.numDocs(rangeQ , docs)
> {code}
> The good thing about this implementation is that it can benefit from caching. 
> The bad thing is that it may be slow with cold caches, and that there will be 
> a query for each of the ranges.
> A different implementation would be one that works similar to regular field 
> faceting, using doc values and validating ranges for each value of the 
> matching documents. This implementation would sometimes be faster than Range 
> Faceting / Query Faceting, specially on cases where caches are not very 
> effective, like on a high update rate, or where ranges change frequently.
> Functionally, the result should be exactly the same as the one obtained by 
> doing a facet query for every interval



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072414#comment-14072414
 ] 

Uwe Schindler commented on LUCENE-5843:
---

bq. I'm not sure what IW does today if you create a too-big index but it's 
probably horrible; it may succeed and then at search time you hit nasty 
exceptions when we overflow int.

If a single segment exceeds the limit while merging, it's horrible. If you have 
an index that exceeds the limit, you get an exception when opening it: 
BaseCompositeReader throws an exception in its ctor:

{code:java}
  maxDoc += r.maxDoc();  // compute maxDocs
  if (maxDoc < 0 /* overflow */) {
    throw new IllegalArgumentException("Too many documents, composite IndexReaders cannot exceed " + Integer.MAX_VALUE);
  }
{code}

The limit is MAX_VALUE; the -1 is just a stupid limitation of TopDocs, but the 
real limit is actually smaller, because arrays have a maximum size in Java. 
DocIdSetIterator's sentinel is not a problem, because it's simply the last 
document (MAX_VALUE), which is always the last possible one (the iterator is 
always exhausted if you reach the last doc).

> IndexWriter should refuse to create an index with more than INT_MAX docs
> 
>
> Key: LUCENE-5843
> URL: https://issues.apache.org/jira/browse/LUCENE-5843
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
>
> It's more and more common for users these days to create very large indices, 
> e.g.  indexing lines from log files, or packets on a network, etc., and it's 
> not hard to accidentally exceed the maximum number of documents in one index.
> I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
> value as a sentinel during searching.
> I'm not sure what IW does today if you create a too-big index but it's 
> probably horrible; it may succeed and then at search time you hit nasty 
> exceptions when we overflow int.
> I think it should throw an IndexFullException instead.  It'd be nice if we 
> could do this on the very doc that when added would go over the limit, but I 
> would also settle for just throwing at flush as well ... i.e. I think what's 
> really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6248) MoreLikeThis Query Parser

2014-07-23 Thread Vitaliy Zhovtyuk (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Vitaliy Zhovtyuk updated SOLR-6248:
---

Attachment: SOLR-6248.patch

Added an mlt qparser that works in both single-node and cloud mode, with support 
for numeric ids.
The result of the mlt query is written as the main query result, not in a 
MoreLikeThis section. Added tests that exercise it in single-node and cloud modes.
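
A hypothetical request shape for such a qparser (the local-param names are 
illustrative and depend on the patch): the local-params query takes the id of 
the source document and the fields to derive interesting terms from.

{noformat}
q={!mlt qf=title,description}1234
{noformat}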

> MoreLikeThis Query Parser
> -
>
> Key: SOLR-6248
> URL: https://issues.apache.org/jira/browse/SOLR-6248
> Project: Solr
>  Issue Type: New Feature
>Reporter: Anshum Gupta
> Attachments: SOLR-6248.patch
>
>
> MLT Component doesn't let people highlight/paginate and the handler comes 
> with an cost of maintaining another piece in the config. Also, any changes to 
> the default (number of results to be fetched etc.) /select handler need to be 
> copied/synced with this handler too.
> Having an MLT QParser would let users get back docs based on a query for them 
> to paginate, highlight etc. It would also give them the flexibility to use 
> this anywhere i.e. q,fq,bq etc.
> A bit of history about MLT (thanks to Hoss)
> MLT Handler pre-dates the existence of QParsers and was meant to take an 
> arbitrary query as input, find docs that match that 
> query, club them together to find interesting terms, and then use those 
> terms as if they were my main query to generate a main result set.
> This result would then be used as the set to facet, highlight etc.
> The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)
> The MLT component on the other hand solved a very different purpose of 
> augmenting the main result set. It is used to get similar docs for each of 
> the doc in the main result set.
> DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
> The new approach:
> All of this can be done better and cleaner (and makes more sense too) using 
> an MLT QParser.
> An important thing to handle here is the case where the user doesn't have 
> TermVectors, in which case, it does what happens right now i.e. parsing 
> stored fields.
> Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
> field would need to be a TextField with an index analyzer defined. This 
> analyzer will then be used to extract terms for MLT.
> In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
> the schema (if TermVectors are enabled for the field). If not, a /get call 
> can be used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1172: POMs out of sync

2014-07-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1172/

2 tests failed.
FAILED:  org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testDistribSearch

Error Message:
Could not find the new collection - 503 : 
https://127.0.0.1:41221/awholynewcollection_0

Stack Trace:
java.lang.AssertionError: Could not find the new collection - 503 : 
https://127.0.0.1:41221/awholynewcollection_0
at 
__randomizedtesting.SeedInfo.seed([FF65550B83DBE450:7E83DB13F484846C]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.solr.cloud.AbstractFullDistribZkTestBase.waitForNon403or404or503(AbstractFullDistribZkTestBase.java:1700)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.testCollectionsAPI(CollectionsAPIDistributedZkTest.java:750)
at 
org.apache.solr.cloud.CollectionsAPIDistributedZkTest.doTest(CollectionsAPIDistributedZkTest.java:203)


FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch

Error Message:
Task 3002 did not complete, final state: running

Stack Trace:
java.lang.AssertionError: Task 3002 did not complete, final state: running
at 
__randomizedtesting.SeedInfo.seed([A057826F41471802:21B10C773618783E]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)




Build Log:
[...truncated 55096 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:490: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:182: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/extra-targets.xml:77:
 Java returned: 1

Total time: 268 minutes 4 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-6232) Allow cores that have failed to init to be deleted via CoreAdminHandler

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072315#comment-14072315
 ] 

ASF subversion and git services commented on SOLR-6232:
---

Commit 1612942 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1612942 ]

SOLR-6232: fix stupid accidental commit

> Allow cores that have failed to init to be deleted via CoreAdminHandler
> ---
>
> Key: SOLR-6232
> URL: https://issues.apache.org/jira/browse/SOLR-6232
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 4.10
>
> Attachments: SOLR-6232.patch
>
>
> If a core fails to init due to index corruption or something similar, it 
> can't currently be removed with an UNLOAD command, you have to go do it 
> manually.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-23 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072300#comment-14072300
 ] 

Jack Krupansky commented on LUCENE-5843:


That Solr Jira has my comments as well, but I just want to reiterate that the 
actual limit should be more clearly documented. I filed a Jira for that quite 
a while ago - LUCENE-4104. And if this new issue resolves the problem, please 
mark my old LUCENE-4105 issue as a duplicate.


> IndexWriter should refuse to create an index with more than INT_MAX docs
> 
>
> Key: LUCENE-5843
> URL: https://issues.apache.org/jira/browse/LUCENE-5843
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
>
> It's more and more common for users these days to create very large indices, 
> e.g.  indexing lines from log files, or packets on a network, etc., and it's 
> not hard to accidentally exceed the maximum number of documents in one index.
> I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
> value as a sentinel during searching.
> I'm not sure what IW does today if you create a too-big index but it's 
> probably horrible; it may succeed and then at search time you hit nasty 
> exceptions when we overflow int.
> I think it should throw an IndexFullException instead.  It'd be nice if we 
> could do this on the very doc that when added would go over the limit, but I 
> would also settle for just throwing at flush as well ... i.e. I think what's 
> really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6167) SolrDocumentList cannot be cast to ResultContext on Solr 4.8.0

2014-07-23 Thread Guido (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072280#comment-14072280
 ] 

Guido commented on SOLR-6167:
-

You were right: I solved this issue by modifying my library according to the 
new version of Solr.

Thank you very much for your help.

Best Regards,

Guido

> SolrDocumentList cannot be cast to ResultContext on Solr 4.8.0
> --
>
> Key: SOLR-6167
> URL: https://issues.apache.org/jira/browse/SOLR-6167
> Project: Solr
>  Issue Type: Bug
>  Components: clients - php
>Affects Versions: 4.8
> Environment: Linux
>Reporter: Guido
>  Labels: ResultContext, SolrSolrDocumentList, solrcloud
>
> Hello,
> I have a running instance of Solr 4.2.1 (I will refer to it as 'instance A') 
> and I am trying to pass to Solr 4.8.0. For this purpose, I created a copy of 
> the machine which hosts 'instance A' (I will refer to the copy as 'instance 
> B') and, on this new machine, I have installed Solr 4.8.0 and I have created 
> a collection with 4 shards. Then, on the new machine, I have iterated over a 
> copy of the index built on instance A and I have re-indexed all the documents 
> and inserted them inside the new collection of Solr 4.8.0. I have completed 
> this process; the two indexes seem to contain the same documents, and I am 
> successfully able to query both indexes properly.
> When I try to run the following query on 'instance A':
> http://instance_A_ip:23332/solr/myindex/query?q=id%3Amyid&fl=html&wt=html
> I get a well formatted html response. Unfortunately, when I try the same 
> query on the 'instance B'
> http://instance_B_ip:23332/solr/mycollection/query?q=id%3Amyid&fl=html&wt=html
> I receive the following error code:
> HTTP Status 500 - {msg=org.apache.solr.common.SolrDocumentList cannot be cast 
> to org.apache.solr.response.ResultContext,trace=java.lang.ClassCastException: 
> org.apache.solr.common.SolrDocumentList cannot be cast to 
> org.apache.solr.response.ResultContext at 
> my.library.solr.HtmlResponseWriter.write(HtmlResponseWriter.java:33) at 
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:762)
>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:431)
>  at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
>  at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
>  at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
>  at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
>  at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
>  at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127) 
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103) 
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
>  at 
> org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) 
> at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:861) 
> at 
> org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:606)
>  at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489) 
> at java.lang.Thread.run(Thread.java:744) ,code=500}
> This is the error that I grep from the solr log file:
>  org.apache.solr.common.SolrException; null:java.lang.ClassCastException: 
> org.apache.solr.common.SolrDocumentList cannot be cast to 
> org.apache.solr.response.ResultContext
> at 
> my.library.solr.HtmlResponseWriter.write(HtmlResponseWriter.java:33)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:762)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:431)
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
> at 
> org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
> at 
> org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
> at 
> org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
> at 
> org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
> at 
> org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
> at 
> org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:103)
> at 
> org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
> at 
> org.apache.catalina.connector.CoyoteAdapter.service(Co

[jira] [Commented] (SOLR-6216) Better faceting for multiple intervals on DV fields

2014-07-23 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072245#comment-14072245
 ] 

Tomás Fernández Löbbe commented on SOLR-6216:
-

Thanks for committing this, Erick. I tested this in 4x and the tests are passing 
too. The only thing that needs to be done in addition to the merge is adding 
"Lucene3x" to the list of codecs to skip (in DistributedIntervalFacetingTest 
and TestIntervalFaceting).
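
For reference, a sketch of how a test typically skips a codec, assuming the 
usual test-framework annotation is what's meant here (the class and base class 
shown are illustrative):

{code:java}
@LuceneTestCase.SuppressCodecs({"Lucene3x"})
public class TestIntervalFaceting extends SolrTestCaseJ4 {
  // ...
}
{code}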

> Better faceting for multiple intervals on DV fields
> ---
>
> Key: SOLR-6216
> URL: https://issues.apache.org/jira/browse/SOLR-6216
> Project: Solr
>  Issue Type: Improvement
>Reporter: Tomás Fernández Löbbe
>Assignee: Erick Erickson
> Attachments: SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch
>
>
> There are two ways to have faceting on values ranges in Solr right now: 
> “Range Faceting” and “Query Faceting” (doing range queries). They both end up 
> doing something similar:
> {code:java}
> searcher.numDocs(rangeQ , docs)
> {code}
> The good thing about this implementation is that it can benefit from caching. 
> The bad thing is that it may be slow with cold caches, and that there will be 
> a query for each of the ranges.
> A different implementation would be one that works similar to regular field 
> faceting, using doc values and validating ranges for each value of the 
> matching documents. This implementation would sometimes be faster than Range 
> Faceting / Query Faceting, specially on cases where caches are not very 
> effective, like on a high update rate, or where ranges change frequently.
> Functionally, the result should be exactly the same as the one obtained by 
> doing a facet query for every interval



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072242#comment-14072242
 ] 

ASF subversion and git services commented on LUCENE-5844:
-

Commit 1612936 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1612936 ]

LUCENE-5844: ArrayUtil.grow/oversize now returns at most Integer.MAX_VALUE - 8

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5844.patch, LUCENE-5844.patch
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5844.


Resolution: Fixed

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5844.patch, LUCENE-5844.patch
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072238#comment-14072238
 ] 

ASF subversion and git services commented on LUCENE-5844:
-

Commit 1612935 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1612935 ]

LUCENE-5844: ArrayUtil.grow/oversize now returns at most Integer.MAX_VALUE - 8

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5844.patch, LUCENE-5844.patch
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072196#comment-14072196
 ] 

Robert Muir commented on LUCENE-5844:
-

+1

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5844.patch, LUCENE-5844.patch
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5845) CompressingStoredFieldsWriter on too-big document

2014-07-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5845:


Attachment: LUCENE-5845_test.patch

Simple test.

It also includes Mike's patch. Currently, if you get anywhere close, you will 
exceed the VM limit for array size...

> CompressingStoredFieldsWriter on too-big document
> -
>
> Key: LUCENE-5845
> URL: https://issues.apache.org/jira/browse/LUCENE-5845
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5845_test.patch
>
>
> This guy has a documented limit of 2^31-2^14
> But it becomes possible (with LUCENE-5844) to add a document that exceeds 
> this... we shouldn't give AIOOBE but something more clear than this:
> {noformat}
>   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestDemo 
> -Dtests.method=testMassiveDoc -Dtests.seed=8306F98D2E2B9750 -Dtests.locale=pl 
> -Dtests.timezone=America/Jamaica -Dtests.file.encoding=ISO-8859-1
>[junit4] ERROR   5.76s | TestDemo.testMassiveDoc <<<
>[junit4]> Throwable #1: java.lang.ArrayIndexOutOfBoundsException
>[junit4]>  at 
> __randomizedtesting.SeedInfo.seed([8306F98D2E2B9750:20FE488BE80074B9]:0)
>[junit4]>  at 
> java.io.BufferedOutputStream.write(BufferedOutputStream.java:128)
>[junit4]>  at 
> org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:51)
>[junit4]>  at 
> org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:125)
>[junit4]>  at 
> org.apache.lucene.codecs.compressing.LZ4.encodeLiterals(LZ4.java:157)
>[junit4]>  at 
> org.apache.lucene.codecs.compressing.LZ4.encodeLastLiterals(LZ4.java:162)
>[junit4]>  at 
> org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:252)
>[junit4]>  at 
> org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:161)
>[junit4]>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:233)
>[junit4]>  at 
> org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:166)
>[junit4]>  at 
> org.apache.lucene.index.DefaultIndexingChain.finishStoredFields(DefaultIndexingChain.java:269)
>[junit4]>  at 
> org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:363)
>[junit4]>  at 
> org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5845) CompressingStoredFieldsWriter on too-big document

2014-07-23 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5845:
---

 Summary: CompressingStoredFieldsWriter on too-big document
 Key: LUCENE-5845
 URL: https://issues.apache.org/jira/browse/LUCENE-5845
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir
 Attachments: LUCENE-5845_test.patch

This guy has a documented limit of 2^31-2^14

But it becomes possible (with LUCENE-5844) to add a document that exceeds 
this... we shouldn't give AIOOBE but something more clear than this:
{noformat}
  [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestDemo 
-Dtests.method=testMassiveDoc -Dtests.seed=8306F98D2E2B9750 -Dtests.locale=pl 
-Dtests.timezone=America/Jamaica -Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   5.76s | TestDemo.testMassiveDoc <<<
   [junit4]> Throwable #1: java.lang.ArrayIndexOutOfBoundsException
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([8306F98D2E2B9750:20FE488BE80074B9]:0)
   [junit4]>at 
java.io.BufferedOutputStream.write(BufferedOutputStream.java:128)
   [junit4]>at 
org.apache.lucene.store.OutputStreamIndexOutput.writeBytes(OutputStreamIndexOutput.java:51)
   [junit4]>at 
org.apache.lucene.store.MockIndexOutputWrapper.writeBytes(MockIndexOutputWrapper.java:125)
   [junit4]>at 
org.apache.lucene.codecs.compressing.LZ4.encodeLiterals(LZ4.java:157)
   [junit4]>at 
org.apache.lucene.codecs.compressing.LZ4.encodeLastLiterals(LZ4.java:162)
   [junit4]>at 
org.apache.lucene.codecs.compressing.LZ4.compress(LZ4.java:252)
   [junit4]>at 
org.apache.lucene.codecs.compressing.CompressionMode$LZ4FastCompressor.compress(CompressionMode.java:161)
   [junit4]>at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.flush(CompressingStoredFieldsWriter.java:233)
   [junit4]>at 
org.apache.lucene.codecs.compressing.CompressingStoredFieldsWriter.finishDocument(CompressingStoredFieldsWriter.java:166)
   [junit4]>at 
org.apache.lucene.index.DefaultIndexingChain.finishStoredFields(DefaultIndexingChain.java:269)
   [junit4]>at 
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:363)
   [junit4]>at 
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:222)
{noformat}
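
A minimal sketch of the kind of up-front check this asks for (an assumption for 
illustration, not the committed fix): validate the buffered document size against the 
documented limit and fail with a descriptive exception instead of an 
ArrayIndexOutOfBoundsException from deep inside LZ4:

{code:java}
// Sketch only: reject an over-large document with a clear message.
public final class StoredFieldsSizeCheck {
  // Documented per-document limit for the compressing stored fields format: 2^31 - 2^14 bytes.
  static final long MAX_DOC_BYTES = (1L << 31) - (1L << 14);

  static void checkSize(long bufferedBytes, int docID) {
    if (bufferedBytes > MAX_DOC_BYTES) {
      throw new IllegalArgumentException("stored fields of document " + docID
          + " are too large: " + bufferedBytes + " bytes, limit is " + MAX_DOC_BYTES + " bytes");
    }
  }
}
{code}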



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5844:
---

Attachment: LUCENE-5844.patch

New patch, just simplifying / fixing confusing comment in PriorityQueue.java...

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5844.patch, LUCENE-5844.patch
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5844:
---

Attachment: LUCENE-5844.patch

Simple patch, I used Integer.MAX_VALUE-8, and added a couple tests.

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5844.patch
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072099#comment-14072099
 ] 

Robert Muir commented on LUCENE-5844:
-

+1, I hit this today when trying to corrupt my stored fields: I can't make a 
document with 2^31 - 2^14 + 1 bytes, because grow() tries to overallocate to a 
bigger array that Java can't actually make. So today grow() effectively limits you to 
Integer.MAX_VALUE - (Integer.MAX_VALUE / 8) or thereabouts in practice.

> ArrayUtil.grow should not pretend you can actually allocate 
> array[Integer.MAX_VALUE]
> 
>
> Key: LUCENE-5844
> URL: https://issues.apache.org/jira/browse/LUCENE-5844
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/other
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
>
> Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
> Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
> actual limit is JVM dependent and varies across JVMs ...
> It would be nice if we could somehow "introspect" the JVM to find out what 
> its  actual limit is and use that.  
> http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
> seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
> ArrayList.java apparently uses).



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5844) ArrayUtil.grow should not pretend you can actually allocate array[Integer.MAX_VALUE]

2014-07-23 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5844:
--

 Summary: ArrayUtil.grow should not pretend you can actually 
allocate array[Integer.MAX_VALUE]
 Key: LUCENE-5844
 URL: https://issues.apache.org/jira/browse/LUCENE-5844
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/other
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10


Today if the growth it wants would exceed Integer.MAX_VALUE, it returns 
Integer.MAX_VALUE, but you can't actually allocate arrays this large; the 
actual limit is JVM dependent and varies across JVMs ...

It would be nice if we could somehow "introspect" the JVM to find out what its  
actual limit is and use that.  
http://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size 
seems to imply that using Integer.MAX_VALUE - 8 may be "safe" (it's what 
ArrayList.java apparently uses).
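
For illustration, a minimal sketch of what such a clamp could look like (an assumption 
about the shape, not the committed patch; the real ArrayUtil.oversize also factors in 
bytes per element):

{code:java}
// Sketch only: grow toward the requested size but never past a JVM-safe ceiling.
public final class SafeGrow {
  // Largest array length believed to be allocatable on common JVMs (what ArrayList uses);
  // the true limit is JVM dependent.
  static final int MAX_ARRAY_LENGTH = Integer.MAX_VALUE - 8;

  /** Returns a size >= minTargetSize, with ~1/8 extra headroom, capped at MAX_ARRAY_LENGTH. */
  static int oversize(int minTargetSize) {
    if (minTargetSize > MAX_ARRAY_LENGTH) {
      throw new IllegalArgumentException("requested array size " + minTargetSize
          + " exceeds the maximum array size " + MAX_ARRAY_LENGTH);
    }
    long grown = (long) minTargetSize + (minTargetSize >>> 3); // widen first to avoid int overflow
    return (int) Math.min(grown, MAX_ARRAY_LENGTH);
  }
}
{code}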



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6232) Allow cores that have failed to init to be deleted via CoreAdminHandler

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072038#comment-14072038
 ] 

ASF subversion and git services commented on SOLR-6232:
---

Commit 1612901 from hoss...@apache.org in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1612901 ]

SOLR-6232: fix logging of core init failures (merge r1612896)

> Allow cores that have failed to init to be deleted via CoreAdminHandler
> ---
>
> Key: SOLR-6232
> URL: https://issues.apache.org/jira/browse/SOLR-6232
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 4.10
>
> Attachments: SOLR-6232.patch
>
>
> If a core fails to init due to index corruption or something similar, it 
> can't currently be removed with an UNLOAD command, you have to go do it 
> manually.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6265) core errors on startup are not showing up in the log until attempts to use the core?

2014-07-23 Thread Hoss Man (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hoss Man resolved SOLR-6265.


   Resolution: Fixed
Fix Version/s: 5.0

bq. Hmmm, that shows an error, it's just not a horribly informative one, 
certainly not the full stack trace...

yeah ... we just need to log the exception object, not just the message.

I'm resolving as a part of SOLR-6232 -- fix committed under the banner of that 
issue since it hasn't been released yet.
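
For readers following along, the fix boils down to which logging overload gets called; 
a sketch (the names here are assumptions, not the actual Solr code):

{code:java}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

final class CoreInitLogging {
  private static final Logger log = LoggerFactory.getLogger(CoreInitLogging.class);

  static void logInitFailure(String coreName, Exception ex) {
    // Message-only logging loses the stack trace:
    //   log.error("Error creating core [" + coreName + "]: " + ex.getMessage());
    // Passing the exception object makes SLF4J log the full stack trace:
    log.error("Error creating core [" + coreName + "]", ex);
  }
}
{code}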

> core errors on startup are not showing up in the log until attempts to use 
> the core?
> 
>
> Key: SOLR-6265
> URL: https://issues.apache.org/jira/browse/SOLR-6265
> Project: Solr
>  Issue Type: Bug
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 5.0, 4.10
>
>
> As of r1612418, both the 4x and trunk svn trees seem to have a bug where any 
> core specific init errors that occur on startup don't show up in the log 
> until/unless someone attempts to access that core via HTTP.
> i'm not sure when exactly this bug was introduced, but it definitely isn't in 
> 4.9.
> The impact on users, particularly new users, is that starting up solr with a 
> mistake in your configs appears to work fine until you actually try to use 
> solr and then you get ugly errors.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6232) Allow cores that have failed to init to be deleted via CoreAdminHandler

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072027#comment-14072027
 ] 

ASF subversion and git services commented on SOLR-6232:
---

Commit 1612896 from hoss...@apache.org in branch 'dev/trunk'
[ https://svn.apache.org/r1612896 ]

SOLR-6232: fix logging of core init failures

> Allow cores that have failed to init to be deleted via CoreAdminHandler
> ---
>
> Key: SOLR-6232
> URL: https://issues.apache.org/jira/browse/SOLR-6232
> Project: Solr
>  Issue Type: Improvement
>Reporter: Alan Woodward
>Assignee: Alan Woodward
>Priority: Minor
> Fix For: 4.10
>
> Attachments: SOLR-6232.patch
>
>
> If a core fails to init due to index corruption or something similar, it 
> can't currently be removed with an UNLOAD command, you have to go do it 
> manually.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6216) Better faceting for multiple intervals on DV fields

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072017#comment-14072017
 ] 

ASF subversion and git services commented on SOLR-6216:
---

Commit 1612889 from [~erickoerickson] in branch 'dev/trunk'
[ https://svn.apache.org/r1612889 ]

SOLR-6216: Better faceting for multiple intervals on DV fields. Thanks Tomas

> Better faceting for multiple intervals on DV fields
> ---
>
> Key: SOLR-6216
> URL: https://issues.apache.org/jira/browse/SOLR-6216
> Project: Solr
>  Issue Type: Improvement
>Reporter: Tomás Fernández Löbbe
>Assignee: Erick Erickson
> Attachments: SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch, 
> SOLR-6216.patch, SOLR-6216.patch, SOLR-6216.patch
>
>
> There are two ways to have faceting on values ranges in Solr right now: 
> “Range Faceting” and “Query Faceting” (doing range queries). They both end up 
> doing something similar:
> {code:java}
> searcher.numDocs(rangeQ , docs)
> {code}
> The good thing about this implementation is that it can benefit from caching. 
> The bad thing is that it may be slow with cold caches, and that there will be 
> a query for each of the ranges.
> A different implementation would be one that works similar to regular field 
> faceting, using doc values and validating ranges for each value of the 
> matching documents. This implementation would sometimes be faster than Range 
> Faceting / Query Faceting, specially on cases where caches are not very 
> effective, like on a high update rate, or where ranges change frequently.
> Functionally, the result should be exactly the same as the one obtained by 
> doing a facet query for every interval
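
A minimal sketch of that doc-values approach (an illustration under assumed helper 
types, not the attached patch): a single pass over the matching documents, 
incrementing a counter for every interval that contains the document's value.

{code:java}
import java.io.IOException;
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.NumericDocValues;
import org.apache.lucene.search.DocIdSetIterator;

// "Interval" is a hypothetical helper holding one facet interval's bounds.
final class Interval {
  final long start, end;
  Interval(long start, long end) { this.start = start; this.end = end; }
  boolean contains(long v) { return v >= start && v <= end; }
}

final class IntervalCounter {
  // Count, per interval, how many matching documents carry a value inside that
  // interval: one pass over the doc values, no per-interval range query.
  static long[] count(AtomicReader reader, String field, DocIdSetIterator matching,
                      Interval[] intervals) throws IOException {
    long[] counts = new long[intervals.length];
    NumericDocValues values = reader.getNumericDocValues(field); // 4.x random-access API
    if (values == null) {
      return counts; // no doc values for this field in this segment
    }
    for (int doc = matching.nextDoc(); doc != DocIdSetIterator.NO_MORE_DOCS; doc = matching.nextDoc()) {
      long value = values.get(doc);
      for (int i = 0; i < intervals.length; i++) {
        if (intervals[i].contains(value)) {
          counts[i]++;
        }
      }
    }
    return counts;
  }
}
{code}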



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5111) Change SpellCheckComponent default analyzer when queryAnalyzerFieldType is not defined

2014-07-23 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072006#comment-14072006
 ] 

James Dyer commented on SOLR-5111:
--

It makes sense to me that we should do this.  The only caution is you could 
break someone's config in the case they were depending on WhitespaceAnalyzer.  
I can't imagine this is what anyone would want, but you can never underestimate 
Users.
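
One way to keep whitespace tokenization while making "foo" and "Foo" analyze 
identically is to pair the whitespace tokenizer with a lowercase filter; a minimal 
sketch (Lucene 5.x-style API shown, as an illustration rather than the attached patch):

{code:java}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.core.WhitespaceTokenizer;

public final class LowercasingWhitespaceAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName) {
    Tokenizer source = new WhitespaceTokenizer();            // keep whitespace tokenization
    return new TokenStreamComponents(source, new LowerCaseFilter(source)); // fold case
  }
}
{code}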

> Change SpellCheckComponent default analyzer when queryAnalyzerFieldType is 
> not defined
> --
>
> Key: SOLR-5111
> URL: https://issues.apache.org/jira/browse/SOLR-5111
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5111.patch
>
>
> In the collection1 example, the SpellCheckComponent uses the query analyzer 
> of "text_general" FieldType. If "queryAnalyzerFieldType" is removed from the 
> configuration a WhitespaceAnalyzer is used by default.
> I suggest we could change the default to SimpleAnalyzer so that "foo" and 
> "Foo" gives the same results and log that the analyzer is missing.
> Also are there more places in solrconfig which have dependencies on schema 
> like this?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5835) Add sortMissingLast support to TermValComparator

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071998#comment-14071998
 ] 

ASF subversion and git services commented on LUCENE-5835:
-

Commit 1612882 from [~jpountz] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1612882 ]

LUCENE-5835: Make TermValComparator extendable.

> Add sortMissingLast support to TermValComparator
> 
>
> Key: LUCENE-5835
> URL: https://issues.apache.org/jira/browse/LUCENE-5835
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5835.patch
>
>
> It would be nice to allow to configure the behavior on missing values for 
> this comparator, similarly to what TermOrdValComparator does.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5841) Remove FST.Builder.FreezeTail interface

2014-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072000#comment-14072000
 ] 

Robert Muir commented on LUCENE-5841:
-

Nice results. I see this tail-freezing as a hotspot frequently.

> Remove FST.Builder.FreezeTail interface
> ---
>
> Key: LUCENE-5841
> URL: https://issues.apache.org/jira/browse/LUCENE-5841
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5841.patch, LUCENE-5841.patch
>
>
> The FST Builder has a crazy-hairy interface called FreezeTail, which is only
> used by BlockTreeTermsWriter to find appropriate prefixes
> (i.e. containing enough terms or sub-blocks) to write term blocks.
> But this is really a silly abuse ... it's cleaner and likely
> faster/less GC for BTTW to compute this itself just by tracking the
> term ordinal where each prefix started in the pending terms/blocks.  The
> code is also insanely hairy, and this is at least a baby step to try
> to make it a bit simpler.
> This also makes it very hard to experiment with different formats at
> write-time because you have to get your new formats working through
> this strange FreezeTail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5835) Add sortMissingLast support to TermValComparator

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071997#comment-14071997
 ] 

ASF subversion and git services commented on LUCENE-5835:
-

Commit 1612881 from [~jpountz] in branch 'dev/trunk'
[ https://svn.apache.org/r1612881 ]

LUCENE-5835: Make TermValComparator extendable.

> Add sortMissingLast support to TermValComparator
> 
>
> Key: LUCENE-5835
> URL: https://issues.apache.org/jira/browse/LUCENE-5835
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5835.patch
>
>
> It would be nice to allow to configure the behavior on missing values for 
> this comparator, similarly to what TermOrdValComparator does.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071993#comment-14071993
 ] 

Robert Muir commented on LUCENE-5843:
-

In my opinion the way to go is to have a package-private limit (for test 
purposes) that defaults to Integer.MAX_VALUE.

this way we can actually test the thing with values like... 5

Its more than just checking either at addDocument (ideal) or flush (not great 
but as you say, better), we also have to handle cases like addIndexes(Dir) and 
addIndexes(Reader).
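
A minimal sketch of that shape (names are assumptions, not the committed patch): a 
package-private, test-overridable limit that the addDocument/addIndexes paths check 
before accepting more documents.

{code:java}
// Sketch only: a test-overridable cap on the total number of documents in one index.
final class DocCountLimit {
  // Package-private so tests can lower it (e.g. to 5) to exercise the failure path.
  static int maxDocs = Integer.MAX_VALUE - 1; // MAX_VALUE itself is a search-time sentinel

  /** Throws if adding numDocsToAdd documents would push the index over the limit. */
  static void reserve(int currentDocCount, int numDocsToAdd) {
    if ((long) currentDocCount + numDocsToAdd > maxDocs) {
      throw new IllegalStateException(
          "number of documents in the index cannot exceed " + maxDocs);
    }
  }
}
{code}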

> IndexWriter should refuse to create an index with more than INT_MAX docs
> 
>
> Key: LUCENE-5843
> URL: https://issues.apache.org/jira/browse/LUCENE-5843
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
>
> It's more and more common for users these days to create very large indices, 
> e.g.  indexing lines from log files, or packets on a network, etc., and it's 
> not hard to accidentally exceed the maximum number of documents in one index.
> I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
> value as a sentinel during searching.
> I'm not sure what IW does today if you create a too-big index but it's 
> probably horrible; it may succeed and then at search time you hit nasty 
> exceptions when we overflow int.
> I think it should throw an IndexFullException instead.  It'd be nice if we 
> could do this on the very doc that when added would go over the limit, but I 
> would also settle for just throwing at flush as well ... i.e. I think what's 
> really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071979#comment-14071979
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Updated for Option (1), tests are still running though..

> Run checkIfIamLeader in a separate thread
> -
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.9
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
>Priority: Minor
>
> Currently checking for leadership (due to the leader's ephemeral node going 
> away) happens in ZK's event thread. If there are many cores and all of them 
> are due leadership, then they would have to serially go through the two-way 
> sync and leadership takeover.
> For tens of cores, this could mean 30-40s without leadership before the last 
> in the list even gets to start the leadership process. If the leadership 
> process happens in a separate thread, then the cores could all take over in 
> parallel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-23 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071978#comment-14071978
 ] 

Hoss Man commented on LUCENE-5843:
--

bq. I'm not sure what IW does today if you create a too-big index but it's 
probably horrible; it may succeed and then at search time you hit nasty 
exceptions when we overflow int.

that is exactly what happens -- see SOLR-6065 for context

> IndexWriter should refuse to create an index with more than INT_MAX docs
> 
>
> Key: LUCENE-5843
> URL: https://issues.apache.org/jira/browse/LUCENE-5843
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
>
> It's more and more common for users these days to create very large indices, 
> e.g.  indexing lines from log files, or packets on a network, etc., and it's 
> not hard to accidentally exceed the maximum number of documents in one index.
> I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
> value as a sentinel during searching.
> I'm not sure what IW does today if you create a too-big index but it's 
> probably horrible; it may succeed and then at search time you hit nasty 
> exceptions when we overflow int.
> I think it should throw an IndexFullException instead.  It'd be nice if we 
> could do this on the very doc that when added would go over the limit, but I 
> would also settle for just throwing at flush as well ... i.e. I think what's 
> really important is that the index does not become unusable.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5843) IndexWriter should refuse to create an index with more than INT_MAX docs

2014-07-23 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5843:
--

 Summary: IndexWriter should refuse to create an index with more 
than INT_MAX docs
 Key: LUCENE-5843
 URL: https://issues.apache.org/jira/browse/LUCENE-5843
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10


It's more and more common for users these days to create very large indices, 
e.g.  indexing lines from log files, or packets on a network, etc., and it's 
not hard to accidentally exceed the maximum number of documents in one index.

I think the limit is actually Integer.MAX_VALUE-1 docs, because we use that 
value as a sentinel during searching.

I'm not sure what IW does today if you create a too-big index but it's probably 
horrible; it may succeed and then at search time you hit nasty exceptions when 
we overflow int.

I think it should throw an IndexFullException instead.  It'd be nice if we 
could do this on the very doc that when added would go over the limit, but I 
would also settle for just throwing at flush as well ... i.e. I think what's 
really important is that the index does not become unusable.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071946#comment-14071946
 ] 

Mark Miller commented on SOLR-6261:
---

I think it's worth considering for sure, but weighing both sides, enforcing it for 
all is probably an overall beneficial change in this case. Getting out of the 
notification thread's way without having to go out of your own way is great.

> Run checkIfIamLeader in a separate thread
> -
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.9
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
>Priority: Minor
>
> Currently checking for leadership (due to the leader's ephemeral node going 
> away) happens in ZK's event thread. If there are many cores and all of them 
> are due leadership, then they would have to serially go through the two-way 
> sync and leadership takeover.
> For tens of cores, this could mean 30-40s without leadership before the last 
> in the list even gets to start the leadership process. If the leadership 
> process happens in a separate thread, then the cores could all take over in 
> parallel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071933#comment-14071933
 ] 

Ramkumar Aiyengar edited comment on SOLR-6261 at 7/23/14 4:45 PM:
--

I agree (1) is ideal, and I was just being paranoid since I am not that 
well-versed in how this class is used outside Solr. I am happy to stick with 
your judgement in this case..


was (Author: andyetitmoves):
I agree (1) is ideal, and I guess I was just being paranoid since I am not that 
well-versed in how this class is used outside Solr. I am happy to stick to your 
judgement in this case..

> Run checkIfIamLeader in a separate thread
> -
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.9
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
>Priority: Minor
>
> Currently checking for leadership (due to the leader's ephemeral node going 
> away) happens in ZK's event thread. If there are many cores and all of them 
> are due leadership, then they would have to serially go through the two-way 
> sync and leadership takeover.
> For tens of cores, this could mean 30-40s without leadership before the last 
> in the list even gets to start the leadership process. If the leadership 
> process happens in a separate thread, then the cores could all take over in 
> parallel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071933#comment-14071933
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

I agree (1) is ideal, and I guess I was just being paranoid since I am not that 
well-versed in how this class is used outside Solr. I am happy to stick to your 
judgement in this case..

> Run checkIfIamLeader in a separate thread
> -
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.9
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
>Priority: Minor
>
> Currently checking for leadership (due to the leader's ephemeral node going 
> away) happens in ZK's event thread. If there are many cores and all of them 
> are due leadership, then they would have to serially go through the two-way 
> sync and leadership takeover.
> For tens of cores, this could mean 30-40s without leadership before the last 
> in the list even gets to start the leadership process. If the leadership 
> process happens in a separate thread, then the cores could all take over in 
> parallel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5841) Remove FST.Builder.FreezeTail interface

2014-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5841:
---

Attachment: LUCENE-5841.patch

New patch, I just changed PendingTerm class to use byte[] not BytesRef to hold 
the term to save some silly garbage.  I think it's ready.

Also I ran a "merge intensive" perf test from Rob, first building a geonames 
index with lots of segments (using NoMergePolicy), and then using 
SerialMergeScheduler measuring how long forceMerge(1) takes, and the patch 
makes this a bit faster: from ~95 seconds for trunk to ~87 seconds with this 
change, or ~8% faster.

> Remove FST.Builder.FreezeTail interface
> ---
>
> Key: LUCENE-5841
> URL: https://issues.apache.org/jira/browse/LUCENE-5841
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5841.patch, LUCENE-5841.patch
>
>
> The FST Builder has a crazy-hairy interface called FreezeTail, which is only
> used by BlockTreeTermsWriter to find appropriate prefixes
> (i.e. containing enough terms or sub-blocks) to write term blocks.
> But this is really a silly abuse ... it's cleaner and likely
> faster/less GC for BTTW to compute this itself just by tracking the
> term ordinal where each prefix started in the pending terms/blocks.  The
> code is also insanely hairy, and this is at least a baby step to try
> to make it a bit simpler.
> This also makes it very hard to experiment with different formats at
> write-time because you have to get your new formats working through
> this strange FreezeTail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-NightlyTests-4.x - Build # 583 - Failure

2014-07-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-4.x/583/

1 tests failed.
REGRESSION:  org.apache.lucene.analysis.core.TestRandomChains.testRandomChains

Error Message:
startOffset must be non-negative, and endOffset must be >= startOffset, 
startOffset=11,endOffset=9

Stack Trace:
java.lang.IllegalArgumentException: startOffset must be non-negative, and 
endOffset must be >= startOffset, startOffset=11,endOffset=9
at 
__randomizedtesting.SeedInfo.seed([E58A02D1D3CDCA0A:D86B2BB094DFD7CA]:0)
at 
org.apache.lucene.analysis.tokenattributes.PackedTokenAttributeImpl.setOffset(PackedTokenAttributeImpl.java:107)
at 
org.apache.lucene.analysis.shingle.ShingleFilter.incrementToken(ShingleFilter.java:345)
at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
at 
org.apache.lucene.analysis.synonym.SynonymFilter.parse(SynonymFilter.java:358)
at 
org.apache.lucene.analysis.synonym.SynonymFilter.incrementToken(SynonymFilter.java:624)
at 
org.apache.lucene.analysis.ValidatingTokenFilter.incrementToken(ValidatingTokenFilter.java:68)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkAnalysisConsistency(BaseTokenStreamTestCase.java:703)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:614)
at 
org.apache.lucene.analysis.BaseTokenStreamTestCase.checkRandomData(BaseTokenStreamTestCase.java:513)
at 
org.apache.lucene.analysis.core.TestRandomChains.testRandomChains(TestRandomChains.java:927)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:798)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:458)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
 

[jira] [Commented] (SOLR-5111) Change SpellCheckComponent default analyzer when queryAnalyzerFieldType is not defined

2014-07-23 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5111?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071905#comment-14071905
 ] 

Varun Thacker commented on SOLR-5111:
-

[~jdyer] - Any thoughts on this one?

> Change SpellCheckComponent default analyzer when queryAnalyzerFieldType is 
> not defined
> --
>
> Key: SOLR-5111
> URL: https://issues.apache.org/jira/browse/SOLR-5111
> Project: Solr
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5111.patch
>
>
> In the collection1 example, the SpellCheckComponent uses the query analyzer 
> of "text_general" FieldType. If "queryAnalyzerFieldType" is removed from the 
> configuration a WhitespaceAnalyzer is used by default.
> I suggest we could change the default to SimpleAnalyzer so that "foo" and 
> "Foo" gives the same results and log that the analyzer is missing.
> Also are there more places in solrconfig which have dependencies on schema 
> like this?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5842.
-

   Resolution: Fixed
Fix Version/s: 4.10
   5.0

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5842.patch, LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071862#comment-14071862
 ] 

ASF subversion and git services commented on LUCENE-5842:
-

Commit 1612852 from [~rcmuir] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1612852 ]

LUCENE-5842: Validate checksum footers for postings 
lists/docvalues/storedfields/vectors on init

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5842.patch, LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071845#comment-14071845
 ] 

Mark Miller commented on SOLR-6261:
---

I actually kind of like option 1. What is your concern around it being in 
Solrj? I think, at this point, it's pretty unlikely anyone is counting on the 
current behavior - it's generally probably a bug. We have also already treated 
a lot of this at the cloud level as subject to change a bit because a lot of it 
is so early. Depending on the impact, we need some flexibility to get things 
right.

I guess I just don't see a lot of downside or negative impact if we choose 1.

The upside of doing 1 IMO, is that it becomes a lot harder for other/future 
devs to screw up. The default makes it hard to do.

2 is not too bad, but it relies on future developers consistently choosing the 
right flag to pass so that our ZK thread always gets to crank along.

3 is the least preferable to me.

> Run checkIfIamLeader in a separate thread
> -
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.9
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
>Priority: Minor
>
> Currently checking for leadership (due to the leader's ephemeral node going 
> away) happens in ZK's event thread. If there are many cores and all of them 
> are due leadership, then they would have to serially go through the two-way 
> sync and leadership takeover.
> For tens of cores, this could mean 30-40s without leadership before the last 
> in the list even gets to start the leadership process. If the leadership 
> process happens in a separate thread, then the cores could all take over in 
> parallel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #662: POMs out of sync

2014-07-23 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/662/

1 tests failed.
FAILED:  org.apache.solr.cloud.MultiThreadedOCPTest.testDistribSearch

Error Message:
Task 3002 did not complete, final state: running

Stack Trace:
java.lang.AssertionError: Task 3002 did not complete, final state: running
at 
__randomizedtesting.SeedInfo.seed([E62C8403A608BB75:67CA0A1BD157DB49]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.testDeduplicationOfSubmittedTasks(MultiThreadedOCPTest.java:162)
at 
org.apache.solr.cloud.MultiThreadedOCPTest.doTest(MultiThreadedOCPTest.java:71)




Build Log:
[...truncated 55429 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:490: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/build.xml:182: 
The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-4.x/extra-targets.xml:77:
 Java returned: 1

Total time: 245 minutes 27 seconds
Build step 'Invoke Ant' marked build as failure
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071813#comment-14071813
 ] 

ASF subversion and git services commented on LUCENE-5842:
-

Commit 1612845 from [~rcmuir] in branch 'dev/trunk'
[ https://svn.apache.org/r1612845 ]

LUCENE-5842: Validate checksum footers for postings 
lists/docvalues/storedfields/vectors on init

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5842.patch, LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6269) Change "rollback" to "error" in DIH

2014-07-23 Thread Erik Hatcher (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071791#comment-14071791
 ] 

Erik Hatcher commented on SOLR-6269:


I think we should leave the entity level modifications to another ticket.

[~noble.paul] What do you mean that lastException should give the actual 
exception that happened?   Is that different from what's in my patch (other 
than that a getter for the internal value could be added)?   Is there a different 
exception that should be captured before this handler is called?

> Change "rollback" to "error" in DIH
> ---
>
> Key: SOLR-6269
> URL: https://issues.apache.org/jira/browse/SOLR-6269
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.9
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6269.patch
>
>
> Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
> mode, let's rename most things "rollback" to "error", such as the new 
> onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6269) Change "rollback" to "error" in DIH

2014-07-23 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071764#comment-14071764
 ] 

Noble Paul commented on SOLR-6269:
--

Let's change this entity-level onError to be an event handler, which can be a 
Java class or a JavaScript function.

Change the signature of EventListener.onEvent(..) to return an Object.

If the listener is implemented, it should return one of ABORT|CONTINUE|SKIP.

As of now it is an entity-level attribute and we can leave it as it is.

The ctx.getLastException() should give the actual exception that happened.
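
A rough sketch of the proposed shape (all names below are assumptions, not the 
existing DIH API): the listener inspects the failure and returns how the import 
should proceed.

{code:java}
// Sketch only: a status-returning error listener for DIH.
public interface ImportErrorListener {
  enum Status { ABORT, CONTINUE, SKIP }

  /** Called when an entity fails; lastException is the actual exception that happened. */
  Status onError(String entityName, Exception lastException);
}
{code}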

 

> Change "rollback" to "error" in DIH
> ---
>
> Key: SOLR-6269
> URL: https://issues.apache.org/jira/browse/SOLR-6269
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.9
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6269.patch
>
>
> Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
> mode, let's rename most things "rollback" to "error", such as the new 
> onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6261) Run checkIfIamLeader in a separate thread

2014-07-23 Thread Ramkumar Aiyengar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071766#comment-14071766
 ] 

Ramkumar Aiyengar commented on SOLR-6261:
-

Added tests for the leader failover case (original symptoms), and the parallel 
watching functionality. Let me know if this approach works, if so, we have 
three transition approaches:

 * Always have `SolrZkClient` use the new way (probably not a great idea, esp. 
considering this is in SolrJ)
 * Have an option per `SolrZkClient`, this will force all or most uses within 
Solr to use the new approach, but allow external uses to continue as they are
 * The way it currently is, decided on a per-watch basis

I am sort of wavering between the second and third options, opinions welcome..


> Run checkIfIamLeader in a separate thread
> -
>
> Key: SOLR-6261
> URL: https://issues.apache.org/jira/browse/SOLR-6261
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.9
>Reporter: Ramkumar Aiyengar
>Assignee: Mark Miller
>Priority: Minor
>
> Currently checking for leadership (due to the leader's ephemeral node going 
> away) happens in ZK's event thread. If there are many cores and all of them 
> are due leadership, then they would have to serially go through the two-way 
> sync and leadership takeover.
> For tens of cores, this could mean 30-40s without leadership before the last 
> in the list even gets to start the leadership process. If the leadership 
> process happens in a separate thread, then the cores could all take over in 
> parallel.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-6269) Change "rollback" to "error" in DIH

2014-07-23 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-6269:
---

Attachment: SOLR-6269.patch

Attached patch for trunk.  This patch avoids calling the Solr rollback 
capability when in ZK mode (maybe this should be tackled separately though).

> Change "rollback" to "error" in DIH
> ---
>
> Key: SOLR-6269
> URL: https://issues.apache.org/jira/browse/SOLR-6269
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.9
>Reporter: Erik Hatcher
>Assignee: Erik Hatcher
> Fix For: 5.0, 4.10
>
> Attachments: SOLR-6269.patch
>
>
> Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
> mode, let's rename most things "rollback" to "error", such as the new 
> onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-07-23 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071753#comment-14071753
 ] 

Anshum Gupta commented on SOLR-6248:


I don't think this would really work across 2 collections straight out of the 
box, but yes, as long as you have 'text' to pass, that is exactly what this 
parser would take. In other words, for now, it would more or less maintain the 
same mechanism of the handler (but in a manner that makes it work under 
SolrCloud mode).

> MoreLikeThis Query Parser
> -
>
> Key: SOLR-6248
> URL: https://issues.apache.org/jira/browse/SOLR-6248
> Project: Solr
>  Issue Type: New Feature
>Reporter: Anshum Gupta
>
> MLT Component doesn't let people highlight/paginate and the handler comes 
> with the cost of maintaining another piece in the config. Also, any changes to 
> the default (number of results to be fetched etc.) /select handler need to be 
> copied/synced with this handler too.
> Having an MLT QParser would let users get back docs based on a query for them 
> to paginate, highlight etc. It would also give them the flexibility to use 
> this anywhere i.e. q,fq,bq etc.
> A bit of history about MLT (thanks to Hoss)
> MLT Handler pre-dates the existence of QParsers and was meant to take an 
> arbitrary query as input, find docs that match that 
> query, club them together to find interesting terms, and then use those 
> terms as if they were my main query to generate a main result set.
> This result would then be used as the set to facet, highlight etc.
> The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)
> The MLT component on the other hand solved a very different purpose of 
> augmenting the main result set. It is used to get similar docs for each of 
> the doc in the main result set.
> DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
> The new approach:
> All of this can be done better and cleaner (and makes more sense too) using 
> an MLT QParser.
> An important thing to handle here is the case where the user doesn't have 
> TermVectors, in which case, it does what happens right now i.e. parsing 
> stored fields.
> Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
> field would need to be a TextField with an index analyzer defined. This 
> analyzer will then be used to extract terms for MLT.
> In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
> the schema (if TermVectors are enabled for the field). If not, a /get call 
> can be used to fetch the field and parse it.
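
For reference, the doc -> interesting terms -> query flow described above is
roughly what Lucene's MoreLikeThis helper already implements; a minimal sketch,
assuming an existing reader, searcher, analyzer and a "text" field (this is not
the proposed QParser itself):

{code}
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queries.mlt.MoreLikeThis;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;

public class MltSketch {
  public static TopDocs similarTo(IndexReader reader, IndexSearcher searcher,
                                  Analyzer analyzer, int seedDocId) throws Exception {
    MoreLikeThis mlt = new MoreLikeThis(reader);
    mlt.setFieldNames(new String[] {"text"}); // field(s) to mine for interesting terms
    mlt.setAnalyzer(analyzer);                // used when the field has no term vectors
    mlt.setMinTermFreq(1);
    mlt.setMinDocFreq(1);
    Query likeQuery = mlt.like(seedDocId);    // Bag(terms) -> Query
    return searcher.search(likeQuery, 10);    // DocList of similar documents
  }
}
{code}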



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6269) Change "rollback" to "error" in DIH

2014-07-23 Thread Erik Hatcher (JIRA)
Erik Hatcher created SOLR-6269:
--

 Summary: Change "rollback" to "error" in DIH
 Key: SOLR-6269
 URL: https://issues.apache.org/jira/browse/SOLR-6269
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.9
Reporter: Erik Hatcher
Assignee: Erik Hatcher
 Fix For: 5.0, 4.10


Since rollback (see SOLR-3622) is going away from DIH, at least in SolrCloud 
mode, let's rename most things "rollback" to "error", such as the new 
onRollback handler.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5842:


Attachment: LUCENE-5842.patch

Updated patch; I previously missed doing the check for the IDPostingsFormat terms 
dict in sandbox/.

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5842.patch, LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071686#comment-14071686
 ] 

Robert Muir commented on LUCENE-5842:
-

By the way, as a followup, we can do even better and iterate a bit more:

Today each file by itself can be 'correct' but you can still have a corrupt index 
because the files are mismatched somehow (network replication, or some other 
bug).

It might be worth thinking about reviving segmentinfo.attributes (that's 
cleanest, I think), or putting it in the files map directly (which would be harder, 
as it enforces that files have checksums). We could store each file's checksum 
there, and when we retrieve it here, validate against that attribute. This would 
detect mismatching. 

Ideally, though, we'd do this for the commit too (for deletes and dv updates). 

Anyway, just something to explore in another issue if we can do it without 
creating a mess. I don't like how we can't detect such mismatching today (except 
via very rudimentary checks like livedocs.length == maxdoc, etc.).
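
A rough sketch of the mismatch check being described, with a plain map standing
in for wherever the expected per-file checksums would actually be recorded
(e.g. revived segmentinfo attributes); this is not existing Lucene code:

{code}
import java.io.IOException;
import java.util.Map;

import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.index.CorruptIndexException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class ChecksumMismatchSketch {
  /** Compares each file's footer checksum against a previously recorded value. */
  public static void verify(Directory dir, Map<String, Long> expected) throws IOException {
    for (Map.Entry<String, Long> e : expected.entrySet()) {
      IndexInput in = dir.openInput(e.getKey(), IOContext.READONCE);
      try {
        long actual = CodecUtil.retrieveChecksum(in); // reads only the footer, cheap
        if (actual != e.getValue()) {
          throw new CorruptIndexException("checksum mismatch for " + e.getKey());
        }
      } finally {
        in.close();
      }
    }
  }
}
{code}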


> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071682#comment-14071682
 ] 

Adrien Grand commented on LUCENE-5842:
--

+1 to the patch

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071680#comment-14071680
 ] 

Michael McCandless commented on LUCENE-5842:


+1

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5842:


Attachment: LUCENE-5842.patch

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Attachments: LUCENE-5842.patch
>
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6248) MoreLikeThis Query Parser

2014-07-23 Thread Steve Molloy (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071645#comment-14071645
 ] 

Steve Molloy commented on SOLR-6248:


I meant passing in text as a parameter, as opposed to finding it in the index. 
With the current MLT handler (not the component), you can pass it in as body or 
stream.body to get documents similar to the text you pass in. In our case, we 
use it to find documents in one collection similar to a document found in 
another, or to some text directly provided by the user. So, I know that at some 
point the SearchHandler started rejecting search requests with a stream body, 
which would prevent this unless it could be achieved in another way. That's why 
I'm asking. :)

> MoreLikeThis Query Parser
> -
>
> Key: SOLR-6248
> URL: https://issues.apache.org/jira/browse/SOLR-6248
> Project: Solr
>  Issue Type: New Feature
>Reporter: Anshum Gupta
>
> MLT Component doesn't let people highlight/paginate and the handler comes 
> with the cost of maintaining another piece in the config. Also, any changes to 
> the default (number of results to be fetched etc.) /select handler need to be 
> copied/synced with this handler too.
> Having an MLT QParser would let users get back docs based on a query for them 
> to paginate, highlight etc. It would also give them the flexibility to use 
> this anywhere i.e. q,fq,bq etc.
> A bit of history about MLT (thanks to Hoss)
> MLT Handler pre-dates the existence of QParsers and was meant to take an 
> arbitrary query as input, find docs that match that 
> query, club them together to find interesting terms, and then use those 
> terms as if they were my main query to generate a main result set.
> This result would then be used as the set to facet, highlight etc.
> The flow: Query -> DocList(m) -> Bag (terms) -> Query -> DocList\(y)
> The MLT component on the other hand solved a very different purpose of 
> augmenting the main result set. It is used to get similar docs for each of 
> the doc in the main result set.
> DocSet\(n) -> n * Bag (terms) -> n * (Query) -> n * DocList(m)
> The new approach:
> All of this can be done better and cleaner (and makes more sense too) using 
> an MLT QParser.
> An important thing to handle here is the case where the user doesn't have 
> TermVectors, in which case, it does what happens right now i.e. parsing 
> stored fields.
> Also, in case the user doesn't have a field (to be used for MLT) indexed, the 
> field would need to be a TextField with an index analyzer defined. This 
> analyzer will then be used to extract terms for MLT.
> In case of SolrCloud mode, '/get-termvectors' can be used after looking at 
> the schema (if TermVectors are enabled for the field). If not, a /get call 
> can be used to fetch the field and parse it.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5842:
---

 Summary: Validate checksum footers for postings lists, docvalues, 
storedfields, termvectors on init
 Key: LUCENE-5842
 URL: https://issues.apache.org/jira/browse/LUCENE-5842
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


For small files (e.g. where we read in all the bytes anyway), we currently 
validate the checksum on reader init. 

But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do nothing 
at all on init, as it would be too expensive.

We should at least do this:
{code}
// NOTE: data file is too costly to verify checksum against all the bytes on 
// open, but for now we at least verify proper structure of the checksum 
// footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
// and can detect some forms of corruption such as file truncation.
CodecUtil.retrieveChecksum(data);
{code}
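
For illustration, here is what the cheap open-time check buys compared to a
full verification pass; the Directory and file name are assumptions, not the
reader code of any particular format:

{code}
import org.apache.lucene.codecs.CodecUtil;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

public class FooterCheckSketch {
  /** Cheap: seeks to the footer and validates FOOTER_MAGIC + algorithm id,
   *  catching truncation without reading the whole file. */
  public static void openTimeCheck(Directory dir, String dataFileName) throws Exception {
    IndexInput data = dir.openInput(dataFileName, IOContext.DEFAULT);
    try {
      CodecUtil.retrieveChecksum(data);
    } finally {
      data.close();
    }
  }

  /** Expensive: reads every byte and compares against the stored checksum,
   *  which is why doing this on open for large files is too costly. */
  public static void fullCheck(Directory dir, String dataFileName) throws Exception {
    IndexInput data = dir.openInput(dataFileName, IOContext.READONCE);
    try {
      CodecUtil.checksumEntireFile(data);
    } finally {
      data.close();
    }
  }
}
{code}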



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5842) Validate checksum footers for postings lists, docvalues, storedfields, termvectors on init

2014-07-23 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071642#comment-14071642
 ] 

Adrien Grand commented on LUCENE-5842:
--

+1

> Validate checksum footers for postings lists, docvalues, storedfields, 
> termvectors on init
> --
>
> Key: LUCENE-5842
> URL: https://issues.apache.org/jira/browse/LUCENE-5842
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> For small files (e.g. where we read in all the bytes anyway), we currently 
> validate the checksum on reader init. 
> But for larger files like .doc/.frq/.pos/.dvd/.fdt/.tvd we currently do 
> nothing at all on init, as it would be too expensive.
> We should at least do this:
> {code}
> // NOTE: data file is too costly to verify checksum against all the bytes on 
> // open, but for now we at least verify proper structure of the checksum 
> // footer: which looks for FOOTER_MAGIC + algorithmID. This is cheap 
> // and can detect some forms of corruption such as file truncation.
> CodecUtil.retrieveChecksum(data);
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: Why does Solr binary distribution include test-framework?

2014-07-23 Thread Steve Molloy
Well, people building their extensions probably use it; I know we do. That said, 
we use the Maven artifact, so in our case it doesn't really change anything whether 
it's in the binary dist or not, but I guess that's not the case for everybody.

Steve

From: Alexandre Rafalovitch [arafa...@gmail.com]
Sent: July 22, 2014 9:23 PM
To: dev@lucene.apache.org
Subject: Why does Solr binary distribution include test-framework?

Hello,

What is the logic/benefit of shipping the test framework with the Solr
distribution? Is something actually using it outside of the build/test
cycle?

It's 12 MB of libraries and documentation.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5410) Solr wrapper for the SpanQueryParser in LUCENE-5205

2014-07-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070524#comment-14070524
 ] 

Tim Allison edited comment on SOLR-5410 at 7/23/14 11:23 AM:
-

Added standalone source and jars for current latest stable version of 
Lucene/Solr (4.9.0) here:

https://github.com/tballison/lucene-addons.

I'll also try to keep my fork of lucene-solr up to date on the same site for 
integration with trunk...if there is interest.

From the README file: 
To get this to work in Solr:

1) add lucene-sandbox.jar to your Solr class path (you will need to download 
Lucene separately from Solr!)
2) add solr-5410-x.jar to your Solr class path
3) add lucene-5205-x.jar to your Solr class path
4) add the following line to your solrconfig.xml file:

  
5) at search time, add defType=span to your query string OR &q={!span}quick


was (Author: talli...@mitre.org):
Added standalone source and jars for current latest stable version of 
Lucene/Solr (4.9.0) here:

https://github.com/tballison/tallison-lucene-addons.

I'll also try to keep my fork of lucene-solr up to date on the same site for 
integration with trunk...if there is interest.

From the README file: 
To get this to work in Solr:

1) add lucene-sandbox.jar to your Solr class path (you will need to download 
Lucene separately from Solr!)
2) add solr-5410-x.jar to your Solr class path
3) add lucene-5205-x.jar to your Solr class path
4) add the following line to your solrconfig.xml file:

  
5) at search time, add defType=span to your query string OR &q={!span}quick

> Solr wrapper for the SpanQueryParser in LUCENE-5205
> ---
>
> Key: SOLR-5410
> URL: https://issues.apache.org/jira/browse/SOLR-5410
> Project: Solr
>  Issue Type: New Feature
>Reporter: Jason R Robinson
> Attachments: SOLR-5410.patch, Solr_SpanQueryParser.zip
>
>
> This is a simple Solr wrapper around the SpanQueryParser submitted in 
> [LUCENE-5205|https://issues.apache.org/jira/browse/LUCENE-5205].
> Dependent on 
> [LUCENE-5205|https://issues.apache.org/jira/browse/LUCENE-5205]
> ***Following Yonik's Law*** 
> This patch is more of a placeholder for a much more polished draft. Among 
> other things, test scripts and javadocs are forthcoming!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-5205) [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to classic QueryParser

2014-07-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070529#comment-14070529
 ] 

Tim Allison edited comment on LUCENE-5205 at 7/23/14 11:23 AM:
---

Unrelated to work on LUCENE-5758, I added a standalone package including a jar 
to track with current latest stable distro of Lucene here: 
https://github.com/tballison/lucene-addons/tree/master/lucene-5205

For trunk integration, see lucene-5205 branch of my fork on github.


was (Author: talli...@mitre.org):
Unrelated to work on LUCENE-5758, I added a standalone package including a jar 
to track with current latest stable distro of Lucene here: 
https://github.com/tballison/tallison-lucene-addons/tree/master/lucene-5205

For trunk integration, see lucene-5205 branch of my fork on github.

> [PATCH] SpanQueryParser with recursion, analysis and syntax very similar to 
> classic QueryParser
> ---
>
> Key: LUCENE-5205
> URL: https://issues.apache.org/jira/browse/LUCENE-5205
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/queryparser
>Reporter: Tim Allison
>  Labels: patch
> Fix For: 4.9
>
> Attachments: LUCENE-5205-cleanup-tests.patch, 
> LUCENE-5205-date-pkg-prvt.patch, LUCENE-5205.patch.gz, LUCENE-5205.patch.gz, 
> LUCENE-5205_dateTestReInitPkgPrvt.patch, 
> LUCENE-5205_improve_stop_word_handling.patch, 
> LUCENE-5205_smallTestMods.patch, LUCENE_5205.patch, 
> SpanQueryParser_v1.patch.gz, patch.txt
>
>
> This parser extends QueryParserBase and includes functionality from:
> * Classic QueryParser: most of its syntax
> * SurroundQueryParser: recursive parsing for "near" and "not" clauses.
> * ComplexPhraseQueryParser: can handle "near" queries that include multiterms 
> (wildcard, fuzzy, regex, prefix),
> * AnalyzingQueryParser: has an option to analyze multiterms.
> At a high level, there's a first pass BooleanQuery/field parser and then a 
> span query parser handles all terminal nodes and phrases.
> Same as classic syntax:
> * term: test 
> * fuzzy: roam~0.8, roam~2
> * wildcard: te?t, test*, t*st
> * regex: /\[mb\]oat/
> * phrase: "jakarta apache"
> * phrase with slop: "jakarta apache"~3
> * default "or" clause: jakarta apache
> * grouping "or" clause: (jakarta apache)
> * boolean and +/-: (lucene OR apache) NOT jakarta; +lucene +apache -jakarta
> * multiple fields: title:lucene author:hatcher
>  
> Main additions in SpanQueryParser syntax vs. classic syntax:
> * Can require "in order" for phrases with slop with the \~> operator: 
> "jakarta apache"\~>3
> * Can specify "not near": "fever bieber"!\~3,10 ::
> find "fever" but not if "bieber" appears within 3 words before or 10 
> words after it.
> * Fully recursive phrasal queries with \[ and \]; as in: \[\[jakarta 
> apache\]~3 lucene\]\~>4 :: 
> find "jakarta" within 3 words of "apache", and that hit has to be within 
> four words before "lucene"
> * Can also use \[\] for single level phrasal queries instead of " as in: 
> \[jakarta apache\]
> * Can use "or grouping" clauses in phrasal queries: "apache (lucene solr)"\~3 
> :: find "apache" and then either "lucene" or "solr" within three words.
> * Can use multiterms in phrasal queries: "jakarta\~1 ap*che"\~2
> * Did I mention full recursion: \[\[jakarta\~1 ap*che\]\~2 (solr~ 
> /l\[ou\]\+\[cs\]\[en\]\+/)]\~10 :: Find something like "jakarta" within two 
> words of "ap*che" and that hit has to be within ten words of something like 
> "solr" or that "lucene" regex.
> * Can require at least x number of hits at boolean level: "apache AND (lucene 
> solr tika)~2
> * Can use negative only query: -jakarta :: Find all docs that don't contain 
> "jakarta"
> * Can use an edit distance > 2 for fuzzy query via SlowFuzzyQuery (beware of 
> potential performance issues!).
> Trivial additions:
> * Can specify prefix length in fuzzy queries: jakarta~1,2 (edit distance =1, 
> prefix =2)
> * Can specify Optimal String Alignment (OSA) vs Levenshtein for distance 
> <=2: (jakarta~1 (OSA) vs jakarta~>1(Levenshtein)
> This parser can be very useful for concordance tasks (see also LUCENE-5317 
> and LUCENE-5318) and for analytical search.  
> Until LUCENE-2878 is closed, this might have a use for fans of SpanQuery.
> Most of the documentation is in the javadoc for SpanQueryParser.
> Any and all feedback is welcome.  Thank you.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5841) Remove FST.Builder.FreezeTail interface

2014-07-23 Thread Han Jiang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071563#comment-14071563
 ] 

Han Jiang commented on LUCENE-5841:
---

It is really great to see this interface removed!

> Remove FST.Builder.FreezeTail interface
> ---
>
> Key: LUCENE-5841
> URL: https://issues.apache.org/jira/browse/LUCENE-5841
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5841.patch
>
>
> The FST Builder has a crazy-hairy interface called FreezeTail, which is only
> used by BlockTreeTermsWriter to find appropriate prefixes
> (i.e. containing enough terms or sub-blocks) to write term blocks.
> But this is really a silly abuse ... it's cleaner and likely
> faster/less GC for BTTW to compute this itself just by tracking the
> term ordinal where each prefix started in the pending terms/blocks.  The
> code is also insanely hairy, and this is at least a baby step to try
> to make it a bit simpler.
> This also makes it very hard to experiment with different formats at
> write-time because you have to get your new formats working through
> this strange FreezeTail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5841) Remove FST.Builder.FreezeTail interface

2014-07-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5841:
---

Attachment: LUCENE-5841.patch

Patch, fixing BTTW (and its forks) to do their own term -> block
assignment w/o abusing FST.Builder, and then entirely removing the
FreezeTail API from FST.Builder.


> Remove FST.Builder.FreezeTail interface
> ---
>
> Key: LUCENE-5841
> URL: https://issues.apache.org/jira/browse/LUCENE-5841
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.10
>
> Attachments: LUCENE-5841.patch
>
>
> The FST Builder has a crazy-hairy interface called FreezeTail, which is only
> used by BlockTreeTermsWriter to find appropriate prefixes
> (i.e. containing enough terms or sub-blocks) to write term blocks.
> But this is really a silly abuse ... it's cleaner and likely
> faster/less GC for BTTW to compute this itself just by tracking the
> term ordinal where each prefix started in the pending terms/blocks.  The
> code is also insanely hairy, and this is at least a baby step to try
> to make it a bit simpler.
> This also makes it very hard to experiment with different formats at
> write-time because you have to get your new formats working through
> this strange FreezeTail.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5841) Remove FST.Builder.FreezeTail interface

2014-07-23 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-5841:
--

 Summary: Remove FST.Builder.FreezeTail interface
 Key: LUCENE-5841
 URL: https://issues.apache.org/jira/browse/LUCENE-5841
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/codecs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 5.0, 4.10


The FST Builder has a crazy-hairy interface called FreezeTail, which is only
used by BlockTreeTermsWriter to find appropriate prefixes
(i.e. containing enough terms or sub-blocks) to write term blocks.

But this is really a silly abuse ... it's cleaner and likely
faster/less GC for BTTW to compute this itself just by tracking the
term ordinal where each prefix started in the pending terms/blocks.  The
code is also insanely hairy, and this is at least a baby step to try
to make it a bit simpler.

This also makes it very hard to experiment with different formats at
write-time because you have to get your new formats working through
this strange FreezeTail.




--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Proposal: Full support for multi-word synonyms at query time

2014-07-23 Thread mimimimi
When dealing with synonyms at query time, Solr fails to handle multi-word
synonyms for two reasons:

First, the Lucene query parser tokenizes the user query on whitespace, so it
splits a multi-word term into separate terms before feeding them to the synonym
filter; the synonym filter therefore can't recognize the multi-word term and
expand it.
Second, if the synonym filter expands into multiple terms that contain a
multi-word synonym, SolrQueryParserBase currently uses MultiPhraseQuery to
handle synonyms, but MultiPhraseQuery doesn't work when the terms have
different numbers of words.
For the first problem, we can quote all multi-word synonyms in the user query
so that the Lucene query parser doesn't split them. There is a JIRA task
related to this one: https://issues.apache.org/jira/browse/LUCENE-2605.

For the second, we can replace MultiPhraseQuery with a BooleanQuery of SHOULD
clauses containing multiple PhraseQuery instances whenever the token stream
has a multi-word synonym.
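
A small sketch of that second suggestion in Lucene 4.x API terms; the field
name and synonym texts are made-up examples:

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.PhraseQuery;

public class SynonymQuerySketch {
  public static BooleanQuery build() {
    // One-word variant: "dns"
    PhraseQuery shortForm = new PhraseQuery();
    shortForm.add(new Term("text", "dns"));

    // Multi-word variant: "domain name system"
    PhraseQuery longForm = new PhraseQuery();
    longForm.add(new Term("text", "domain"));
    longForm.add(new Term("text", "name"));
    longForm.add(new Term("text", "system"));

    // SHOULD semantics: a document matching either phrase matches the query,
    // even though the two phrases have different numbers of terms.
    BooleanQuery bq = new BooleanQuery();
    bq.add(shortForm, BooleanClause.Occur.SHOULD);
    bq.add(longForm, BooleanClause.Occur.SHOULD);
    return bq;
  }
}
{code}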





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Proposal-Full-support-for-multi-word-synonyms-at-query-time-tp4000522p4148709.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org