[jira] [Updated] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-05 Thread Dmitry Kan (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kan updated SOLR-4903:
-

Affects Version/s: 4.3

> Solr sends all doc ids to all shards in the query counting facets
> -
>
> Key: SOLR-4903
> URL: https://issues.apache.org/jira/browse/SOLR-4903
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 3.4, 4.3
>Reporter: Dmitry Kan
>
> Setup: front end solr and shards.
> Summary: the Solr frontend sends all doc ids received from QueryComponent to 
> all shards, which causes the POST request buffer to overflow.
> Symptoms:
> The query is: http://pastebin.com/0DndK1Cs
> I have omitted the shards parameter.
> The router log: http://pastebin.com/FTVH1WF3
> Note the port of the affected shard; it changes all the time, even for the 
> same request.
> The log entry is prepended with lines:
> SEVERE: org.apache.solr.common.SolrException: Internal Server Error
> Internal Server Error
> (they are not in the pastebin link)
> The shard log: http://pastebin.com/exwCx3LX
> Suggestion: change the data structure in FacetComponent so that each shard 
> receives only the doc ids that belong to it, rather than a concatenation of 
> all doc ids.
> Why this matters: scaling. Adding more shards will overflow the POST request 
> buffer at some point anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4904) Send internal doc ids and index version in distributed faceting to make queries more compact

2013-06-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-4904:


 Summary: Send internal doc ids and index version in distributed 
faceting to make queries more compact
 Key: SOLR-4904
 URL: https://issues.apache.org/jira/browse/SOLR-4904
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.3, 3.4
Reporter: Dmitry Kan


This was suggested by [~ab] at the bbuzz conference 2013. It makes a lot of 
sense and works nicely toward fixing the root cause of SOLR-4903.

Basically, QueryComponent could send internal Lucene doc ids along with the 
index version number, so that subsequent requests to other Solr components, 
such as FacetComponent, would carry the internal ids. The index version is 
required to ensure we are dealing with the same index.
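
A minimal sketch of the idea (the parameter names "ids" and "indexVersion" 
are hypothetical, not an existing Solr API; the version would come from the 
shard searcher's DirectoryReader):

{code:java}
import org.apache.solr.common.params.ModifiableSolrParams;

public class InternalIdShardRequest {
  // Sketch only: internal ids are compact ints, so the request stays small
  // even for large result sets.
  public static ModifiableSolrParams build(int[] internalDocIds, long indexVersion) {
    StringBuilder ids = new StringBuilder();
    for (int docId : internalDocIds) {
      if (ids.length() > 0) ids.append(',');
      ids.append(docId);  // internal Lucene doc id, not a uniqueKey string
    }
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("ids", ids.toString());
    // The receiving shard would compare this against its reader's version
    // and reject the request if the index has changed underneath it.
    params.set("indexVersion", Long.toString(indexVersion));
    return params;
  }
}
{code}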

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4903) Solr sends all doc ids to all shards in the query counting facets

2013-06-05 Thread Dmitry Kan (JIRA)
Dmitry Kan created SOLR-4903:


 Summary: Solr sends all doc ids to all shards in the query 
counting facets
 Key: SOLR-4903
 URL: https://issues.apache.org/jira/browse/SOLR-4903
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 3.4
Reporter: Dmitry Kan


Setup: front end solr and shards.

Summary: the Solr frontend sends all doc ids received from QueryComponent to 
all shards, which causes the POST request buffer to overflow.

Symptoms:

The query is: http://pastebin.com/0DndK1Cs
I have omitted the shards parameter.

The router log: http://pastebin.com/FTVH1WF3
Note the port of the affected shard; it changes all the time, even for the 
same request.
The log entry is prepended with lines:

SEVERE: org.apache.solr.common.SolrException: Internal Server Error

Internal Server Error

(they are not in the pastebin link)

The shard log: http://pastebin.com/exwCx3LX

Suggestion: change the data structure in FacetComponent so that each shard 
receives only the doc ids that belong to it, rather than a concatenation of 
all doc ids.

Why this matters: scaling. Adding more shards will overflow the POST request 
buffer at some point anyway.
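
A minimal sketch of the suggested change (a hypothetical helper, not 
FacetComponent's actual code), assuming the frontend tracks which shard each 
returned id came from:

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PerShardIds {
  // Group ids by originating shard so each refinement request carries only
  // that shard's ids instead of a concatenation of everything.
  public static Map<String, List<String>> partition(List<String> docIds,
                                                    List<String> docShards) {
    Map<String, List<String>> idsByShard = new HashMap<String, List<String>>();
    for (int i = 0; i < docIds.size(); i++) {
      String shard = docShards.get(i);
      List<String> ids = idsByShard.get(shard);
      if (ids == null) {
        ids = new ArrayList<String>();
        idsByShard.put(shard, ids);
      }
      ids.add(docIds.get(i));
    }
    return idsByShard;  // send idsByShard.get(shard) to each shard only
  }
}
{code}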



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-06-05 Thread Lance Norskog (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lance Norskog updated LUCENE-2899:
--

Attachment: LUCENE-2899-x.patch

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.4
>
> Attachments: LUCENE-2899-current.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899-RJN.patch, LUCENE-2899-x.patch, 
> LUCENE-2899-x.patch, OpenNLPFilter.java, OpenNLPTokenizer.java, 
> opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-06-05 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676698#comment-13676698
 ] 

Lance Norskog commented on LUCENE-2899:
---

I found the problem with multiple documents. The API for reusing Tokenizers 
changed to something more sensible, but I only noticed and implemented part of 
the change. The result was that when you upload multiple documents, it just 
re-processed the first document.

File LUCENE-2899-x.patch has this fix. It applies against the 4.x branch and 
the trunk. It does not apply against Lucene 4.0, 4.1, 4.2 or 4.3. For all 
released Solr versions you want LUCENE-2899.patch from August 27, 2012. There 
are no new features since that release.
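
For reference, a minimal sketch of the 4.x TokenStream lifecycle the fix has 
to honor; a Tokenizer that implements only part of this reuse contract keeps 
replaying its first input:

{code:java}
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ReuseContract {
  public static void consume(Analyzer analyzer, String field, String text) throws Exception {
    TokenStream ts = analyzer.tokenStream(field, new StringReader(text));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();                  // must be called before incrementToken()
    while (ts.incrementToken()) {
      System.out.println(term.toString());
    }
    ts.end();                    // record final offset state
    ts.close();                  // release resources; the stream may then be reused
  }
}
{code}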

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 4.4
>
> Attachments: LUCENE-2899-current.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899-RJN.patch, LUCENE-2899-x.patch, 
> OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Documentation for Solr/Lucene 4.x, termIndexInterval and limitations of Lucene File format

2013-06-05 Thread Robert Muir
On Wed, Jun 5, 2013 at 4:21 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

>
> Nice :)  That's good news (that nothing blew up!).  Thanks for sharing.
>

With such an old JVM and such a large index, I'd say it's a stroke of pure
luck that nothing blew up.


[jira] [Commented] (LUCENE-5033) SlowFuzzyQuery appears to fail with edit distance >=3 in some cases

2013-06-05 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676659#comment-13676659
 ] 

Robert Muir commented on LUCENE-5033:
-

Doing an explicit Levenshtein calculation here sort of defeats the entire 
purpose of having Levenshtein automata at all!
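
For context, the "explicit calculation" is the classic O(n*m) dynamic program 
re-run against every candidate term, which is exactly the per-term work the 
automaton avoids. An illustrative standalone version (not Lucene code):

{code:java}
public class EditDistance {
  // Two-row Levenshtein DP: O(a.length() * b.length()) time per pair.
  public static int levenshtein(String a, String b) {
    int[] prev = new int[b.length() + 1];
    int[] curr = new int[b.length() + 1];
    for (int j = 0; j <= b.length(); j++) prev[j] = j;
    for (int i = 1; i <= a.length(); i++) {
      curr[0] = i;
      for (int j = 1; j <= b.length(); j++) {
        int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
        curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
      }
      int[] tmp = prev; prev = curr; curr = tmp;
    }
    return prev[b.length()];
  }

  public static void main(String[] args) {
    System.out.println(levenshtein("monday", "montugu")); // prints 4, per the report
  }
}
{code}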

> SlowFuzzyQuery appears to fail with edit distance >=3 in some cases
> ---
>
> Key: LUCENE-5033
> URL: https://issues.apache.org/jira/browse/LUCENE-5033
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 4.3
>Reporter: Tim Allison
>Priority: Minor
> Attachments: LUCENE-5033.patch
>
>
> The Levenshtein edit distance between "monday" and "montugu" should be 4. The 
> following shows a query with "sim" set to 3, and yet there is a hit.
>   public void testFuzzinessLong2() throws Exception {
>  Directory directory = newDirectory();
>  RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
>  addDoc("monday", writer);
>  
>  IndexReader reader = writer.getReader();
>  IndexSearcher searcher = newSearcher(reader);
>  writer.close();
>  SlowFuzzyQuery query = new SlowFuzzyQuery(new Term("field", "montugu"), 3, 0);
>  ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>  assertEquals(0, hits.length);
>   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5033) SlowFuzzyQuery appears to fail with edit distance >=3 in some cases

2013-06-05 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676587#comment-13676587
 ] 

Tim Allison commented on LUCENE-5033:
-

Thank you for your quick response!

I, too, was hoping to avoid calcSimilarity if raw is true, but I think we need 
it to calculate the boost.  Let me know if I'm missing something.

The bug in the original code was that FilteredTermsEnum sets minSimilarity to 0 
when the user-specified minSimilarity is >= 1.0f.  So, in SlowFuzzyTermsEnum, 
similarity (unless it was Float.NEGATIVE_INFINITY) was typically > 
minSimilarity no matter its value.  In other words, when the client code made 
the call with minSimilarity >= 1.0f, that value was correctly recorded in 
maxEdits, but maxEdits wasn't the determining factor in whether 
SlowFuzzyTermsEnum accepted a term.
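
A simplified, hypothetical condensation of that logic (not the actual 
SlowFuzzyTermsEnum code): when minSimilarity >= 1.0f the caller passed a raw 
edit distance, so acceptance has to be decided by maxEdits rather than by the 
zeroed-out scaled similarity:

{code:java}
public class FuzzyAcceptance {
  public static boolean accept(int editDistance, float similarity,
                               float minSimilarity, int maxEdits) {
    boolean raw = minSimilarity >= 1.0f;   // an edit distance, not a fraction
    if (raw) {
      return editDistance <= maxEdits;     // the determining factor after the fix
    }
    return similarity > minSimilarity;     // classic fractional-similarity check
  }
}
{code}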

> SlowFuzzyQuery appears to fail with edit distance >=3 in some cases
> ---
>
> Key: LUCENE-5033
> URL: https://issues.apache.org/jira/browse/LUCENE-5033
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 4.3
>Reporter: Tim Allison
>Priority: Minor
> Attachments: LUCENE-5033.patch
>
>
> The Levenshtein edit distance between "monday" and "montugu" should be 4. The 
> following shows a query with "sim" set to 3, and yet there is a hit.
>   public void testFuzzinessLong2() throws Exception {
>  Directory directory = newDirectory();
>  RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
>  addDoc("monday", writer);
>  
>  IndexReader reader = writer.getReader();
>  IndexSearcher searcher = newSearcher(reader);
>  writer.close();
>  SlowFuzzyQuery query = new SlowFuzzyQuery(new Term("field", "montugu"), 3, 0);
>  ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>  assertEquals(0, hits.length);
>   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 521 - Still Failing!

2013-06-05 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/521/
Java: 64bit/jdk1.6.0 -XX:-UseCompressedOops -XX:+UseParallelGC

1 tests failed.
REGRESSION:  org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic

Error Message:
Connection to http://localhost:51367 refused

Stack Trace:
org.apache.http.conn.HttpHostConnectException: Connection to 
http://localhost:51367 refused
at 
__randomizedtesting.SeedInfo.seed([BFF64C7ADF67C62C:140C516F00BB4002]:0)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.lucene.replicator.http.HttpClientBase.executeGET(HttpClientBase.java:178)
at 
org.apache.lucene.replicator.http.HttpReplicator.checkForUpdate(HttpReplicator.java:51)
at 
org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:196)
at 
org.apache.lucene.replicator.ReplicationClient.updateNow(ReplicationClient.java:402)
at 
org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic(HttpReplicatorTest.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 

[jira] [Commented] (LUCENE-5033) SlowFuzzyQuery appears to fail with edit distance >=3 in some cases

2013-06-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676532#comment-13676532
 ] 

Michael McCandless commented on LUCENE-5033:


Thanks Tim!  This looks like a great improvement: I like factoring out 
calcDistance from calcSimilarity.

And I like that we now take raw into account when deciding which comparison to 
use when accepting or rejecting a term.  Maybe we could improve it a bit: if 
raw is true, we don't need to call calcSimilarity, right?

For my sanity ... where exactly was the bug in the original code?

> SlowFuzzyQuery appears to fail with edit distance >=3 in some cases
> ---
>
> Key: LUCENE-5033
> URL: https://issues.apache.org/jira/browse/LUCENE-5033
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 4.3
>Reporter: Tim Allison
>Priority: Minor
> Attachments: LUCENE-5033.patch
>
>
> The Levenshtein edit distance between "monday" and "montugu" should be 4. The 
> following shows a query with "sim" set to 3, and yet there is a hit.
>   public void testFuzzinessLong2() throws Exception {
>  Directory directory = newDirectory();
>  RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
>  addDoc("monday", writer);
>  
>  IndexReader reader = writer.getReader();
>  IndexSearcher searcher = newSearcher(reader);
>  writer.close();
>  SlowFuzzyQuery query = new SlowFuzzyQuery(new Term("field", "montugu"), 3, 0);
>  ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>  assertEquals(0, hits.length);
>   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5033) SlowFuzzyQuery appears to fail with edit distance >=3 in some cases

2013-06-05 Thread Tim Allison (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated LUCENE-5033:


Attachment: LUCENE-5033.patch

First draft of patch attached.  Let me know how this looks. Thank you.

> SlowFuzzyQuery appears to fail with edit distance >=3 in some cases
> ---
>
> Key: LUCENE-5033
> URL: https://issues.apache.org/jira/browse/LUCENE-5033
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 4.3
>Reporter: Tim Allison
>Priority: Minor
> Attachments: LUCENE-5033.patch
>
>
> The Levenshtein edit distance between "monday" and "montugu" should be 4. The 
> following shows a query with "sim" set to 3, and yet there is a hit.
>   public void testFuzzinessLong2() throws Exception {
>  Directory directory = newDirectory();
>  RandomIndexWriter writer = new RandomIndexWriter(random(), directory);
>  addDoc("monday", writer);
>  
>  IndexReader reader = writer.getReader();
>  IndexSearcher searcher = newSearcher(reader);
>  writer.close();
>  SlowFuzzyQuery query = new SlowFuzzyQuery(new Term("field", "montugu"), 3, 0);
>  ScoreDoc[] hits = searcher.search(query, null, 1000).scoreDocs;
>  assertEquals(0, hits.length);
>   }

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Documentation for Solr/Lucene 4.x, termIndexInterval and limitations of Lucene File format

2013-06-05 Thread Michael McCandless
On Wed, Jun 5, 2013 at 2:47 PM, Tom Burton-West  wrote:

> 13 Billion unique terms.  (CheckIndex output appended below)

Nice :)  That's good news (that nothing blew up!).  Thanks for sharing.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5035) FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes more efficiently

2013-06-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-5035.
-

   Resolution: Fixed
Fix Version/s: 4.4
   5.0

> FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes 
> more efficiently
> ---
>
> Key: LUCENE-5035
> URL: https://issues.apache.org/jira/browse/LUCENE-5035
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Robert Muir
> Fix For: 5.0, 4.4
>
> Attachments: LUCENE-5035.patch
>
>
> Each ordinal in SortedDocValuesImpl has a corresponding address to find its 
> location in the big byte[] to support lookupOrd()
> Today this uses GrowableWriter with absolute addresses.
> But it would be much better to use MonotonicAppendingLongBuffer.
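
A minimal sketch of the change (a hypothetical builder, not the committed 
patch); term addresses grow monotonically, which is exactly what 
MonotonicAppendingLongBuffer compresses well:

{code:java}
import org.apache.lucene.util.packed.MonotonicAppendingLongBuffer;

public class TermAddresses {
  // Addresses into the big byte[] are non-decreasing, so the monotonic buffer
  // stores small deltas from a linear estimate instead of absolute longs.
  public static MonotonicAppendingLongBuffer build(long[] absoluteAddresses) {
    MonotonicAppendingLongBuffer addresses = new MonotonicAppendingLongBuffer();
    for (long addr : absoluteAddresses) {
      addresses.add(addr);
    }
    return addresses;  // addresses.get(ord) recovers the offset for lookupOrd()
  }
}
{code}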

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4902) Confusing field name in example schema - text

2013-06-05 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-4902:
---

Description: 
The following came up in the IRC channel today:

{noformat}
16:34 < sayuke> I can't work this out for the life of me. Is text in text:blah
some sort of special syntax for searching all text fields?
Google keywords other than text: appreciated
{noformat}

A better name for this field would be something that includes "catchall".  A 
lot of documentation mentions this field name and would all have to be 
updated.


  was:
The following came up in the IRC channel today:

16:34 < sayuke> I can't work this out for the life of me. Is text in text:blah
some sort of special syntax for searching all text fields?
Google keywords other than text: appreciated

A better name for this field would be something that includes "catchall".  A 
lot of documentation mentions this field name and would all have to be 
updated.



> Confusing field name in example schema - text
> -
>
> Key: SOLR-4902
> URL: https://issues.apache.org/jira/browse/SOLR-4902
> Project: Solr
>  Issue Type: Improvement
>Reporter: Shawn Heisey
>Priority: Minor
>
> The following came up in the IRC channel today:
> {noformat}
> 16:34 < sayuke> I can't work this out for the life of me. Is text in text:blah
> some sort of special syntax for searching all text fields?
> Google keywords other than text: appreciated
> {noformat}
> A better name for this field would be something that includes "catchall".  A 
> lot of documentation mentions this field name and would all have to be 
> updated.
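
One possible direction, sketched against the example schema (the names here 
are illustrative, not a committed change):

{noformat}
<!-- a self-describing catchall instead of the ambiguous "text" -->
<field name="catchall" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="*" dest="catchall"/>
{noformat}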

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4902) Confusing field name in example schema - text

2013-06-05 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-4902:
--

 Summary: Confusing field name in example schema - text
 Key: SOLR-4902
 URL: https://issues.apache.org/jira/browse/SOLR-4902
 Project: Solr
  Issue Type: Improvement
Reporter: Shawn Heisey
Priority: Minor


The following came up in the IRC channel today:

16:34 < sayuke> I can't work this out for the life of me. Is text in text:blah
some sort of special syntax for searching all text fields?
Google keywords other than text: appreciated

A better name for this field would be something that includes "catchall".  A 
lot of documentation mentions this field name and would all have to be 
updated.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-4.3-Java6 - Build # 56 - Failure

2013-06-05 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.3-Java6/56/

1 tests failed.
FAILED:  
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains 
{#3 seed=[86987851AA607662:EFD27221455C0D78]}

Error Message:
Shouldn't match I #3:ShapePair(Rect(minX=59.0,maxX=81.0,minY=0.0,maxY=11.0) , 
Rect(minX=189.0,maxX=190.0,minY=-60.0,maxY=64.0)) 
Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)

Stack Trace:
java.lang.AssertionError: Shouldn't match I 
#3:ShapePair(Rect(minX=59.0,maxX=81.0,minY=0.0,maxY=11.0) , 
Rect(minX=189.0,maxX=190.0,minY=-60.0,maxY=64.0)) 
Q:Rect(minX=0.0,maxX=256.0,minY=-128.0,maxY=128.0)
at 
__randomizedtesting.SeedInfo.seed([86987851AA607662:EFD27221455C0D78]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.fail(SpatialOpRecursivePrefixTreeTest.java:287)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.doTest(SpatialOpRecursivePrefixTreeTest.java:273)
at 
org.apache.lucene.spatial.prefix.SpatialOpRecursivePrefixTreeTest.testContains(SpatialOpRecursivePrefixTreeTest.java:101)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.l

[jira] [Commented] (SOLR-4744) Version conflict error during shard split test

2013-06-05 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676370#comment-13676370
 ] 

Yonik Seeley commented on SOLR-4744:


Your changes look fine, Hoss.

It's not clear to me why the forward to subshard needs to be synchronous in the 
original committed patch, but I guess that can always be revisited later as an 
optimization.

> Version conflict error during shard split test
> --
>
> Key: SOLR-4744
> URL: https://issues.apache.org/jira/browse/SOLR-4744
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Shalin Shekhar Mangar
>Assignee: Shalin Shekhar Mangar
>Priority: Minor
> Fix For: 4.4, 4.3.1
>
> Attachments: SOLR-4744__no_more_NPE.patch, SOLR-4744.patch, 
> SOLR-4744.patch
>
>
> ShardSplitTest fails sometimes with the following error:
> {code}
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
> invoked for collection: collection1
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state shard1 
> to inactive
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
> shard1_0 to active
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.861; 
> org.apache.solr.cloud.Overseer$ClusterStateUpdater; Update shard state 
> shard1_1 to active
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.873; 
> org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= 
> path=/update params={wt=javabin&version=2} {add=[169 (1432319507166134272)]} 
> 0 2
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.877; 
> org.apache.solr.common.cloud.ZkStateReader$2; A cluster state change: 
> WatchedEvent state:SyncConnected type:NodeDataChanged 
> path:/clusterstate.json, has occurred - updating... (live nodes size: 5)
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.884; 
> org.apache.solr.update.processor.LogUpdateProcessor; 
> [collection1_shard1_1_replica1] webapp= path=/update 
> params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2}
>  {} 0 1
> [junit4:junit4]   1> INFO  - 2013-04-14 19:05:26.885; 
> org.apache.solr.update.processor.LogUpdateProcessor; [collection1] webapp= 
> path=/update 
> params={distrib.from=http://127.0.0.1:41028/collection1/&update.distrib=FROMLEADER&wt=javabin&distrib.from.parent=shard1&version=2}
>  {add=[169 (1432319507173474304)]} 0 2
> [junit4:junit4]   1> ERROR - 2013-04-14 19:05:26.885; 
> org.apache.solr.common.SolrException; shard update error StdNode: 
> http://127.0.0.1:41028/collection1_shard1_1_replica1/:org.apache.solr.common.SolrException:
>  version conflict for 169 expected=1432319507173474304 actual=-1
> [junit4:junit4]   1>  at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:404)
> [junit4:junit4]   1>  at 
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:181)
> [junit4:junit4]   1>  at 
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:332)
> [junit4:junit4]   1>  at 
> org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:306)
> [junit4:junit4]   1>  at 
> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
> [junit4:ju

[jira] [Comment Edited] (SOLR-4862) Core admin action "CREATE" fails to persist some settings in solr.xml

2013-06-05 Thread Trey Massingill (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676325#comment-13676325
 ] 

Trey Massingill edited comment on SOLR-4862 at 6/5/13 8:49 PM:
---

Seemingly, I'm running into this issue as well. I'm in the process of upgrading 
from 3.6.1 to 4.3.

The solr log shows that I passed the dataDir option, but it does not show up in 
solr.xml. I'm not sure why "collection" is showing up in solr.xml either.

Log message:
{noformat}
235705|2013-06-05T20:25:16.774+|qtp875010279-17|INFO|o.a.solr.servlet.SolrDispatchFilter|[admin]
 webapp=null path=/admin/cores 
params={schema=schema.xml&loadOnStartup=false&instanceDir=.&transient=true&name=queue-2013060518&action=CREATE&config=solrconfig.xml&dataDir=..
/../index_data/queue-2013060518&wt=json} status=0 QTime=1635
{noformat}

solr.xml
{noformat}


  

  

{noformat}

This doesn't seem to cause issues at first. However, after restarting the 
service, I end up with this warning:

{noformat}
16764|2013-06-05T20:36:15.289+|qtp1711465251-20|WARN|o.a.solr.handler.ReplicationHandler|Unable
 to get IndexCommit on startup 
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
NativeFSLock@/home/tmassi/Development/svn/mta-blockmon-2012/blockmon-solr/blockmon-solr/master/versions/blockmon-solr-2.0.4-SNAPSHOT/config/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:644)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:197)
at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
at 
org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:939)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:616)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:816)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:525)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at jav

[jira] [Commented] (SOLR-4862) Core admin action "CREATE" fails to persist some settings in solr.xml

2013-06-05 Thread Trey Massingill (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676325#comment-13676325
 ] 

Trey Massingill commented on SOLR-4862:
---

Seemingly, I'm running into this issue as well. The solr log shows that I 
passed the dataDir option, but it does not show up in solr.xml. I'm not sure 
why "collection" is showing up in solr.xml either.

Log message:
{noformat}
235705|2013-06-05T20:25:16.774+|qtp875010279-17|INFO|o.a.solr.servlet.SolrDispatchFilter|[admin]
 webapp=null path=/admin/cores 
params={schema=schema.xml&loadOnStartup=false&instanceDir=.&transient=true&name=queue-2013060518&action=CREATE&config=solrconfig.xml&dataDir=..
/../index_data/queue-2013060518&wt=json} status=0 QTime=1635
{noformat}

solr.xml
{noformat}


  

  

{noformat}

This doesn't seem to cause issues at first. However, after restarting the 
service, I end up with this warning:

{noformat}
16764|2013-06-05T20:36:15.289+|qtp1711465251-20|WARN|o.a.solr.handler.ReplicationHandler|Unable
 to get IndexCommit on startup 
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
NativeFSLock@/home/tmassi/Development/svn/mta-blockmon-2012/blockmon-solr/blockmon-solr/master/versions/blockmon-solr-2.0.4-SNAPSHOT/config/solr/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:644)
at 
org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:77)
at 
org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:64)
at 
org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:197)
at 
org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:110)
at 
org.apache.solr.handler.ReplicationHandler.inform(ReplicationHandler.java:939)
at 
org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:616)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:816)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
at 
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:949)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:984)
at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:1227)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:525)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:679)
{noformat}

... quickly followed by this error:
{noformat}
17212

Re: Documentation for Solr/Lucene 4.x, termIndexInterval and limitations of Lucene File format

2013-06-05 Thread Tom Burton-West
Hi Mike,

13 Billion unique terms.  (CheckIndex output appended below)

Tom
--

 test: terms, freq, prox...OK [13,068,302,002 terms; 187,284,275,343
terms/docs pairs; 786,014,075,745 tokens]

Segments file=segments_6 numSegments=2 version=4.0.0.2 format=
userData={commitTimeMSec=1357596564850}
  1 of 2: name=_uhj docCount=866984
codec=Lucene40
compound=false
numFiles=10
size (MB)=2,048,537.68
diagnostics = {os=Linux, os.version=2.6.18-308.24.1.el5, mergeFactor=8,
source=merge, lucene.version=4.0.0 1394950 - rmuir - 2012-10-06 03:00:40,
os.arch=amd64, mergeMaxNumSegments=1, java.version=1.6.0_16,
java.vendor=Sun Microsystems Inc.}
no deletions
test: open reader.OK
test: fields..OK [92 fields]
test: field norms.OK [46 fields]
test: terms, freq, prox...OK [13068302002 terms; 187284275343
terms/docs pairs; 786014075745 tokens]
test: stored fields...OK [34172522 total field count; avg 39.415
fields per doc]
test: term vectorsOK [0 total vector count; avg 0 term/freq
vector fields per doc]
test: DocValuesOK [0 total doc Count; Num DocValues Fields 0



On Tue, Jun 4, 2013 at 1:00 PM, Tom Burton-West  wrote:

> Thanks Mike.
>
> I'm running CheckIndex on the 2TB index right now.Hopefully it will
> finish running by tomorrow.  I'll send you a copy of the output.
>
> Tom
>
>
> On Mon, Jun 3, 2013 at 9:04 PM, Michael McCandless <
> luc...@mikemccandless.com> wrote:
>
>> Hi Tom,
>>
>> On Mon, Jun 3, 2013 at 12:11 PM, Tom Burton-West 
>> wrote:
>>
>> > What is the current limit?
>>
>> I *think* (but would be nice to hear back how many terms you were able
>> to index into one segment ;) ) there is no hard limit to the max
>> number of terms, now that FSTs can handle more than 2.1 B
>> bytes/nodes/arcs.
>>
>> I'll update those javadocs, thanks!
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>
>>
>


[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2013-06-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676083#comment-13676083
 ] 

Michael McCandless commented on LUCENE-4055:


Hmm looks like it's package private in 4.3 but is (will be) public in 
4.x/trunk.  Just replicate for now :)

> Refactor SegmentInfo / FieldInfo to make them extensible
> 
>
> Key: LUCENE-4055
> URL: https://issues.apache.org/jira/browse/LUCENE-4055
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Andrzej Bialecki 
>Assignee: Robert Muir
> Fix For: 4.0-ALPHA
>
> Attachments: LUCENE-4055.patch
>
>
> After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes 
> should be made abstract so that they can be extended by Codec-s.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5035) FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes more efficiently

2013-06-05 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676080#comment-13676080
 ] 

Michael McCandless commented on LUCENE-5035:


+1, awesome!

> FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes 
> more efficiently
> ---
>
> Key: LUCENE-5035
> URL: https://issues.apache.org/jira/browse/LUCENE-5035
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5035.patch
>
>
> Each ordinal in SortedDocValuesImpl has a corresponding address to find its 
> location in the big byte[] to support lookupOrd()
> Today this uses GrowableWriter with absolute addresses.
> But it would be much better to use MonotonicAppendingLongBuffer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4891) JsonLoader should preserve field value types from the JSON content stream

2013-06-05 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13676029#comment-13676029
 ] 

Steve Rowe edited comment on SOLR-4891 at 6/5/13 3:30 PM:
--

Committed:

- trunk: [r1489914|http://svn.apache.org/viewvc?view=rev&rev=1489914]
- branch_4x: [r1489915|http://svn.apache.org/viewvc?view=rev&rev=1489915]

  was (Author: steve_rowe):
Committed:

- trunk: [r1489914|http://svn.apache.org/viewvc?view=rev?rev=1489914]
- branch_4x: [r1489915|http://svn.apache.org/viewvc?view=rev?rev=1489915]
  
> JsonLoader should preserve field value types from the JSON content stream
> -
>
> Key: SOLR-4891
> URL: https://issues.apache.org/jira/browse/SOLR-4891
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4891-BigInteger-bugfix.patch, SOLR-4891.patch
>
>
> JSON content streams carry some basic type information for their field 
> values, as parsed by Noggit: LONG, NUMBER, BIGNUMBER, and BOOLEAN.  
> {{JsonLoader}} should set field value object types in the 
> {{SolrInputDocument}} according to the content stream's data types. 
> Currently {{JsonLoader}} converts all non-{{String}}-typed field values to 
> {{String}}-s.
> There is a comment in {{JsonLoader.parseSingleFieldValue()}}, where the 
> convert-everything-to-string logic happens, that says "for legacy reasons, 
> single values s are expected to be strings", but other content streams' type 
> information is not flattened like this, e.g. {{JavabinLoader}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4891) JsonLoader should preserve field value types from the JSON content stream

2013-06-05 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe resolved SOLR-4891.
--

Resolution: Fixed

Committed:

- trunk: [r1489914|http://svn.apache.org/viewvc?view=rev?rev=1489914]
- branch_4x: [r1489915|http://svn.apache.org/viewvc?view=rev?rev=1489915]

> JsonLoader should preserve field value types from the JSON content stream
> -
>
> Key: SOLR-4891
> URL: https://issues.apache.org/jira/browse/SOLR-4891
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4891-BigInteger-bugfix.patch, SOLR-4891.patch
>
>
> JSON content streams carry some basic type information for their field 
> values, as parsed by Noggit: LONG, NUMBER, BIGNUMBER, and BOOLEAN.  
> {{JsonLoader}} should set field value object types in the 
> {{SolrInputDocument}} according to the content stream's data types. 
> Currently {{JsonLoader}} converts all non-{{String}}-typed field values to 
> {{String}}-s.
> There is a comment in {{JsonLoader.parseSingleFieldValue()}}, where the 
> convert-everything-to-string logic happens, that says "for legacy reasons, 
> single values s are expected to be strings", but other content streams' type 
> information is not flattened like this, e.g. {{JavabinLoader}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4891) JsonLoader should preserve field value types from the JSON content stream

2013-06-05 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated SOLR-4891:
-

Attachment: SOLR-4891-BigInteger-bugfix.patch

patch - committing shortly

> JsonLoader should preserve field value types from the JSON content stream
> -
>
> Key: SOLR-4891
> URL: https://issues.apache.org/jira/browse/SOLR-4891
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4891-BigInteger-bugfix.patch, SOLR-4891.patch
>
>
> JSON content streams carry some basic type information for their field 
> values, as parsed by Noggit: LONG, NUMBER, BIGNUMBER, and BOOLEAN.  
> {{JsonLoader}} should set field value object types in the 
> {{SolrInputDocument}} according to the content stream's data types. 
> Currently {{JsonLoader}} converts all non-{{String}}-typed field values to 
> {{String}}-s.
> There is a comment in {{JsonLoader.parseSingleFieldValue()}}, where the 
> convert-everything-to-string logic happens, that says "for legacy reasons, 
> single values s are expected to be strings", but other content streams' type 
> information is not flattened like this, e.g. {{JavabinLoader}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: SOLR entry point when deployed in web app container

2013-06-05 Thread Yonik Seeley
On Wed, Jun 5, 2013 at 11:01 AM, Prathik Puthran
 wrote:
> I was trying to find the entry point of the Solr web app when it is deployed
> in an application container. Can someone please help me with this?

Check the SolrDispatchFilter class

-Yonik
http://lucidworks.com
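
The filter is wired up in the webapp's web.xml along these lines (paraphrased,
not copied from any specific release):

    <filter>
      <filter-name>SolrRequestFilter</filter-name>
      <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
    </filter>
    <filter-mapping>
      <filter-name>SolrRequestFilter</filter-name>
      <url-pattern>/*</url-pattern>
    </filter-mapping>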

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-4891) JsonLoader should preserve field value types from the JSON content stream

2013-06-05 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe reopened SOLR-4891:
--


At Hoss's suggestion on #solr IRC last night, I tested whether {{JsonLoader}} 
behavior has changed around {{BigInteger}} and {{BigDecimal}} values as a 
result of the changes committed under this issue.

I'm reopening to address an issue with adding JSON {{BIGNUMBER}}-s (returned by 
the Noggit parser when a number won't fit in either a long or a double) to trie 
integer or long fields: a {{NumberFormatException}} is no longer triggered, and 
the values are silently corrupted.

Before committing the patch on this issue, {{BigInteger}}-typed values were not 
created for {{BIGNUMBER}}-s in {{SolrInputDocument}}; instead, they (along with 
every other JSON value) were converted to {{String}}-s, and then adding such a 
value to an integer or long field would cause a {{NumberFormatException}} to be 
thrown from {{Integer.parseInt()}} or {{Long.parseLong()}}.  This was proper 
and good.

But now, {{BigInteger}}-typed values are converted (in 
{{TrieField.createField()}}) to int/long using {{BigInteger}}'s {{intValue()}} 
and {{longValue()}} methods, which return only the low-order 32 and 64 bits, 
respectively.  These values are always corrupted: the truncated high-order bits 
are guaranteed to be non-zero, since {{BigInteger}} typing only happens when 
values won't fit into 64 bits.

Reverting back to {{String}}-typed {{BIGNUMBER}} values fixes the problem.

By contrast, {{BigDecimal}}'s {{doubleValue()}} and {{floatValue()}} methods 
truncate the low-order bits, resulting in loss of precision rather than 
corruption.  This is the same behavior used by {{Double.parseDouble()}} and 
{{Float.parseFloat()}}.  Reverting back to {{String}}-typing for decimal 
{{BIGNUMBER}}-s in addition to integral {{BIGNUMBER}}-s won't be a problem.

Patch forthcoming.
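
To make the failure mode concrete, here is a minimal standalone Java sketch (illustration only, not part of the patch) of the behaviors described above:

import java.math.BigDecimal;
import java.math.BigInteger;

public class BigNumberTruncationDemo {
  public static void main(String[] args) {
    // 2^63: one larger than Long.MAX_VALUE, so Noggit would report it as BIGNUMBER.
    BigInteger big = BigInteger.valueOf(Long.MAX_VALUE).add(BigInteger.ONE);

    // longValue() keeps only the low-order 64 bits:
    // prints -9223372036854775808, i.e. silent corruption.
    System.out.println(big.longValue());

    // BigDecimal truncates low-order bits instead: prints 1.0,
    // a loss of precision that matches Double.parseDouble on the same string.
    System.out.println(new BigDecimal("1.00000000000000000001").doubleValue());

    // The old String-typed path fails loudly instead of corrupting:
    Long.parseLong(big.toString()); // throws NumberFormatException
  }
}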

> JsonLoader should preserve field value types from the JSON content stream
> -
>
> Key: SOLR-4891
> URL: https://issues.apache.org/jira/browse/SOLR-4891
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Fix For: 4.4
>
> Attachments: SOLR-4891.patch
>
>
> JSON content streams carry some basic type information for their field 
> values, as parsed by Noggit: LONG, NUMBER, BIGNUMBER, and BOOLEAN.  
> {{JsonLoader}} should set field value object types in the 
> {{SolrInputDocument}} according to the content stream's data types. 
> Currently {{JsonLoader}} converts all non-{{String}}-typed field values to 
> {{String}}-s.
> There is a comment in {{JsonLoader.parseSingleFieldValue()}}, where the 
> convert-everything-to-string logic happens, that says "for legacy reasons, 
> single values s are expected to be strings", but other content streams' type 
> information is not flattened like this, e.g. {{JavabinLoader}}.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



SOLR entry point when deployed in web app container

2013-06-05 Thread Prathik Puthran
Hi,

I was trying to find the entry point in SOLR web app when it is deployed in
an application container. Can someone please help me with this?

Thanks,
Prathik


[jira] [Created] (SOLR-4901) Newcomer Curb Appeal - improve the Out Of Box experience for new users

2013-06-05 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-4901:
--

 Summary: Newcomer Curb Appeal - improve the Out Of Box experience 
for new users
 Key: SOLR-4901
 URL: https://issues.apache.org/jira/browse/SOLR-4901
 Project: Solr
  Issue Type: Improvement
  Components: documentation, web gui
Reporter: Shawn Heisey


This is a master issue to track improvements affecting a new user's experience 
with Solr.  Please link other issues as blocking this one.

Solr is immensely complex.  When I first started using it, the initial learning 
curve was incredibly steep.  It's still uphill even now, but I mostly know 
where the handrails are.

The general focus for linked issues:

1) Improving what the user sees when they first download Solr.

I think issues for this item will mostly be about the included txt files and 
the wiki pages referenced there.

We want to be sure that the user who downloads Solr and looks at README.txt is 
able to find information that will give them insight into how the Solr startup 
works and what information is configured where.  Any wiki pages referenced need 
to be top quality, with introductions that really help a novice user and 
advanced reference material for users with more experience.

The README should tell them how to get into the example's admin UI.  Until we 
improve the UI, IMHO it should set expectations about what kind of features 
they'll get out of the UI and let them know that they'll probably be accessing 
API URLs directly and editing config files.

Moving from the example in the download to a robust production installation, 
especially for SolrCloud, should be in our documentation.

2) Improving the UI so the novice doesn't have to edit so many config files or 
immediately learn how to use arcane HTTP API calls.  Experienced users look at 
these things and have no problem with them, but they are voodoo to the new user.

When using the UI to make changes (for example, CoreAdmin), the actual API URL 
that was called should be available, and if it fails, helpful text and a wiki 
link should be displayed so that the user can figure out what went wrong.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5035) FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes more efficiently

2013-06-05 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675979#comment-13675979
 ] 

Adrien Grand commented on LUCENE-5035:
--

+1, patch looks good!

> FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes 
> more efficiently
> ---
>
> Key: LUCENE-5035
> URL: https://issues.apache.org/jira/browse/LUCENE-5035
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5035.patch
>
>
> Each ordinal in SortedDocValuesImpl has a corresponding address to find its 
> location in the big byte[] to support lookupOrd()
> Today this uses GrowableWriter with absolute addresses.
> But it would be much better to use MonotonicAppendingLongBuffer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4879) Indexing a field of type solr.SpatialRecursivePrefixTreeFieldType fails when at least two vertexes are more than 180 degrees apart

2013-06-05 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675961#comment-13675961
 ] 

David Smiley commented on SOLR-4879:


No problem.  If not 4.4 then 4.5, I think.  Who knows when 4.4 will be ready, so 
it's hard to say.  There is some WKT work going on in Spatial4j that I want to 
get done before cutting a new release there.

> Indexing a field of type solr.SpatialRecursivePrefixTreeFieldType fails when 
> at least two vertexes are more than 180 degrees apart
> --
>
> Key: SOLR-4879
> URL: https://issues.apache.org/jira/browse/SOLR-4879
> Project: Solr
>  Issue Type: Bug
> Environment: Linux, Solr 4.0.0, Solr 4.3.0
>Reporter: Øystein Torget
>Assignee: David Smiley
>
> When trying to index a field of the type 
> solr.SpatialRecursivePrefixTreeFieldType, indexing will fail if two 
> vertexes are more than 180 longitudinal degrees apart.
> For instance, this polygon will fail:
> POLYGON((-161 49, 0 49, 20 49, 20 89.1, 0 89.1, -161 89.2, -161 49))
> but this one will not:
> POLYGON((-160 49, 0 49, 20 49, 20 89.1, 0 89.1, -160 89.2, -160 49))
> This contradicts the documentation found here: 
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> The documentation states that each vertex must be less than 180 longitudinal 
> degrees apart from the previous vertex.
> Relevant parts from the schema.xml file:
> <fieldType name="..."
>            class="solr.SpatialRecursivePrefixTreeFieldType"
>            spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>            distErrPct="0.025"
>            maxDistErr="0.09"
>            units="degrees"
> />
> <field name="..." ... stored="true" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5035) FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes more efficiently

2013-06-05 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5035:


Attachment: LUCENE-5035.patch

patch

> FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes 
> more efficiently
> ---
>
> Key: LUCENE-5035
> URL: https://issues.apache.org/jira/browse/LUCENE-5035
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Robert Muir
> Attachments: LUCENE-5035.patch
>
>
> Each ordinal in SortedDocValuesImpl has a corresponding address to find its 
> location in the big byte[] to support lookupOrd()
> Today this uses GrowableWriter with absolute addresses.
> But it would be much better to use MonotonicAppendingLongBuffer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5035) FieldCacheImpl.SortedDocValuesImpl should compress addresses to term bytes more efficiently

2013-06-05 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-5035:
---

 Summary: FieldCacheImpl.SortedDocValuesImpl should compress 
addresses to term bytes more efficiently
 Key: LUCENE-5035
 URL: https://issues.apache.org/jira/browse/LUCENE-5035
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/search
Reporter: Robert Muir
 Attachments: LUCENE-5035.patch

Each ordinal in SortedDocValuesImpl has a corresponding address to find its 
location in the big byte[] to support lookupOrd()

Today this uses GrowableWriter with absolute addresses.

But it would be much better to use MonotonicAppendingLongBuffer.
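
As a rough illustration of why monotonic encoding wins here (a toy standalone sketch, not Lucene's actual packed-ints API): the addresses only ever increase, so storing each one as a small delta from a fitted line needs far fewer bits per value than storing absolute offsets.

public class MonotonicAddressSketch {
  public static void main(String[] args) {
    // Start offsets of six terms inside a shared byte[]; always non-decreasing.
    long[] addresses = {0, 7, 15, 21, 30, 38};

    // Fit a line through the first and last address.
    float avgSlope = (addresses[addresses.length - 1] - addresses[0])
        / (float) (addresses.length - 1);

    for (int ord = 0; ord < addresses.length; ord++) {
      long expected = addresses[0] + (long) (avgSlope * ord);
      long delta = addresses[ord] - expected; // small value, packable in few bits
      System.out.println("ord=" + ord + " abs=" + addresses[ord] + " delta=" + delta);
    }
  }
}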

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4055) Refactor SegmentInfo / FieldInfo to make them extensible

2013-06-05 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675907#comment-13675907
 ] 

Grant Ingersoll commented on LUCENE-4055:
-

Hmm, Mike, CODEC_FILE_PATTERN is package access only.  Easy enough to 
replicate/fix, any reason not to?

> Refactor SegmentInfo / FieldInfo to make them extensible
> 
>
> Key: LUCENE-4055
> URL: https://issues.apache.org/jira/browse/LUCENE-4055
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/codecs
>Reporter: Andrzej Bialecki 
>Assignee: Robert Muir
> Fix For: 4.0-ALPHA
>
> Attachments: LUCENE-4055.patch
>
>
> After LUCENE-4050 is done the resulting SegmentInfo / FieldInfo classes 
> should be made abstract so that they can be extended by Codec-s.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 519 - Still Failing!

2013-06-05 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/519/
Java: 64bit/jdk1.6.0 -XX:-UseCompressedOops -XX:+UseConcMarkSweepGC

1 tests failed.
FAILED:  org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic

Error Message:
Connection to http://localhost:51371 refused

Stack Trace:
org.apache.http.conn.HttpHostConnectException: Connection to 
http://localhost:51371 refused
at 
__randomizedtesting.SeedInfo.seed([B8423D144314D0:AB425F28CB9F92FE]:0)
at 
org.apache.http.impl.conn.DefaultClientConnectionOperator.openConnection(DefaultClientConnectionOperator.java:190)
at 
org.apache.http.impl.conn.ManagedClientConnectionImpl.open(ManagedClientConnectionImpl.java:294)
at 
org.apache.http.impl.client.DefaultRequestDirector.tryConnect(DefaultRequestDirector.java:645)
at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:480)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:906)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:805)
at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:784)
at 
org.apache.lucene.replicator.http.HttpClientBase.executeGET(HttpClientBase.java:178)
at 
org.apache.lucene.replicator.http.HttpReplicator.checkForUpdate(HttpReplicator.java:51)
at 
org.apache.lucene.replicator.ReplicationClient.doUpdate(ReplicationClient.java:196)
at 
org.apache.lucene.replicator.ReplicationClient.updateNow(ReplicationClient.java:402)
at 
org.apache.lucene.replicator.http.HttpReplicatorTest.testBasic(HttpReplicatorTest.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 

[jira] [Updated] (LUCENE-5034) Make AppendingLongBuffer's page size configurable

2013-06-05 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand updated LUCENE-5034:
-

Attachment: LUCENE-5034.patch

> Make AppendingLongBuffer's page size configurable
> -
>
> Key: LUCENE-5034
> URL: https://issues.apache.org/jira/browse/LUCENE-5034
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-5034.patch
>
>
> Depending on the data, it might be interesting to use smaller or larger page 
> sizes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5034) Make AppendingLongBuffer's page size configurable

2013-06-05 Thread Adrien Grand (JIRA)
Adrien Grand created LUCENE-5034:


 Summary: Make AppendingLongBuffer's page size configurable
 Key: LUCENE-5034
 URL: https://issues.apache.org/jira/browse/LUCENE-5034
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Adrien Grand
Assignee: Adrien Grand
Priority: Minor


Depending on the data, it might be interesting to use smaller or larger page 
sizes.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-5026) PagedGrowableWriter

2013-06-05 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-5026.
--

Resolution: Fixed

> PagedGrowableWriter
> ---
>
> Key: LUCENE-5026
> URL: https://issues.apache.org/jira/browse/LUCENE-5026
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
> Fix For: 5.0, 4.4
>
> Attachments: LUCENE-5026.patch, LUCENE-5026.patch
>
>
> We already have packed data structures that support more than 2B values such 
> as AppendingLongBuffer and MonotonicAppendingLongBuffer but none of them 
> supports random write-access.
> We could write a PagedGrowableWriter for this, which would essentially wrap 
> an array of GrowableWriters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/ibm-j9-jdk7) - Build # 5928 - Failure!

2013-06-05 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/5928/
Java: 64bit/ibm-j9-jdk7 
-Xjit:exclude={org/apache/lucene/util/fst/FST.pack(IIF)Lorg/apache/lucene/util/fst/FST;}

1 tests failed.
REGRESSION:  org.apache.solr.core.TestJmxIntegration.testJmxRegistration

Error Message:
No SolrDynamicMBeans found

Stack Trace:
java.lang.AssertionError: No SolrDynamicMBeans found
at 
__randomizedtesting.SeedInfo.seed([B49FBFA20F813AEB:3A4EDB9862C0628E]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at 
org.apache.solr.core.TestJmxIntegration.testJmxRegistration(TestJmxIntegration.java:94)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:88)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:55)
at java.lang.reflect.Method.invoke(Method.java:613)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at java.lang.Thread.run(Thread.java:780)




Build Log:
[...truncated 9478 lines...]
[junit4:junit4] Suite: org.apache.solr.core.TestJmxInte

[jira] [Commented] (SOLR-4805) Calling Collection RELOAD where collection has a single core, leaves collection offline and unusable till reboot

2013-06-05 Thread Alexey Kudinov (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675724#comment-13675724
 ] 

Alexey Kudinov commented on SOLR-4805:
--

It seems that the issue happens because ZkController.preRegister sets the state 
to 'Down', while in ZkController.register the piece of code that sets the state 
to 'Active' is skipped for a reloaded core.
Only recovery should be skipped, not setting the state to 'Active'.
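
If that reading is right, the intended flow would look roughly like the following (a hypothetical sketch with made-up names, not the actual ZkController code):

// Sketch of the register flow described above; names are illustrative.
class RegisterFlowSketch {
  void register(String coreName, boolean afterReload) {
    if (!afterReload) {
      recoverIfNeeded(coreName); // recovery is the only step RELOAD should skip
    }
    // Must run unconditionally: preRegister already published 'down', so
    // skipping this on RELOAD leaves the replica marked 'down' until restart.
    publishState(coreName, "active");
  }

  void recoverIfNeeded(String coreName) { /* stub */ }
  void publishState(String coreName, String state) { /* stub */ }
}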

> Calling Collection RELOAD where collection has a single core, leaves 
> collection offline and unusable till reboot
> 
>
> Key: SOLR-4805
> URL: https://issues.apache.org/jira/browse/SOLR-4805
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Jared Rodriguez
>Assignee: Mark Miller
> Fix For: 5.0, 4.4
>
>
> If you have a collection that is composed of a single core, then calling 
> reload on that collection leaves the core offline.  This happens even if 
> nothing at all has changed about the collection or its config.  This happens 
> whether you call reload via an http GET or if you directly call reload via 
> the collections api. 
> Tried a collection with a single core that contains data, change nothing 
> about the config in ZK and call reload and the collection.  The call 
> completes, but ZK flags that replica with "state":"down"
> Try it where a the single core contains no data and the same thing happens, 
> ZK config updates and broadcasts "state":"down" for the replica.
> I did not try this in a multicore or replicated core environment.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4381) Query-time multi-word synonym expansion

2013-06-05 Thread Hemant Verma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675702#comment-13675702
 ] 

Hemant Verma edited comment on SOLR-4381 at 6/5/13 8:43 AM:


While using this patch I found one scenario in which it does not work properly.
My synonyms list contains the following entries:
   pepsi,pepsico,pbg
   outsourcing,rpo,offshoring

The difference in synonym expansion shows up when any of these words is 
prefixed with a stopword.

Search Keyword  Expanded Result

pepsi -> pepsi, pepsico, pbg
pbg -> pepsi, pepsico, pbg
the pepsi -> pepsi, pepsico
the pbg -> pepsi, pbg
outsourcing -> outsourc, offshor, rpo
the outsourcing -> outsourc, offshor

The expanded results above show that when a keyword from the synonym list is 
prefixed with a stopword, some of its synonyms are missed in the expansion.

  was (Author: hemantverma09):
While using this patch I found one scenario in which it is not working 
properly.
I have in my synonyms list the below keywords:
   pepsi,pepsico,pbg
   outsourcing,rpo,offshoring

Difference in expanding synonyms comes up when I use any of the word with 
stopword as a prefix.

Search Keyword  Expanded Result

pepsi ---> pepsi, pepsico, pbg
pbg -> pepsi, pepsico, pbg
the pepsi -> pepsi, pepsico
the pbg > pepsi, pbg
outsourcing -> outsourc, offshor, rpo
the outsourcing -> outsourc, offshor

The above expanded synonyms result shows that when we use any keyword 
(available in synonym list) prefixed with stopword then expanded synonyms do 
miss few synonym.
  
> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-4381
> URL: https://issues.apache.org/jira/browse/SOLR-4381
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nolan Lawson
>Priority: Minor
>  Labels: multi-word, queryparser, synonyms
> Fix For: 4.4
>
> Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic from the 
> analysis chain (either query- or index-type) and into a new QueryParser.  See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).
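
To picture the lattice the parser builds, here is a toy standalone sketch of query-time expansion (illustration only; the names and the brute-force enumeration are not the patch's actual data structures):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SynonymLatticeSketch {
  public static void main(String[] args) {
    // Toy synonym map; the real parser also has to handle multi-word entries.
    Map<String, List<String>> synonyms = new HashMap<String, List<String>>();
    synonyms.put("pepsi", Arrays.asList("pepsi", "pepsico", "pbg"));

    // Expand each token into all alternatives, building every path through the lattice.
    List<String> combinations = new ArrayList<String>();
    combinations.add("");
    for (String token : new String[] {"the", "pepsi"}) {
      List<String> alts = synonyms.containsKey(token)
          ? synonyms.get(token) : Arrays.asList(token);
      List<String> next = new ArrayList<String>();
      for (String prefix : combinations) {
        for (String alt : alts) {
          next.add(prefix.isEmpty() ? alt : prefix + " " + alt);
        }
      }
      combinations = next;
    }
    System.out.println(combinations); // [the pepsi, the pepsico, the pbg]
  }
}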

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4381) Query-time multi-word synonym expansion

2013-06-05 Thread Hemant Verma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675702#comment-13675702
 ] 

Hemant Verma edited comment on SOLR-4381 at 6/5/13 8:42 AM:


While using this patch I found one scenario in which it does not work properly.
My synonyms list contains the following entries:
   pepsi,pepsico,pbg
   outsourcing,rpo,offshoring

The difference in synonym expansion shows up when any of these words is 
prefixed with a stopword.

Search Keyword  Expanded Result

pepsi -> pepsi, pepsico, pbg
pbg -> pepsi, pepsico, pbg
the pepsi -> pepsi, pepsico
the pbg -> pepsi, pbg
outsourcing -> outsourc, offshor, rpo
the outsourcing -> outsourc, offshor

The expanded results above show that when a keyword from the synonym list is 
prefixed with a stopword, some of its synonyms are missed in the expansion.

  was (Author: hemantverma09):
While using this patch I found one scenario in which it is not working 
properly.
I have in my synonyms list the below keywords:
   pepsi,pepsico,pbg
   outsourcing,rpo,offshoring

Difference in expanding synonyms comes up when I use any of the word with 
stopword as a prefix.

Search Keyword     Expanded Result
--------------     ----------------------
pepsi              pepsi, pepsico, pbg
pbg                pepsi, pepsico, pbg
the pepsi          pepsi, pepsico
the pbg            pepsi, pbg
outsourcing        outsourc, offshor, rpo
the outsourcing    outsourc, offshor

The above expanded synonyms result shows that when we use any keyword 
(available in synonym list) prefixed with stopword then expanded synonyms do 
miss few synonym.
  
> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-4381
> URL: https://issues.apache.org/jira/browse/SOLR-4381
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nolan Lawson
>Priority: Minor
>  Labels: multi-word, queryparser, synonyms
> Fix For: 4.4
>
> Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic from the 
> analysis chain (either query- or index-type) and into a new QueryParser.  See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4879) Indexing a field of type solr.SpatialRecursivePrefixTreeFieldType fails when at least two vertexes are more than 180 degrees apart

2013-06-05 Thread Øystein Torget (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675708#comment-13675708
 ] 

Øystein Torget commented on SOLR-4879:
--

I see that you fixed the bug in Spatial4j already, so I tried adding the latest 
snapshot of Spatial4j to Solr and that fixed the problem. Thanks for your help!

Do you know when we can expect a new release of Solr with the next version of 
Spatial4j?


> Indexing a field of type solr.SpatialRecursivePrefixTreeFieldType fails when 
> at least two vertexes are more than 180 degrees apart
> --
>
> Key: SOLR-4879
> URL: https://issues.apache.org/jira/browse/SOLR-4879
> Project: Solr
>  Issue Type: Bug
> Environment: Linux, Solr 4.0.0, Solr 4.3.0
>Reporter: Øystein Torget
>Assignee: David Smiley
>
> When trying to index a field of the type 
> solr.SpatialRecursivePrefixTreeFieldType, indexing will fail if two 
> vertexes are more than 180 longitudinal degrees apart.
> For instance, this polygon will fail:
> POLYGON((-161 49, 0 49, 20 49, 20 89.1, 0 89.1, -161 89.2, -161 49))
> but this one will not:
> POLYGON((-160 49, 0 49, 20 49, 20 89.1, 0 89.1, -160 89.2, -160 49))
> This contradicts the documentation found here: 
> http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
> The documentation states that each vertex must be less than 180 longitudinal 
> degrees apart from the previous vertex.
> Relevant parts from the schema.xml file:
> <fieldType name="..."
>            class="solr.SpatialRecursivePrefixTreeFieldType"
>            spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
>            distErrPct="0.025"
>            maxDistErr="0.09"
>            units="degrees"
> />
> <field name="..." ... stored="true" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4381) Query-time multi-word synonym expansion

2013-06-05 Thread Hemant Verma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675702#comment-13675702
 ] 

Hemant Verma commented on SOLR-4381:


While using this patch I found one scenario in which it does not work properly.
My synonyms list contains the following entries:
   pepsi,pepsico,pbg
   outsourcing,rpo,offshoring

The difference in synonym expansion shows up when any of these words is 
prefixed with a stopword.

Search Keyword     Expanded Result
--------------     ----------------------
pepsi              pepsi, pepsico, pbg
pbg                pepsi, pepsico, pbg
the pepsi          pepsi, pepsico
the pbg            pepsi, pbg
outsourcing        outsourc, offshor, rpo
the outsourcing    outsourc, offshor

The expanded results above show that when a keyword from the synonym list is 
prefixed with a stopword, some of its synonyms are missed in the expansion.

> Query-time multi-word synonym expansion
> ---
>
> Key: SOLR-4381
> URL: https://issues.apache.org/jira/browse/SOLR-4381
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Nolan Lawson
>Priority: Minor
>  Labels: multi-word, queryparser, synonyms
> Fix For: 4.4
>
> Attachments: SOLR-4381-2.patch, SOLR-4381.patch
>
>
> This is an issue that seems to come up perennially.
> The [Solr 
> docs|http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory]
>  caution that index-time synonym expansion should be preferred to query-time 
> synonym expansion, due to the way multi-word synonyms are treated and how IDF 
> values can be boosted artificially. But query-time expansion should have huge 
> benefits, given that changes to the synonyms don't require re-indexing, the 
> index size stays the same, and the IDF values for the documents don't get 
> permanently altered.
> The proposed solution is to move the synonym expansion logic from the 
> analysis chain (either query- or index-type) and into a new QueryParser.  See 
> the attached patch for an implementation.
> The core Lucene functionality is untouched.  Instead, the EDismaxQParser is 
> extended, and synonym expansion is done on-the-fly.  Queries are parsed into 
> a lattice (i.e. all possible synonym combinations), while individual 
> components of the query are still handled by the EDismaxQParser itself.
> It's not an ideal solution by any stretch. But it's nice and self-contained, 
> so it invites experimentation and improvement.  And I think it fits in well 
> with the merry band of misfit query parsers, like {{func}} and {{frange}}.
> More details about this solution can be found in [this blog 
> post|http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/] and 
> [the Github page for the 
> code|https://github.com/healthonnet/hon-lucene-synonyms].
> At the risk of tooting my own horn, I also think this patch sufficiently 
> fixes SOLR-3390 (highlighting problems with multi-word synonyms) and 
> LUCENE-4499 (better support for multi-word synonyms).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4989) Hanging on DocumentsWriterStallControl.waitIfStalled forever

2013-06-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer resolved LUCENE-4989.
-

Resolution: Fixed

fixed via LUCENE-5002

> Hanging on DocumentsWriterStallControl.waitIfStalled forever
> 
>
> Key: LUCENE-4989
> URL: https://issues.apache.org/jira/browse/LUCENE-4989
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.1
> Environment: Linux 2.6.32
>Reporter: Jessica Cheng
>Assignee: Simon Willnauer
>  Labels: hang
> Fix For: 5.0, 4.3.1
>
>
> In an environment where our underlying storage was timing out on various 
> operations, we find all of our indexing threads eventually stuck in the 
> following state (so far for 4 days):
> "Thread-0" daemon prio=5 Thread id=556  WAITING
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:74)
>   at 
> org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:676)
>   at 
> org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
>   at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
>   at ...
> I have not yet enabled detailed logging or tried to reproduce, but looking 
> at the code, I see that DWFC.abortPendingFlushes does
> try {
>   dwpt.abort();
>   doAfterFlush(dwpt);
> } catch (Throwable ex) {
>   // ignore - keep on aborting the flush queue
> }
> (and the same for the blocked ones). Since the throwable is ignored, I can't 
> say for sure, but I've seen DWPT.abort throw in other cases, so if it does 
> throw, we'd fail to call doAfterFlush and properly decrement flushBytes. This 
> can be a problem, right? Is it possible to do this instead:
> try {
>   dwpt.abort();
> } catch (Throwable ex) {
>   // ignore - keep on aborting the flush queue
> } finally {
>   try {
> doAfterFlush(dwpt);
>   } catch (Throwable ex2) {
> // ignore - keep on aborting the flush queue
>   }
> }
> It's ugly but safer. Otherwise, maybe at least add logging for the throwable 
> just to make sure this is/isn't happening.
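
A self-contained sketch of the hazard and the proposed fix (simplified accounting and made-up names, not the actual DocumentsWriterFlushControl code):

public class AbortAccountingSketch {
  static long flushBytes = 100; // bytes attributed to a pending flush

  static void abortThatThrows() { throw new RuntimeException("abort failed"); }
  static void doAfterFlush() { flushBytes -= 100; } // accounting that must always run

  public static void main(String[] args) {
    // Original shape: if abort() throws, doAfterFlush() never runs and
    // flushBytes stays inflated, which can stall indexing threads forever.
    try {
      abortThatThrows();
      doAfterFlush();
    } catch (Throwable ex) {
      // ignore - keep on aborting the flush queue
    }
    System.out.println("without finally: flushBytes=" + flushBytes); // 100 (leaked)

    flushBytes = 100;
    // Proposed shape: the finally block guarantees the accounting happens.
    try {
      abortThatThrows();
    } catch (Throwable ex) {
      // ignore - keep on aborting the flush queue
    } finally {
      try {
        doAfterFlush();
      } catch (Throwable ex2) {
        // ignore - keep on aborting the flush queue
      }
    }
    System.out.println("with finally: flushBytes=" + flushBytes); // 0
  }
}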

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4989) Hanging on DocumentsWriterStallControl.waitIfStalled forever

2013-06-05 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13675669#comment-13675669
 ] 

Simon Willnauer commented on LUCENE-4989:
-

Jessica, I agree this is not related to LUCENE-5002. I will go ahead and close 
it! Thanks for reporting this!

> Hanging on DocumentsWriterStallControl.waitIfStalled forever
> 
>
> Key: LUCENE-4989
> URL: https://issues.apache.org/jira/browse/LUCENE-4989
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.1
> Environment: Linux 2.6.32
>Reporter: Jessica Cheng
>  Labels: hang
> Fix For: 5.0, 4.3.1
>
>
> In an environment where our underlying storage was timing out on various 
> operations, we find all of our indexing threads eventually stuck in the 
> following state (so far for 4 days):
> "Thread-0" daemon prio=5 Thread id=556  WAITING
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:74)
>   at 
> org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:676)
>   at 
> org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
>   at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
>   at ...
> I have not yet enabled detailed logging or tried to reproduce, but looking 
> at the code, I see that DWFC.abortPendingFlushes does
> try {
>   dwpt.abort();
>   doAfterFlush(dwpt);
> } catch (Throwable ex) {
>   // ignore - keep on aborting the flush queue
> }
> (and the same for the blocked ones). Since the throwable is ignored, I can't 
> say for sure, but I've seen DWPT.abort throw in other cases, so if it does 
> throw, we'd fail to call doAfterFlush and properly decrement flushBytes. This 
> can be a problem, right? Is it possible to do this instead:
> try {
>   dwpt.abort();
> } catch (Throwable ex) {
>   // ignore - keep on aborting the flush queue
> } finally {
>   try {
> doAfterFlush(dwpt);
>   } catch (Throwable ex2) {
> // ignore - keep on aborting the flush queue
>   }
> }
> It's ugly but safer. Otherwise, maybe at least add logging for the throwable 
> just to make sure this is/isn't happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4989) Hanging on DocumentsWriterStallControl.waitIfStalled forever

2013-06-05 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-4989:
---

Assignee: Simon Willnauer

> Hanging on DocumentsWriterStallControl.waitIfStalled forever
> 
>
> Key: LUCENE-4989
> URL: https://issues.apache.org/jira/browse/LUCENE-4989
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/index
>Affects Versions: 4.1
> Environment: Linux 2.6.32
>Reporter: Jessica Cheng
>Assignee: Simon Willnauer
>  Labels: hang
> Fix For: 5.0, 4.3.1
>
>
> In an environment where our underlying storage was timing out on various 
> operations, we find all of our indexing threads eventually stuck in the 
> following state (so far for 4 days):
> "Thread-0" daemon prio=5 Thread id=556  WAITING
>   at java.lang.Object.wait(Native Method)
>   at java.lang.Object.wait(Object.java:503)
>   at 
> org.apache.lucene.index.DocumentsWriterStallControl.waitIfStalled(DocumentsWriterStallControl.java:74)
>   at 
> org.apache.lucene.index.DocumentsWriterFlushControl.waitIfStalled(DocumentsWriterFlushControl.java:676)
>   at 
> org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:301)
>   at 
> org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:361)
>   at 
> org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1484)
>   at ...
> I have not yet enabled detailed logging or tried to reproduce, but looking 
> at the code, I see that DWFC.abortPendingFlushes does
> try {
>   dwpt.abort();
>   doAfterFlush(dwpt);
> } catch (Throwable ex) {
>   // ignore - keep on aborting the flush queue
> }
> (and the same for the blocked ones). Since the throwable is ignored, I can't 
> say for sure, but I've seen DWPT.abort throw in other cases, so if it does 
> throw, we'd fail to call doAfterFlush and properly decrement flushBytes. This 
> can be a problem, right? Is it possible to do this instead:
> try {
>   dwpt.abort();
> } catch (Throwable ex) {
>   // ignore - keep on aborting the flush queue
> } finally {
>   try {
> doAfterFlush(dwpt);
>   } catch (Throwable ex2) {
> // ignore - keep on aborting the flush queue
>   }
> }
> It's ugly but safer. Otherwise, maybe at least add logging for the throwable 
> just to make sure this is/isn't happening.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org