[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain

2012-11-28 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505294#comment-13505294
 ] 

Chris Male commented on LUCENE-4569:


John,

I don't really know much about the API you're wanting to change, but to help me 
understand, are you able to explain more about what you're trying to do in your 
custom indexing format / code? 

I think one of the major motivations for Codecs is to allow this sort of 
customization through their API (there are already Codecs for holding this in 
memory).
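
To make that concrete, here is a minimal sketch of that kind of customization, assuming Lucene 4.0's Codec API - this is not the attached patch, and the custom NormsFormat/DocValuesFormat implementations are left as constructor arguments:

{code:java}
import org.apache.lucene.codecs.*;

// Delegates everything to the default codec except norms and
// column-stride fields (DocValues), which come from your own formats.
public final class CustomFormatsCodec extends Codec {
  private final Codec delegate = Codec.forName("Lucene40");
  private final NormsFormat norms;
  private final DocValuesFormat docValues;

  public CustomFormatsCodec(NormsFormat norms, DocValuesFormat docValues) {
    super("CustomFormats");
    this.norms = norms;          // e.g. an in-memory norms implementation
    this.docValues = docValues;  // e.g. an in-memory column-stride implementation
  }

  @Override public NormsFormat normsFormat() { return norms; }
  @Override public DocValuesFormat docValuesFormat() { return docValues; }
  // everything else delegates to the default codec
  @Override public PostingsFormat postingsFormat() { return delegate.postingsFormat(); }
  @Override public StoredFieldsFormat storedFieldsFormat() { return delegate.storedFieldsFormat(); }
  @Override public TermVectorsFormat termVectorsFormat() { return delegate.termVectorsFormat(); }
  @Override public FieldInfosFormat fieldInfosFormat() { return delegate.fieldInfosFormat(); }
  @Override public SegmentInfoFormat segmentInfoFormat() { return delegate.segmentInfoFormat(); }
  @Override public LiveDocsFormat liveDocsFormat() { return delegate.liveDocsFormat(); }
}
{code}

It would be wired in via IndexWriterConfig.setCodec(...) rather than by replacing the indexing chain (note that reading segments back requires registering the codec via SPI).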

 Allow customization of column stride field and norms via indexing chain
 ---

 Key: LUCENE-4569
 URL: https://issues.apache.org/jira/browse/LUCENE-4569
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: John Wang
 Attachments: patch.diff


 We are building an in-memory indexing format and managing our own segments. 
 We are doing this by implementing a custom IndexingChain. We would like to 
 support column-stride fields and norms without having to wire in a codec 
 (since we are managing our postings differently).
 The suggested change is consistent with the API support for passing in a 
 custom InvertedDocConsumer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505295#comment-13505295
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies 
of the data. I also count the leader as a replica of a shard (which is 
logical). It follows from the clusterstate.json, which lists all replicas for 
a shard and one of them just has a flag indicating it's the leader. This also 
makes it easier to talk about a shard having 0 replicas (meaning there is not 
even a leader).

Ok, it's just that the replicationFactor you specify in your request is the 
other thing. You get replicationFactor + 1 shards per slice, if we define 
replicationFactor as the one you give in your request.
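
To put concrete numbers on the mismatch described above (values are only an example):

{noformat}
create request: numShards=2, replicationFactor=2
physical shards per slice = replicationFactor + 1 = 3   (1 leader + 2 more copies)
total cores created       = numShards * (replicationFactor + 1) = 6
{noformat}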

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard between the Solr server 
 that used to run it and the new one.
 See the dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505296#comment-13505296
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to 
Solr 4.1 back compat.

Of course, I agree! But anyway...

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard between the Solr server 
 that used to run it and the new one.
 See the dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2141) NullPointerException when using escapeSql function

2012-11-28 Thread Dominik Siebel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505337#comment-13505337
 ] 

Dominik Siebel commented on SOLR-2141:
--

Hi James, sorry, I had already forgotten about that. Thanks for the good work!

 NullPointerException when using escapeSql function
 --

 Key: SOLR-2141
 URL: https://issues.apache.org/jira/browse/SOLR-2141
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 1.4.1, 4.0
 Environment: openjdk 1.6.0 b12
Reporter: Edward Rudd
Assignee: James Dyer
 Fix For: 4.1, 5.0

 Attachments: dih-config.xml, dih-file.xml, SOLR-2141.b341f5b.patch, 
 SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, 
 SOLR-2141.patch, SOLR-2141.patch, SOLR-2141-sample.patch, SOLR-2141-test.patch


 I have two entities defined, nested in each other:
 <entity name="article" query="select category, subcategory from articles">
   <entity name="other" query="select other from othertable where 
   category='${dataimporter.functions.escapeSql(article.category)}' 
   AND subcategory='${dataimporter.functions.escapeSql(article.subcategory)}'">
   </entity>
 </entity>
 Now, when I run that it bombs on any article where subcategory = '' (it's a 
 NOT NULL column, so the empty string is there). If I do where subcategory != '' 
 in the article query it works fine (aside from not pulling in all of the 
 articles).
 org.apache.solr.handler.dataimport.DataImportHandlerException: 
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
 Caused by: java.lang.NullPointerException
 at 
 org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:75)
 at 
 org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:216)
 at 
 org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:204)
 at 
 org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:107)
 at 
 org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
 at 
 org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
 at 
 org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:87)
 at 
 org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
 at 
 org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
 at 
 org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
 ... 6 more
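
If the failure is an unguarded null during variable resolution (which the trace is consistent with), a minimal sketch of the kind of guard that would avoid it, assuming commons-lang's StringEscapeUtils - the class and method names here are illustrative, not taken from the attached patches:

{code:java}
import org.apache.commons.lang.StringEscapeUtils;

public final class EscapeSqlGuard {
  private EscapeSqlGuard() {}

  // Null-safe variant of the escaping step: treat an unresolved/null
  // variable as an empty string instead of throwing an NPE.
  public static String escapeSqlSafe(Object resolved) {
    return resolved == null ? "" : StringEscapeUtils.escapeSql(resolved.toString());
  }
}
{code}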

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4115) WordBreakSpellChecker throws ArrayIndexOutOfBoundsException for random query string

2012-11-28 Thread Andreas Hubold (JIRA)
Andreas Hubold created SOLR-4115:


 Summary: WordBreakSpellChecker throws 
ArrayIndexOutOfBoundsException for random query string
 Key: SOLR-4115
 URL: https://issues.apache.org/jira/browse/SOLR-4115
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 4.0
 Environment: java version 1.6.0_37
Java(TM) SE Runtime Environment (build 1.6.0_37-b06)
Java HotSpot(TM) 64-Bit Server VM (build 20.12-b01, mixed mode)
Reporter: Andreas Hubold


The following SolrJ test code causes an ArrayIndexOutOfBoundsException in the 
WordBreakSpellChecker. I tested this with the Solr 4.0.0 example webapp started 
with {{java -jar start.jar}}.

{code:java}
  @Test
  public void testWordbreakSpellchecker() throws Exception {
    SolrQuery q = new SolrQuery("\uD864\uDC79");
    q.setRequestHandler("/browse");
    q.setParam("spellcheck.dictionary", "wordbreak");
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    server.query(q, SolrRequest.METHOD.POST);
  }
{code}

{noformat}
INFO: [collection1] webapp=/solr path=/browse 
params={spellcheck.dictionary=wordbreak&qt=/browse&wt=javabin&q=?&version=2} 
hits=0 status=500 QTime=11 
Nov 28, 2012 11:23:01 AM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.ArrayIndexOutOfBoundsException: 1
at org.apache.lucene.util.UnicodeUtil.UTF8toUTF16(UnicodeUtil.java:599)
at org.apache.lucene.util.BytesRef.utf8ToString(BytesRef.java:165)
at org.apache.lucene.index.Term.text(Term.java:72)
at 
org.apache.lucene.search.spell.WordBreakSpellChecker.generateSuggestWord(WordBreakSpellChecker.java:350)
at 
org.apache.lucene.search.spell.WordBreakSpellChecker.generateBreakUpSuggestions(WordBreakSpellChecker.java:283)
at 
org.apache.lucene.search.spell.WordBreakSpellChecker.suggestWordBreaks(WordBreakSpellChecker.java:122)
at 
org.apache.solr.spelling.WordBreakSolrSpellChecker.getSuggestions(WordBreakSolrSpellChecker.java:229)
at 
org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:172)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:351)
at 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
at 
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:900)
at 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:954)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:857)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
at 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
at java.lang.Thread.run(Thread.java:662)
{noformat}

The query string 

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies 
of the data. I also count the leader as a replica of a shard (which is 
logical). It follows from the clusterstate.json, which lists all replicas for 
a shard and one of them just has a flag indicating it's the leader. This also 
makes it easier to talk about a shard having 0 replicas (meaning there is not 
even a leader).

I understand that you can view all shards under a slice as replicas, but in 
my mind replica is also a role that a shard plays at runtime - all shards 
except one under a slice play the replica role at runtime, and the remaining 
shard plays the leader role. To not create too much confusion I suggest you use 
the term shard for all the instances under a slice, and that you use the term 
replica only for a role that a shard plays at runtime.
But that would of course require changes e.g. to the Slice class, where e.g. 
getReplicas, getReplicasCopy and getReplicasMap need to be renamed to 
getShardsXXX. It probably shouldn't be done now, but as part of a cross-code 
cleanup of term usage.

Suggested terms:
 * collection: A big logical bucket to fill data into
 * slice: A logical part of a collection. A part of the data going into a 
collection goes into a particular slice. Slices for a particular collection are 
non-overlapping.
 * shard: A physical instance of a slice. Running without replication there is 
one shard per slice. Running with replication-factor X there are X+1 shards per 
slice.
 * replica and leader: Roles played by shards at runtime. As soon as the system 
is not running there are no replicas/leaders - there are just shards.
 * node-base-url: The prefix/base (up to and including the webapp context) of 
the URL for a specific Solr server
 * node-name: A logical name for the Solr server - the same as node-base-url 
except /'s are replaced by _'s and the protocol part (http(s)://) is removed
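
For reference, an abridged clusterstate.json of the shape discussed above (keys and values illustrative) - note that the file uses shards where the list above says slice, and replicas where it says shard:

{noformat}
{"collection1":{
  "shards":{
    "shard1":{
      "replicas":{
        "host1:8983_solr_collection1":{"base_url":"http://host1:8983/solr", "state":"active", "leader":"true"},
        "host2:8983_solr_collection1":{"base_url":"http://host2:8983/solr", "state":"active"}}}}}}
{noformat}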


 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard between the Solr server 
 that used to run it and the new one.
 See the dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-11-28 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505370#comment-13505370
 ] 

Leonhard Maylein commented on SOLR-3377:


I do not agree that this issue is solved.

I've tried the following combination with Solr 4.0.0:

q: +sw:(a b) +ti:(c d)
qf: freitext exttext^0.5
pf: freitext^6 exttext^3

The result is:

<str name="rawquerystring">+sw:(a b) +ti:(c d)</str>

<str name="querystring">+sw:(a b) +ti:(c d)</str>

<str name="parsedquery">(+(+(sw:a sw:b) +(ti:c ti:d)) 
DisjunctionMaxQuery((freitext:"b d"^6.0)) 
DisjunctionMaxQuery((exttext:"b d"^3.0)))/no_coord</str>

There should be no splitting on the qf/pf fields and therefore no 
DisjunctionMaxQueries.

The query '+(sw:a sw:b) +(ti:c ti:d)' works as expected.

 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0-BETA

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368
 ] 

Per Steffensen edited comment on SOLR-4114 at 11/28/12 11:20 AM:
-

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies 
of the data. I also count the leader as a replica of a shard (which is 
logical). It follows from the clusterstate.json, which lists all replicas for 
a shard and one of them just has a flag indicating it's the leader. This also 
makes it easier to talk about a shard having 0 replicas (meaning there is not 
even a leader).

I understand that you can view all shards under a slice as replicas, but in 
my mind replica is also a role that a shard plays at runtime - all shards 
except one under a slice play the replica role at runtime, and the remaining 
shard plays the leader role at runtime. To not create too much confusion, I 
suggest you use the term shard for all the instances under a slice, and that 
you use the terms replica and leader only for a role that a shard plays at 
runtime.
But that would of course require changes e.g. to the Slice class, where e.g. 
getReplicas, getReplicasCopy and getReplicasMap need to be renamed to 
getShardsXXX. It probably shouldn't be done now, but as part of a cross-code 
cleanup of term usage. Today there is a heavy mixup of term usage in the 
code - replica and shard are sometimes used for a node, replica and shard are 
used for the same thing, etc.

Suggested terms:
 * collection: A big logical bucket to fill data into
 * slice: A logical part of a collection. A part of the data going into a 
collection goes into a particular slice. Slices for a particular collection are 
non-overlapping.
 * shard: A physical instance of a slice. Running without replication there is 
one shard per slice. Running with replication-factor X there are X+1 shards per 
slice.
 * replica and leader: Roles played by shards at runtime. As soon as the system 
is not running there are no replicas/leaders - there are just shards.
 * node-base-url: The prefix/base (up to and including the webapp context) of 
the URL for a specific Solr server
 * node-name: A logical name for the Solr server - the same as node-base-url 
except /'s are replaced by _'s and the protocol part (http(s)://) is removed


  was (Author: steff1193):
bq. As far as terminology, when I say replicationFactor of 3, I mean 3 
copies of the data. I also count the leader as a replica of a shard (which is 
logical). It follows from the clusterstate.json, which lists all replicas for 
a shard and one of them just has a flag indicating it's the leader. This also 
makes it easier to talk about a shard having 0 replicas (meaning there is not 
even a leader).

I understand that you can view all shards under a slice as replicas, but in 
my mind replica is also a role that a shard plays at runtime - all shards 
except one under a slice play the replica role at runtime, and the remaining 
shard plays the leader role. To not create too much confusion I suggest you use 
the term shard for all the instances under a slice, and that you use the term 
replica only for a role that a shard plays at runtime.
But that would of course require changes e.g. to the Slice class, where e.g. 
getReplicas, getReplicasCopy and getReplicasMap need to be renamed to 
getShardsXXX. It probably shouldn't be done now, but as part of a cross-code 
cleanup of term usage.

Suggested terms:
 * collection: A big logical bucket to fill data into
 * slice: A logical part of a collection. A part of the data going into a 
collection goes into a particular slice. Slices for a particular collection are 
non-overlapping.
 * shard: A physical instance of a slice. Running without replication there is 
one shard per slice. Running with replication-factor X there are X+1 shards per 
slice.
 * replica and leader: Roles played by shards at runtime. As soon as the system 
is not running there are no replicas/leaders - there are just shards.
 * node-base-url: The prefix/base (up to and including the webapp context) of 
the URL for a specific Solr server
 * node-name: A logical name for the Solr server - the same as node-base-url 
except /'s are replaced by _'s and the protocol part (http(s)://) is removed

  
 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 

[jira] [Commented] (SOLR-2368) Improve extended dismax (edismax) parser

2012-11-28 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505374#comment-13505374
 ] 

Leonhard Maylein commented on SOLR-2368:


Please also consider incorporating SOLR-3377, which is marked as fixed but is 
not completely solved (see my comment on SOLR-3377).

 Improve extended dismax (edismax) parser
 

 Key: SOLR-2368
 URL: https://issues.apache.org/jira/browse/SOLR-2368
 Project: Solr
  Issue Type: Improvement
  Components: query parsers
Reporter: Yonik Seeley
  Labels: QueryParser

 This is a mother issue to track further improvements for eDismax parser.
 The goal is to be able to deprecate and remove the old dismax once edismax 
 satisfies all usecases of dismax.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-28 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505378#comment-13505378
 ] 

Markus Jelsma commented on SOLR-4032:
-

Great work - it seems this issue is indeed resolved, as I cannot reproduce this 
exact problem. But another EOF exception pops up; I'll open a new issue.

 Files larger than an internal buffer size fail to replicate
 ---

 Key: SOLR-4032
 URL: https://issues.apache.org/jira/browse/SOLR-4032
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
 12:37:38
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0

 Attachments: SOLR-4032.patch


 Please see: 
 http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
  and 
 http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException

2012-11-28 Thread Markus Jelsma (JIRA)
Markus Jelsma created SOLR-4116:
---

 Summary: Log Replay [recoveryExecutor-8-thread-1] - : 
java.io.EOFException
 Key: SOLR-4116
 URL: https://issues.apache.org/jira/browse/SOLR-4116
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
 Fix For: 5.0


With SOLR-4032 fixed we see other issues when randomly taking down nodes 
(nicely via tomcat restart) while indexing a few million web pages from Hadoop. 
We do make sure that at least one node is up for a shard but due to recovery 
issues it may not be live.

{code}
2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - 
[recoveryExecutor-8-thread-1] - : Starting log replay 
tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.028 
refcount=2} active=false starting pos=0
2012-11-28 11:32:41,873 ERROR [solr.update.UpdateLog] - 
[recoveryExecutor-8-thread-1] - : java.io.EOFException
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
at 
org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:479)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
at 
org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:374)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at 
org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
at 
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
at 
org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:618)
at 
org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1198)
at org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1143)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Markus Jelsma (JIRA)
Markus Jelsma created SOLR-4117:
---

 Summary: IO error while trying to get the size of the Directory
 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
 Fix For: 5.0


With SOLR-4032 fixed we see other issues when randomly taking down nodes 
(nicely via tomcat restart) while indexing a few million web pages from Hadoop. 
We do make sure that at least one node is up for a shard but due to recovery 
issues it may not be live.

One node seems to work but generates IO errors in the log and ZooKeeperExceptions 
in the GUI. In the GUI we only see:
{code}

SolrCore Initialization Failures

openindex_f: 
org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
 

Please check your logs for more information
{code}

and in the log we only see the following exception:

{code}
2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
[http-8080-exec-28] - : IO error while trying to get the size of the 
Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
'/opt/solr/cores/shard_f/data/index' does not exist
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
at 
org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
at 
org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
at 
org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
at 
org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
at 
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
{code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-28 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505378#comment-13505378
 ] 

Markus Jelsma edited comment on SOLR-4032 at 11/28/12 11:51 AM:


Great work - it seems this issue is indeed resolved, as I cannot reproduce this 
exact problem. But another EOF exception pops up; I'll open a new issue.
edit: another issue popped up as well - added SOLR-4116 and SOLR-4117

  was (Author: markus17):
Great work - it seems this issue is indeed resolved, as I cannot reproduce 
this exact problem. But another EOF exception pops up; I'll open a new issue.
  
 Files larger than an internal buffer size fail to replicate
 ---

 Key: SOLR-4032
 URL: https://issues.apache.org/jira/browse/SOLR-4032
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
 12:37:38
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Blocker
 Fix For: 5.0

 Attachments: SOLR-4032.patch


 Please see: 
 http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
  and 
 http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Markus Jelsma (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Markus Jelsma updated SOLR-4117:


Priority: Minor  (was: Major)

This issue is the same as reported in SOLR-4032. It does not resolve itself 
when reloading a core or restarting the servlet container, as it did before in 
SOLR-4032. The ZooKeeper exception in the GUI is gone after restart, so it's 
likely not related.

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 ZooKeeperExceptions in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505391#comment-13505391
 ] 

Markus Jelsma edited comment on SOLR-4117 at 11/28/12 12:17 PM:


This issue is the same as reported in SOLR-4032. It does not resolve itself 
when reloading a core or restarting the servlet container, as it did before in 
SOLR-4032. The ZooKeeper exception in the GUI is gone after restart, so it's 
likely not related.

edit: the index.properties file in both cores points to the correct 
index.LARGE_NUMBER directory but NRTDir tries ./data/index regardless.
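
For context, a sketch of the index.properties shape being referred to (path and timestamp illustrative), which is what should redirect lookups away from ./data/index:

{noformat}
# /opt/solr/cores/openindex_f/data/index.properties
index=index.20121128114726652
{noformat}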

  was (Author: markus17):
This issue is the same as reported in SOLR-4032. It does not resolve itself 
when reloading a core or restarting the servlet container, as it did before in 
SOLR-4032. The ZooKeeper exception in the GUI is gone after restart, so it's 
likely not related.
  
 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 ZooKeeperExceptions in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically 

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505397#comment-13505397
 ] 

Per Steffensen commented on SOLR-4114:
--

Patch including the maxShardsPerNode feature coming up, along with (much) 
better testing of the create operation of the Collections API.
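
Presumably the create call will then look something like this - the maxShardsPerNode name is taken from the comment above; the host, collection name and other values are illustrative:

{noformat}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=1&maxShardsPerNode=2
{noformat}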

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard between the Solr server 
 that used to run it and the new one.
 See the dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.

2012-11-28 Thread Tomás Fernández Löbbe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505415#comment-13505415
 ] 

Tomás Fernández Löbbe commented on SOLR-4028:
-

I think I see the issue here: the problem would be if someone mistypes the 
initial path - instead of throwing exceptions and stopping, we would be creating 
a new path and probably hiding an error.
However, we do create paths for the overseer and upload configs automatically; I 
think creating the initial path is more consistent with the current behavior 
than stopping startup. Other options I thought of are:
 * Only create the initial path when bootstrap_conf is true (or 
bootstrap_confdir). This could still have the same issue described above.
 * Add a new parameter to force creation, something like 
-DzkHost.create=true. This could add unnecessary parameters and configuration 
complexity.
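
For reference, the manual step mentioned in the description below (creating the initial path) can be done with Solr's ZkCLI and its makepath command - classpath abbreviated, host and path illustrative:

{noformat}
java -classpath "solr-webapp/webapp/WEB-INF/lib/*" org.apache.solr.cloud.ZkCLI \
  -zkhost localhost:2181 -cmd makepath /testXYZ
{noformat}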


 When using ZK chroot, it would be nice if Solr would create the initial path 
 when it doesn't exist.
 ---

 Key: SOLR-4028
 URL: https://issues.apache.org/jira/browse/SOLR-4028
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Tomás Fernández Löbbe
Priority: Minor
 Attachments: SOLR-4028.patch


 I think this would make it easier to test and develop with SolrCloud. In 
 order to start with a fresh ZK directory, the current approach is to delete 
 the ZK data; with this improvement one could just add a chroot to the zkHost, 
 like:
 java -DzkHost=localhost:2181/testXYZ -jar start.jar
 Right now this is possible, but you have to manually create the initial path. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-4117:
-

Assignee: Mark Miller

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 ZooKeeperExceptions in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-11-28 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505453#comment-13505453
 ] 

Jack Krupansky commented on SOLR-3377:
--

Leonhard, your use case seems rather different from that of this Jira.

I presume that you are referring to the generated phrase query boost being a 
little odd, or maybe that the phrase boost should not occur when the terms are 
queried against fields not listed in the pf parameter. Feel free to raise 
that as a separate issue.

You refer to splitting, but I don't see any term splitting in this example.


 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0-BETA

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException

2012-11-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505458#comment-13505458
 ] 

Yonik Seeley commented on SOLR-4116:


I don't know what tomcat restart does, but perhaps it's not as nice as you 
think if it causes a log replay on restart?  Anyway, bringing down a server 
roughly enough (like kill -9) can cause truncated tlog files.
But truncated log files are expected and should not cause fatal exceptions (and 
we have tests for that).  Does this exception cause the core not to come up?

 Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
 -

 Key: SOLR-4116
 URL: https://issues.apache.org/jira/browse/SOLR-4116
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 {code}
 2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - 
 [recoveryExecutor-8-thread-1] - : Starting log replay 
 tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.028 
 refcount=2} active=false starting pos=0
 2012-11-28 11:32:41,873 ERROR [solr.update.UpdateLog] - 
 [recoveryExecutor-8-thread-1] - : java.io.EOFException
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
 at 
 org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:479)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:176)
 at 
 org.apache.solr.common.util.JavaBinCodec.readSolrInputDocument(JavaBinCodec.java:374)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
 at 
 org.apache.solr.common.util.JavaBinCodec.readArray(JavaBinCodec.java:451)
 at 
 org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:182)
 at 
 org.apache.solr.update.TransactionLog$LogReader.next(TransactionLog.java:618)
 at 
 org.apache.solr.update.UpdateLog$LogReplayer.doReplay(UpdateLog.java:1198)
 at 
 org.apache.solr.update.UpdateLog$LogReplayer.run(UpdateLog.java:1143)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
 at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
 at java.util.concurrent.FutureTask.run(FutureTask.java:138)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505462#comment-13505462
 ] 

Mark Miller commented on SOLR-4028:
---

Yeah, I think that was perhaps the concern - basically, it seems ops-type 
people prefer being explicit. Other paths are auto-created, but they are not 
arbitrary paths supplied by the user as a connect string - I guess it's a 
little different. If you are trying to connect to an existing node and type 
something wrong, you just create a new one rather than getting an error.

I don't know what's best, but like I said, I guess I lean towards auto-creating.
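
For reference, pre-creating the chroot path today can be done with the plain 
ZooKeeper client along these lines (a minimal sketch, not Solr code; the path 
and connect string are examples):

{code}
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class CreateChrootSketch {
  public static void main(String[] args) throws Exception {
    // Connect WITHOUT the chroot suffix, then create the initial path once.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, null);
    zk.create("/testXYZ", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    zk.close();
  }
}
{code}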

 When using ZK chroot, it would be nice if Solr would create the initial path 
 when it doesn't exist.
 ---

 Key: SOLR-4028
 URL: https://issues.apache.org/jira/browse/SOLR-4028
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Tomás Fernández Löbbe
Priority: Minor
 Attachments: SOLR-4028.patch


 I think this would make it easier to test and develop with SolrCloud. In 
 order to start with a fresh ZK directory, the current approach is to delete 
 the ZK data; with this improvement one could just add a chroot to the zkHost 
 like:
 java -DzkHost=localhost:2181/testXYZ -jar start.jar
 Right now this is possible, but you have to manually create the initial path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4116) Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException

2012-11-28 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505466#comment-13505466
 ] 

Markus Jelsma commented on SOLR-4116:
-

Restarting or stopping Tomcat shuts down the CoreContainer and stops recovery; I 
believe this is nice enough, or isn't it? This error does not prevent the core 
from coming up.

{code}
2012-11-28 14:10:15,227 INFO [solr.core.CoreContainer] - [Thread-6] - : 
Shutting down CoreContainer instance=1830423861
2012-11-28 14:10:15,227 WARN [solr.cloud.RecoveryStrategy] - [Thread-6] - : 
Stopping recovery for zkNodeName=178.21.118.195:8080_solr_shard_fcore=shard_f
2012-11-28 14:10:15,227 WARN [solr.cloud.RecoveryStrategy] - [Thread-6] - : 
Stopping recovery for zkNodeName=178.21.118.195:8080_solr_shard_gcore=shard_g
2012-11-28 14:10:15,227 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_f]  
CLOSING SolrCore org.apache.solr.core.SolrCore@513c952f
2012-11-28 14:10:15,230 INFO [solr.update.UpdateHandler] - [Thread-6] - : 
closing DirectUpdateHandler2{commits=1,autocommit 
maxTime=12ms,autocommits=0,soft autocommit maxTime=1ms,soft 
autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
2012-11-28 14:10:15,231 INFO [solr.core.SolrCore] - [Thread-6] - : Closing 
SolrCoreState
2012-11-28 14:10:15,231 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - 
: SolrCoreState ref count has reached 0 - closing IndexWriter
2012-11-28 14:10:15,231 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - 
: closing IndexWriter with IndexWriterCloser
2012-11-28 14:10:15,234 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - 
: Releasing directory:/opt/solr/cores/shard_f/data/index.20121128113300496
2012-11-28 14:10:15,235 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_f] 
Closing main searcher on request.
2012-11-28 14:10:15,244 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - 
: Releasing directory:/opt/solr/cores/shard_f/data/index.20121128113300496
2012-11-28 14:10:15,244 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_g]  
CLOSING SolrCore org.apache.solr.core.SolrCore@24be0446
2012-11-28 14:10:15,248 INFO [solr.update.UpdateHandler] - [Thread-6] - : 
closing DirectUpdateHandler2{commits=1,autocommit 
maxTime=12ms,autocommits=0,soft autocommit maxTime=1ms,soft 
autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0}
2012-11-28 14:10:15,248 INFO [solr.core.SolrCore] - [Thread-6] - : Closing 
SolrCoreState
2012-11-28 14:10:15,248 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - 
: SolrCoreState ref count has reached 0 - closing IndexWriter
2012-11-28 14:10:15,248 INFO [solr.update.DefaultSolrCoreState] - [Thread-6] - 
: closing IndexWriter with IndexWriterCloser
2012-11-28 14:10:15,250 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - 
: Releasing directory:/opt/solr/cores/shard_g/data/index.20121128113035951
2012-11-28 14:10:15,250 INFO [solr.core.SolrCore] - [Thread-6] - : [shard_g] 
Closing main searcher on request.
2012-11-28 14:10:15,256 INFO [solr.core.CachingDirectoryFactory] - [Thread-6] - 
: Releasing directory:/opt/solr/cores/shard_g/data/index.20121128113035951
2012-11-28 14:10:15,281 INFO [apache.zookeeper.ZooKeeper] - [Thread-6] - : 
Session: 0x13b4668803e000f closed
{code}

 Log Replay [recoveryExecutor-8-thread-1] - : java.io.EOFException
 -

 Key: SOLR-4116
 URL: https://issues.apache.org/jira/browse/SOLR-4116
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 {code}
 2012-11-28 11:32:33,086 WARN [solr.update.UpdateLog] - 
 [recoveryExecutor-8-thread-1] - : Starting log replay 
 tlog{file=/opt/solr/cores/openindex_e/data/tlog/tlog.028 
 refcount=2} active=false starting pos=0
 2012-11-28 11:32:41,873 ERROR [solr.update.UpdateLog] - 
 [recoveryExecutor-8-thread-1] - : java.io.EOFException
 at 
 org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:151)
 at 
 org.apache.solr.common.util.JavaBinCodec.readStr(JavaBinCodec.java:479)
 at 
 

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505468#comment-13505468
 ] 

Mark Miller commented on SOLR-4114:
---

bq.  fixed in collectionCmd (used for delete and reload) but not in 
createCollection 

This fix belongs with the issue that fixed delete and reload - I'm going to fix 
it there.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505471#comment-13505471
 ] 

Yonik Seeley commented on SOLR-4114:


bq. Ok, its just than the replicationFactor you specify in your request is the 
other thing.

Hmmm, you're right:
"Note: replicationFactor defines the maximum number of replicas created in 
addition to the leader from amongst the nodes currently running"

That's not consistent with the original definition 
(http://wiki.apache.org/solr/NewSolrCloudDesign), the way the state is 
represented in clusterstate.json, or the way others use the term, such as in 
HBase/HDFS, Cassandra, Oracle, etc. The important part is how many times the 
data is stored (the replication factor); things like leaders are more of an 
implementation detail.

Luckily we don't yet store this in the cluster, so there's no back compat issue 
with existing clusters.  There's only a change when creating a new cluster, but 
that seems relatively minor.  Given that, I'd lean toward changing this 
parameter to be in line with common usage.

Per: this is unrelated to your patch of course - it just happened to come up 
here.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505472#comment-13505472
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. This fix belongs with the issue that fixed delete and reload - I'm going to 
fix it there.

Yes of course, it is just hard for me to split up the patch, because it is all 
needed for the tests to be green. But commit-wise it belongs to the other issue.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505472#comment-13505472
 ] 

Per Steffensen edited comment on SOLR-4114 at 11/28/12 2:28 PM:


bq. This fix belongs with the issue that fixed delete and reload - I'm going to 
fix it there.

Yes of course, it is just hard for me to split up the patch, because it is all 
needed for the tests to be green - and I really want to give you a patch 
fitting on top of a certain revision where all tests are green if you add the 
patch. But commit-wise it belongs to the other issue.

  was (Author: steff1193):
bq. This fix belongs with the issue that fixed delete and reload - I'm 
going to fix it there.

Yes of course, it is just hard for me to split up the patch, because it is all 
needed for the tests to be green. But commit-wise it belongs to the other issue.
  
 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3926) solrj should support better way of finding active sorts

2012-11-28 Thread Eirik Lygre (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505483#comment-13505483
 ] 

Eirik Lygre commented on SOLR-3926:
---

I'll take the blame for guiding Yonik down the Map-path; at the time (while 
parsing the sort-field), returning a LinkedHashMap was an easy way to achieve 
the business objectives. Then, as the idea developed, it became less so. 
Anyway, that's why we review, right?

Here is an extended view of my current implementation. It will probably not end 
up like this, ref the questions below :-)

{code}
public String getSortField();

public SolrQuery setSorts(List<SortClause> value);
public SolrQuery clearSorts();
public List<SortClause> getSorts();
public SolrQuery setSort(SortClause sortClause);
public SolrQuery addSort(SortClause sortClause);
public SolrQuery addOrUpdateSort(SortClause sortClause);
public SolrQuery removeSort(String itemName);

public static class SortClause {
  public static SortClause create(String item, ORDER order);
  public static SortClause create(String item, String order);
  public static SortClause asc (String item);
  public static SortClause desc (String item);
  public String getItem();
  public ORDER getOrder();
}
{code}

Some questions, illustrated by code examples. Some questions relate to APIs 
shown above, and are REMOVE? questions; some relate to APIs *not* shown 
above, and are ADD? questions. Note that some of the examples use pieces from 
the other questions.

{code}
// Usage, per the API above
query.setSort(SolrQuery.SortClause.desc("rating"));
query.setSort(SolrQuery.SortClause.create("rating", SolrQuery.ORDER.desc));
query.setSort(SolrQuery.SortClause.create("rating",
    SolrQuery.ORDER.valueOf("desc")));
query.setSort(SolrQuery.SortClause.create("rating", "asc"));
query.removeSort("rating");
{code}


I want to retain query.removeSort(String), because that's really the use case 
(remove sort based on item name, ignoring ordering). I'm not really sure about 
query.removeSort(SortClause), which does in fact only use the item name, but it 
would be symmetrical to the add-functions.

{code}
// Q1: Should we REMOVE query.removeSort(String)?
query.addSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.desc));
query.addSort(new SolrQuery.SortClause("price", SolrQuery.ORDER.asc));
query.removeSort("rating");

// Q2: Should we ADD query.removeSort(SortClause)?
query.addSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.desc));
query.addSort(new SolrQuery.SortClause("price", SolrQuery.ORDER.asc));
query.removeSort(new SolrQuery.SortClause("price", SolrQuery.ORDER.desc));
// removes regardless of order
{code}


We might build convenience functions query.xxxSort(String, ORDER) and 
query.xxxSort(String, String) as shown below. It would make usage simpler, but 
comes with a footprint. The SortClause.asc(), .desc() and .create() factory 
functions described below make this less needed, I think:

{code}
// Q3: Should we ADD convenience functions query.xxxSort(String, ORDER)?
query.addSort("price", SolrQuery.ORDER.asc);

// Q4: Should we ADD convenience functions query.xxxSort(String, String)?
query.addSort("price", "asc");
{code}


The api currently has convenience functions for creating SortClause. The 
functions asc() and desc() make it easier (and more compact) to create 
SortClause. The create() functions are there for symmetry (always use static 
methods instead of constructors). The constructors aren't public, but maybe 
they should be?

{code}
// Q5: Should we REMOVE the asc() and desc() convenience factory methods?
query.setSort(SolrQuery.SortClause.desc("rating"));
query.setSort(SolrQuery.SortClause.asc("rating"));

// Q6: Should we REMOVE the create(String,ORDER) convenience factory method
// (use a constructor instead)?
query.setSort(SolrQuery.SortClause.create("rating", SolrQuery.ORDER.desc));
query.setSort(SolrQuery.SortClause.create("rating",
    SolrQuery.ORDER.valueOf("desc")));

// Q7: Should we REMOVE the create(String,String) convenience factory method
// (complements Q6, for when the order is in fact a string)?
query.setSort(SolrQuery.SortClause.create("rating", "desc"));

// Q8: Should we ADD a simple constructor, typically instead of Q5-Q7?
query.setSort(new SolrQuery.SortClause("rating", SolrQuery.ORDER.desc));
query.setSort(new SolrQuery.SortClause("rating",
    SolrQuery.ORDER.valueOf("desc")));
{code}

A couple of other items:

Q9: Currently, SortClause is an inner class of SolrQuery. Let me know if this 
is an issue.
Q10: What the heck do we call the thing to sort on? I don't want to call it a 
"field", since it can be many other things. I've chosen to call it an "item", 
but is there another, better name?
Q11: Should we have SortClause.hashCode() and SortClause.equals()?
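
Re Q11, a minimal sketch of what equals()/hashCode() could look like, assuming 
SortClause keeps exactly the (item, order) pair shown above:

{code}
// Hedged sketch for Q11 - assumes SortClause holds exactly (item, order).
public static class SortClause {
  private final String item;
  private final SolrQuery.ORDER order;

  public SortClause(String item, SolrQuery.ORDER order) {
    this.item = item;
    this.order = order;
  }

  @Override
  public boolean equals(Object other) {
    if (this == other) return true;
    if (!(other instanceof SortClause)) return false;
    SortClause that = (SortClause) other;
    return item.equals(that.item) && order == that.order;
  }

  @Override
  public int hashCode() {
    return 31 * item.hashCode() + order.hashCode();
  }
}
{code}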

 solrj should support better way of finding active sorts
 ---

 Key: SOLR-3926
 URL: https://issues.apache.org/jira/browse/SOLR-3926
 

[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.

2012-11-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505493#comment-13505493
 ] 

Yonik Seeley commented on SOLR-4028:


bq. I think I see the issue here, the problem would be if someone mistype the 
initial path, instead of throwing exceptions and stopping, we would be creating 
a new path and probably hiding an error. 

That can go the other direction too?  A config could be created under /solr and 
then someone could try to join it by forgetting to specify that root in zkHost.

bq. Only create the initial path when bootstrap_conf is true (or 
bootstrap_confdir). 

As long as we need some sort of explicit bootstrap, that seems reasonable.

bq. Add a new parameter to force creation, something like -DzkHost.create=true.

Anything that creates a skeleton layout of a new cluster should work the same 
(auto-create the root if it doesn't exist). ZkCLI -cmd bootstrap for example. 
Not sure if there are others.
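
For reference, the ZkCLI bootstrap invocation looks roughly like this (a 
sketch; the exact classpath and paths vary by install):

{code}
java -classpath "example/solr-webapp/webapp/WEB-INF/lib/*" \
    org.apache.solr.cloud.ZkCLI -cmd bootstrap \
    -zkhost localhost:2181/solr -solrhome example/solr
{code}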


 When using ZK chroot, it would be nice if Solr would create the initial path 
 when it doesn't exist.
 ---

 Key: SOLR-4028
 URL: https://issues.apache.org/jira/browse/SOLR-4028
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Tomás Fernández Löbbe
Priority: Minor
 Attachments: SOLR-4028.patch


 I think this would make it easier to test and develop with SolrCloud. In 
 order to start with a fresh ZK directory, the current approach is to delete 
 the ZK data; with this improvement one could just add a chroot to the zkHost 
 like:
 java -DzkHost=localhost:2181/testXYZ -jar start.jar
 Right now this is possible, but you have to manually create the initial path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505496#comment-13505496
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. Per: this is unrelated to your patch of course - it just happened to come 
up here.

No problem. I could make it part of this patch if you want, but I'm not sure 
I agree with your way of interpreting the term replication-factor. I would 
expect replication-factor to say something about how many times the data is 
REPLICATED. If I run with only one copy of the data for each slice, I would 
logically say that my data is not replicated, and that matches a 
replication-factor of 0.

I used HDFS and HBase a little, a year or so ago, but I'm not sure what 
meaning they put into the term "replica". I've also worked a lot with 
ElasticSearch (which I believe is more of a counterpart to Solr), and in 
ElasticSearch I believe they use the term "replica" for the number of 
ADDITIONAL copies of the data - equal to your/our current implementation in 
Solr.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Reopened] (SOLR-4055) Remove/Reload the collection has the thread safe issue.

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reopened SOLR-4055:
---


See SOLR-4114 - we missed a spot.

 Remove/Reload the collection has the thread safe issue.
 ---

 Key: SOLR-4055
 URL: https://issues.apache.org/jira/browse/SOLR-4055
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0
 Environment: Solr cloud
Reporter: Raintung Li
Assignee: Mark Miller
 Fix For: 4.1, 5.0

 Attachments: patch-4055


 The OverseerCollectionProcessor class has a thread-safety issue in its 
 collectionCmd method.
 The major issue is that the ModifiableSolrParams params instance is handed 
 to other threads (via HttpShardHandler.submit); modifying a parameter 
 afterwards affects what those threads see.
 In the method collectionCmd, the value is changed with 
 params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)); 
 so the thread sending the http request can get the wrong core name. The 
 result is that it can't delete/reload the right core.
 The easy fix is to clone the ModifiableSolrParams for every request.
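
A minimal sketch of the suggested fix, using SolrJ's ModifiableSolrParams copy 
constructor (an illustration of the idea, not the attached patch):

{code}
import org.apache.solr.common.params.ModifiableSolrParams;

public class CloneParamsSketch {
  public static void main(String[] args) {
    ModifiableSolrParams shared = new ModifiableSolrParams();
    shared.set("action", "RELOAD");

    // Clone per request, so a later mutation of 'shared' cannot leak into a
    // request another thread is already submitting.
    ModifiableSolrParams perRequest = new ModifiableSolrParams(shared);
    perRequest.set("core", "shard_f");

    System.out.println(shared.get("core"));      // null - unaffected
    System.out.println(perRequest.get("core"));  // shard_f
  }
}
{code}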

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4028) When using ZK chroot, it would be nice if Solr would create the initial path when it doesn't exist.

2012-11-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505500#comment-13505500
 ] 

Tomás Fernández Löbbe commented on SOLR-4028:
-

bq. That can go the other direction too? A config could be created under /solr 
and then someone could try to join it by forgetting to specify that root in 
zkHost.
This can happen today too.
bq. Anything that creates a skeleton layout of a new cluster should work the 
same (auto-create the rot if it doesn't exist). ZkCLI -cmd bootstrap for 
example. Not sure if there are others.
Yes, I agree.

 When using ZK chroot, it would be nice if Solr would create the initial path 
 when it doesn't exist.
 ---

 Key: SOLR-4028
 URL: https://issues.apache.org/jira/browse/SOLR-4028
 Project: Solr
  Issue Type: Improvement
  Components: SolrCloud
Reporter: Tomás Fernández Löbbe
Priority: Minor
 Attachments: SOLR-4028.patch


 I think this would make it easier to test and develop with SolrCloud. In 
 order to start with a fresh ZK directory, the current approach is to delete 
 the ZK data; with this improvement one could just add a chroot to the zkHost 
 like:
 java -DzkHost=localhost:2181/testXYZ -jar start.jar
 Right now this is possible, but you have to manually create the initial path.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505507#comment-13505507
 ] 

Markus Jelsma commented on SOLR-4117:
-

I have another node now logging the same exception, for a core that has 0 docs 
and is not the leader, but clusterstate says the node is active and it does not 
attempt recovery. To my surprise it has two index.NUMBER directories of 
different sizes, and index.properties points to the largest directory.

The node won't come back up properly. Search and indexing work, but accessing 
the GUI is impossible:
{code}
2012-11-28 14:50:00,026 ERROR [solr.servlet.SolrDispatchFilter] - 
[http-8080-exec-6] - : null:org.apache.solr.common.SolrException: Error 
handling 'status' action 
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:724)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:157)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:372)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:181)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
at 
org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
at 
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: org.apache.solr.common.SolrException: 
java.util.concurrent.RejectedExecutionException
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1674)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1330)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1265)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:996)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:710)
... 18 more
Caused by: java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
at 
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1605)
... 22 more
{code}



 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more 

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505506#comment-13505506
 ] 

Per Steffensen commented on SOLR-4114:
--

Another more urgent problem (for me) is that I need to make another change to 
the Solr Collection API before we can use it as a replacement for what we 
already do in our project (where we create each shard one by one in OUR code). 
We split our set of Solr servers into two subsets - Data-Solrs and 
Search-Solrs. The Search-Solrs are not supposed to carry any data, and 
therefore not to be occupied by indexing. Search-Solrs instead play the role 
of receiving queries from the outside, sub-querying the Data-Solrs and 
combining the final total response to the outside. Data-Solrs are where we 
create the data-carrying collections. Data-Solrs need more CPU and 
IO-capabilities while Search-Solrs need more RAM - hence the split-up.

Therefore I need to be able to provide a list of Solrs to the create operation 
of the Solr Collection API. The shards of the collection to be created are 
then only allowed to be spread over the Solrs in this list - the default list 
could be all Solrs. In our Solr-based project, we will give our list of 
Data-Solrs as this list.

Can I add such a feature to this SOLR-4114 and include it in a combined patch, 
or do you prefer another ticket for this change? I can create another issue 
but provide a combined patch. Are you interested in such a feature at all? 
That is, a feature where the create operation takes a list of Solrs to spread 
the created shards over.
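
Purely for illustration, such a create request could look something like the 
sketch below - the nodes parameter is invented here to make the idea concrete 
and does not exist in the current API:

{code}
# Hypothetical only - the 'nodes' parameter is made up for illustration:
http://search-solr1:8983/solr/admin/collections?action=CREATE
    &name=mycollection&numShards=8
    &nodes=data-solr1:8983_solr,data-solr2:8983_solr
{code}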

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505506#comment-13505506
 ] 

Per Steffensen edited comment on SOLR-4114 at 11/28/12 2:54 PM:


Another more urgent problem (for me) is that I need to make another change to 
the Solr Collection API before we can use it as a replacement for what we 
already do in our project (where we create each shard one by one in OUR code). 
We split our set of Solr servers into two subsets - Data-Solrs and 
Search-Solrs. The Search-Solrs are not supposed to carry any data, and 
therefore not to be occupied by indexing. Search-Solrs instead play the role 
of receiving queries from the outside, sub-querying the Data-Solrs and 
combining the final total response to the outside. Data-Solrs are where we 
create the data-carrying collections. Data-Solrs need more CPU and 
IO-capabilities while Search-Solrs need more RAM - hence the split-up.

Therefore I need to be able to provide a list of Solrs to the create operation 
of the Solr Collection API. The shards of the collection to be created are 
then only allowed to be spread over the Solrs in this list - the default list 
could be all Solrs. In our Solr-based project, we will give our list of 
Data-Solrs as this list.

Can I add such a feature to this SOLR-4114 and include it in a combined patch, 
or do you prefer another ticket for this change? I can create another issue 
but provide a combined patch. Are you interested in such a feature at all? 
That is, a feature where the create operation takes a list of Solrs to spread 
the created shards over.

  was (Author: steff1193):
Another more urgent problem (for me) is that I need to do another change to 
the Solr Collection API, before we can use it as a replacement for what we 
already do in our project (where we create each shard one by one in OUR code). 
We split our set of Solr servers into two subsets - Data-Solrs and 
Search-Solrs. The Search-Solrs are not supposed to carry any data and therefore 
to be occupied by indexing. Search-Solr instead play the role of receiving 
queries from the outside, sub-quering the Data-Solrs and combining the final 
total response to the outside. Data-Solrs are where we create the 
data-carrying collections. Data-Solrs need more CPU and IO-capabilities while 
Search-Solrs need more RAM - hence the splitup.

Therefore I need to be able to provide a list of Solrs to the create operation 
of the Solr Collection API. The shards are then only allowed to be spread 
shards for the collection over the Solrs in this list - default list could be 
all Solrs. As this list we, in our Solr-based projbect, will give our list of 
Data-Solrs.

Can I add such a feature to this SOLR-4114 and include it in a combined patch, 
or do you prefer another ticket for this change? I can create another issue but 
provide a combined patch. Are you interrested in such a feature at all? That 
is, a feature where the create operation takes a list of Solrs to spread the 
created shards over.
  
 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: SOLR-4114.patch

New patch SOLR-4114.patch attached (not including the 
only-spread-shards-over-solrs-mentioned-in-provided-list thingy).

New, compared to the first patch:
* maxShardsPerNode implemented
* The tests (BasicDistributedZkTest.testCollectionAPI) now check additional stuff
** That the expected number of shards is actually created
** That if there is not room for all the shards due to the provided 
maxShardsPerNode, nothing is created

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505528#comment-13505528
 ] 

David Smiley commented on LUCENE-4574:
--

But Robert, if I simply change the scenario slightly such that there is more 
than one sort field, TopScoreDocCollector (the specific collector I think you 
actually meant to suggest) is no longer suitable.

Is your concern that the overhead might be too much? It seems so small to me; 
it only caches the last docid & score pair.

My patch only did the score caching for 
OneComparatorScoring[No]MaxScoreCollector, but after further experimentation, 
by modifying the test to sort on an additional field, it appears that all 
subclasses of TopFieldCollector are affected.

 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work-around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.
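
For reference, the reporter's work-around amounts to a last-value cache in the 
FunctionValues implementation; a minimal sketch (illustrative only - the 
wrapped values and the class name are invented here):

{code}
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;

// Remembers the last (docid, value) pair so a second call for the same doc
// (e.g. score() followed by FieldComparator.copy()) does not recompute it.
public class LastValueCachingValues extends DoubleDocValues {
  private final FunctionValues inner;
  private int lastDoc = -1;
  private double lastVal;

  public LastValueCachingValues(ValueSource vs, FunctionValues inner) {
    super(vs);
    this.inner = inner;
  }

  @Override
  public double doubleVal(int doc) {
    if (doc != lastDoc) {
      lastVal = inner.doubleVal(doc);
      lastDoc = doc;
    }
    return lastVal;
  }
}
{code}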

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505530#comment-13505530
 ] 

Eks Dev commented on SOLR-4117:
---

fwiw, we *think* we observed the following problem in a simple master/slave 
setup with NRTCachingDirectory... I am not sure it has something to do with 
this issue, because we did not see this exception. Anyhow:

on replication, the slave gets the index from the master and works fine; then on:
1. graceful restart, the world looks fine 
2. kill -9 or such, solr does not start because the index gets corrupt (which 
should actually not happen)

We speculate that solr now replicates directly to the Directory 
implementation and does not ensure that replicated files get fsync'ed 
completely after replication. As far as I remember, replication used to go to 
/temp (disk) and then move the files if all went ok, working under the 
assumption that everything was already persisted. Maybe this invariant does 
not hold any more, and some explicit fsync is needed for caching directories?

I might be completely wrong; we just observed symptoms in a not really 
debug-friendly environment.
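
If that speculation is right, the general shape of a fix would be an explicit 
sync of the copied files before trusting them; a minimal Lucene-API sketch (an 
illustration, not Solr's actual replication code):

{code}
import java.io.IOException;
import java.util.Collection;
import org.apache.lucene.store.Directory;

public class SyncAfterCopySketch {
  // After copying replicated files into 'dir', force them to stable storage,
  // so a kill -9 right after replication cannot leave a corrupt index.
  public static void persist(Directory dir, Collection<String> fileNames)
      throws IOException {
    dir.sync(fileNames);
  }
}
{code}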



 

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 

[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505531#comment-13505531
 ] 

Mark Miller commented on SOLR-4114:
---

When grabbing the params fix I noticed you set the data dir to something like 
shardname + "_data" - that's not strictly necessary, right? Since each core 
should have its own instance dir?

I've been thinking about how to set custom data dirs with this api - it would 
be nice to be able to specify the data dir - and in some cases perhaps base it 
on something like the core name rather than just some static string. But have 
you found it 'necessary' in your work?

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505533#comment-13505533
 ] 

Radim Kolar commented on SOLR-4114:
---

Couldn't you do the same thing as ElasticSearch? Build the index with a number 
of shards (the initial number is 5). If there is 1 machine in the cluster, 
then all shards are on this machine. If you add more machines, shards will 
move to the other machines. It is way simpler for administration.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solr server that 
 used to run it and the new one.
 See dev mailing list discussion "Multiple shards for one collection on the 
 same Solr server".

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: dismax vs edismax

2012-11-28 Thread David Smiley (@MITRE.org)
I absolutely agree with your first point.  For the second point, I agree for 
defType (so that it affects 'q') but not for all the other numerous spots.  As 
far as pushing down into Lucene: if you haven't noticed, Solr now has its own 
copy of Lucene's query parser so that it can customize it.  This is a good 
thing that was inevitable, IMO.
~ David
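
To illustrate the defType point: defType selects the parser for the main q 
parameter only, while other query-accepting parameters can opt into a parser 
via local-params syntax. A small SolrJ sketch (the field names here are 
hypothetical):

{code}
import org.apache.solr.client.solrj.SolrQuery;

class DefTypeSketch {
  SolrQuery build() {
    SolrQuery query = new SolrQuery("ipod power");
    query.set("defType", "edismax");               // applies to q only
    query.set("qf", "name^2 description");
    query.addFilterQuery("{!dismax qf=name}ipod"); // fq picks its own parser
    return query;
  }
}
{code}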

From: Jack Krupansky-2 [via Lucene] [ml-node+s472066n4022841...@n3.nabble.com]
Sent: Wednesday, November 28, 2012 1:07 AM
To: Smiley, David W.
Subject: Re: dismax vs edismax

My view is that if we simply added an option to edismax to restrict the
syntax to the very limited syntax of dismax, then we could have one, common
xdismax query parser.

And then, why not simply rename the current Solr query parser to "classic"
and make the new "xdismax" be the default Solr query parser.

And then... push a lot of the so-called Solr-specific features down into
the Lucene query parser (abstracting away the specifics of Solr schema, Solr
plugin, Solr parameter format, etc.) and then we can have one, unified query
parser for Lucene and Solr. But... not everyone is persuaded!

-- Jack Krupansky

-Original Message-
From: David Smiley (@MITRE.org)
Sent: Tuesday, November 27, 2012 11:43 PM
To: [hidden email]
Subject: dismax vs edismax

It was my hope that by now, the dismax & edismax distinction would be a thing 
of the past, such that we'd simply call this by one name, simply "dismax".
From memories of various JIRA commentary, Jan wants this too and made great 
progress enhancing edismax, but Hoss pushed back on edismax overtaking dismax 
as the one new dismax.  I see this as very unfortunate, as having both 
complicates things and makes it harder to write about them in books ;-)  I'd 
love to simply say "dismax" without having to say "edismax", or wonder 
whether, when someone said "dismax", they meant "edismax", etc.  Does anyone 
see this changing / progressing?

~ David



-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book










[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505535#comment-13505535
 ] 

Mark Miller commented on SOLR-4114:
---

bq. Can I add such a feature to this SOLR-4114 and include it in a combined 
patch, or do you prefer another ticket for this change?

My preference would be a new issue. If it has to be done as one piece, I would 
wait for this to go in before supplying the patch for that issue, or supply a 
patch for that issue and note that it requires applying this patch first. 
Combining multiple issues into one patch generally just makes it more 
difficult to get them in.




[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505539#comment-13505539
 ] 

Mark Miller commented on SOLR-4114:
---

bq. you add more machines, they will move to other machines.

Personally, I'm not really sold on this auto-rebalancing idea. I'd prefer that 
the user had to explicitly make these moves.




[jira] [Comment Edited] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505530#comment-13505530
 ] 

Eks Dev edited comment on SOLR-4117 at 11/28/12 3:27 PM:
-

fwiw, we *think* we observed the following problem in a simple master/slave 
setup with NRTCachingDirectory... I am not sure it has anything to do with 
this issue, because we did not see this exception; anyhow:

on replication, the slave gets the index from the master and works fine; then on:
1. graceful restart, the world looks fine 
2. kill -9 or such, Solr does not start because the index gets corrupt (which 
should actually not happen)

We speculate that Solr now replicates directly to the Directory 
implementation and does not ensure that replicated files get fsck-ed 
completely after replication. As far as I remember, replication used to go to 
/temp (disk) and then move the files if all went OK, working under the 
assumption that everything was already persisted. Maybe this invariant does 
not hold any more and some explicit fsck is needed for caching directories?

I might be completely wrong; we just observed the symptoms in a not really 
debug-friendly environment.

Here is the exception after a hard restart:

Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:804)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
   at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:973)
   at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1003)
   ... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1441)
   at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1553)
   at org.apache.solr.core.SolrCore.<init>(SolrCore.java:779)
   ... 13 more
Caused by: java.io.FileNotFoundException: ...\core0\data\index\segments_1 (The 
system cannot find the file specified)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
   at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:222)
   at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:232)
   at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:281)
   at 
org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:56)
   at 
org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:668)
   at 
org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
   at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:87)
   at 
org.apache.solr.core.StandardIndexReaderFactory.newReader(StandardIndexReaderFactory.java:34)
   at 
org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:120)
   at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1417)

 


[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505541#comment-13505541
 ] 

Jack Krupansky commented on SOLR-4114:
--

I certainly think of a replica as a copy of the ORIGINAL, which makes perfect 
sense in a master-with-n-slaves configuration, but in a fully distributed 
environment such as SolrCloud, where the leader of a shard can vary over time 
and updates are distributed to all nodes all of the time, there is no longer 
the concept of an "original" copy of the data. If anything, the "original" 
data is the source data on the wire before it gets instantiated on each node. 
No node is truly the original.

The terminology has this difficulty that it is only partially shared between 
the worlds of master/slave and the cloud. In master/slave, only the slaves are 
replicas and the master is the original, while in cloud ALL nodes are replicas 
since there are no originals. The leader is not a "master copy" of the data 
in the sense of master/slave.

So, I guess I am semi-comfortable with "replica" referring to all instances of 
the data, but we do need to be careful to highlight the distinction between how 
the term replica is used in the world of master/slave vs. SolrCloud, especially 
since many Cloud users will be migrating from the world of master/slave.

We also need to be careful not to refer to "leader and replicas", which 
implies that a leader is not a replica!





[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505542#comment-13505542
 ] 

Mark Miller commented on SOLR-4117:
---

Do you mean fsync rather than fsck (isn't that a file system check?)

That did change in that we are now using the Directory's sync method - but it 
*should* still work the same as before...

2 should not happen though - so we should dig in. I'm guessing it's not related 
to this issue, but we will see.
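
For concreteness, a minimal sketch - not Solr's actual replication code - of 
what an explicit post-replication sync through the Directory could look like, 
assuming the names of the copied files are known:

{code}
import java.io.File;
import java.io.IOException;
import java.util.Collection;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NRTCachingDirectory;

class ReplicationSyncSketch {
  void syncReplicatedFiles(File indexDir, Collection<String> copiedFiles)
      throws IOException {
    Directory dir = new NRTCachingDirectory(FSDirectory.open(indexDir), 5.0, 60.0);
    try {
      // Ask the directory to fsync the copied files so they survive kill -9.
      dir.sync(copiedFiles);
    } finally {
      dir.close();
    }
  }
}
{code}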

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and a 
 ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}




[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505547#comment-13505547
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. Could you not do the same thing as ElasticSearch: build the index with a 
number of shards (the initial number is 5). If there is 1 machine in the 
cluster, then all shards are on this machine. If you add more machines, 
shards will move to the other machines. It is much simpler for administration.

This moving of shards around as more Solr servers join the cluster is the 
easiest way to provide elasticity (as I mentioned somewhere above). That is 
one of the reasons I want to be able to run multiple shards for a collection 
on the same Solr server - that way you already have shards to move to other 
Solrs that might join the cluster later.

In Solr, right now, we don't have the ability to move shards from one server 
to another (ES has it), but in order to benefit from such a future feature, 
you need to be able to have multiple shards on one Solr server. Alternatively 
you have to go split a shard, but that is much harder, and should only be 
needed if you did not foresee, when you created your collection, that you 
would add more servers later, and therefore did not create your collection 
with multiple shards per server.




[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505548#comment-13505548
 ] 

Robert Muir commented on LUCENE-4574:
-

Right, there is more fixing needed for the other collectors and other 
situations, but I think Solr should still be fixed for the common 
sort-by-score case.

I don't like the duplicate calls to score(); I feel like the API should not 
support this. But I don't think caching is the correct solution.
It already frustrates me that there are caches everywhere; for example, 
BooleanScorer2 has a super-secret score cache just like this.
I have plans to hunt down and kill all such little caches in Lucene. It's not 
the right solution.

The question for this one is: 
If the user adds relevance as a sort but then also asks to track doc 
scores/max scores, how should the collector work?
I definitely don't like the idea of more specialized collectors: god knows 
there are already too many, but maybe we can avoid this. 

Also: can we speed up this particular query? Why is its score so costly?


 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work-around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.
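
A minimal sketch of the work-around described in the issue text (the class 
name and the expensive computation below are stand-ins): memoize the last 
docID/value pair so the two back-to-back calls per document that the stack 
traces show only pay for the computation once. Since DoubleDocValues 
implements floatVal in terms of doubleVal, the memoization also covers the 
floatVal calls seen in the traces.

{code}
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;

class LastDocCachingValues extends DoubleDocValues {
  private int lastDoc = -1;
  private double lastVal;

  LastDocCachingValues(ValueSource vs) {
    super(vs);
  }

  @Override
  public double doubleVal(int doc) {
    if (doc != lastDoc) {   // only recompute when the docID changes
      lastDoc = doc;
      lastVal = expensiveValue(doc);
    }
    return lastVal;
  }

  // Stand-in for the real, costly per-document computation.
  private double expensiveValue(int doc) {
    return doc * 0.5d;
  }
}
{code}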




[jira] [Commented] (SOLR-4055) Remove/Reload the collection has the thread safe issue.

2012-11-28 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505549#comment-13505549
 ] 

Commit Tag Bot commented on SOLR-4055:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1414744

SOLR-4055: clone params for create calls



 Remove/Reload the collection has the thread safe issue.
 ---

 Key: SOLR-4055
 URL: https://issues.apache.org/jira/browse/SOLR-4055
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0-ALPHA, 4.0-BETA, 4.0
 Environment: Solr cloud
Reporter: Raintung Li
Assignee: Mark Miller
 Fix For: 4.1, 5.0

 Attachments: patch-4055


 OverseerCollectionProcessor's collectionCmd method has a thread-safety 
 issue.
 The major issue is that the ModifiableSolrParams instance is handed to 
 another thread for use (HttpShardHandler.submit), so modifying the 
 parameters afterwards changes what those threads see.
 In the method collectionCmd, changing the value via 
 params.set(CoreAdminParams.CORE, node.getStr(ZkStateReader.CORE_NAME_PROP)) 
 means the thread that sends the HTTP request can pick up the wrong core 
 name; the result is that the right core can't be deleted/reloaded.
 The easy fix is to clone the ModifiableSolrParams for every request.
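
A minimal sketch of that fix; the class and method are stand-ins for the 
actual OverseerCollectionProcessor code, but the idea is simply to give each 
per-node request its own params instance:

{code}
import org.apache.solr.common.params.CoreAdminParams;
import org.apache.solr.common.params.ModifiableSolrParams;

class CloneParamsSketch {
  // Copy the shared params so mutating this request's copy cannot race
  // with requests already handed off to other threads.
  ModifiableSolrParams paramsFor(ModifiableSolrParams shared, String coreName) {
    ModifiableSolrParams cloned = new ModifiableSolrParams(shared);
    cloned.set(CoreAdminParams.CORE, coreName);
    return cloned;
  }
}
{code}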




Re: Trying out the commit bot tagger at a larger scale

2012-11-28 Thread Mark Miller
Well, it happened again. The HTMLStripCharFilter once again somehow showed up 
with local mods in the commit bot git repo…weird.

I'm not sure how to address this yet - for the moment it's a manual process of 
discarding the change and then everything works again. blah…

- Mark

On Nov 26, 2012, at 4:51 PM, Mark Miller markrmil...@gmail.com wrote:

 I took a look at the local repo with a git client and it seemed to show local 
 changes in an HTMLStripCharFilter…odd…
 
 Anyway, I discarded those changes and let the bot run again and it caught up 
 with the missed tags.
 
 Not sure if it will happen again or not (this stuff is pretty isolated and 
 untouched) - please let me know if anyone notices the tags are not being sent 
 out.
 
 - Mark
 
 On Nov 26, 2012, at 4:32 PM, Mark Miller markrmil...@gmail.com wrote:
 
 Thanks for the note - it actually has not been firing lately - I just
 took a look, and for some reason it was having trouble doing an update
 (I'm using jgit under the covers). It's claiming there is a conflict
 when I am trying to check out a branch.
 
 I'll solve this and get it kicking again.
 
 It's currently cron'd to run every 2 minutes.
 
 - Mark
 
 On Mon, Nov 26, 2012 at 4:04 PM, David Smiley (@MITRE.org)
 dsmi...@mitre.org wrote:
 Mark,
 Do I need to do anything for the bot to make its comment, aside from the
 commit?  I just made a commit to both branches.  How much delay is there /
 i.e. what's its schedule?
 ~ David
 
 
 
 -
 Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
 
 
 
 
 
 -- 
 - Mark
 





[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-11-28 Thread Leonhard Maylein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505554#comment-13505554
 ] 

Leonhard Maylein commented on SOLR-3377:


Ok, I understand.
The phrase boost queries are separated from the normal query expansion via the 
qf parameter.

But all terms are (equally) qualified by a field (field sw for the terms a and 
b, field ti for the terms c and d).
Why does the eDismax handler only use the terms b and d to build the phrase 
boost query?
Isn't that a bug?



 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0-BETA

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.




[jira] [Created] (LUCENE-4577) Nuke TFIDFSim's cache

2012-11-28 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4577:
---

 Summary: Nuke TFIDFSim's cache
 Key: LUCENE-4577
 URL: https://issues.apache.org/jira/browse/LUCENE-4577
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


This is the old TermScorer cache. 

This helps nothing, and maybe hurts: I removed it, and here are the results:
{noformat}
Chart saved to out.png... (wd: 
/home/rmuir/workspace/lucene-trunk/lucene/benchmark)
Task                QPS base  StdDev    QPS patch  StdDev    Pct diff
TermGroup1M            52.87   (2.2%)      52.62   (2.4%)   -0.5% (  -4% -    4%)
AndHighMed             34.82   (2.8%)      34.70   (3.6%)   -0.3% (  -6% -    6%)
SpanNear                6.28   (5.3%)       6.26   (3.9%)   -0.2% (  -8% -    9%)
IntNRQ                 13.24  (11.0%)      13.24   (9.9%)    0.0% ( -18% -   23%)
Prefix3                42.19   (7.6%)      42.21   (7.0%)    0.1% ( -13% -   15%)
Wildcard               36.90   (6.8%)      37.02   (5.9%)    0.3% ( -11% -   13%)
AndHighHigh            25.68   (4.5%)      25.79   (3.2%)    0.5% (  -6% -    8%)
Phrase                  9.28   (4.7%)       9.35   (4.4%)    0.7% (  -8% -   10%)
TermBGroup1M           45.76   (6.3%)      46.10   (3.2%)    0.7% (  -8% -   10%)
SloppyPhrase           10.25   (3.9%)      10.33   (4.4%)    0.8% (  -7% -    9%)
OrHighHigh              8.87   (6.4%)       8.97   (6.7%)    1.1% ( -11% -   15%)
Fuzzy1                 70.28   (4.3%)      71.24   (7.1%)    1.4% (  -9% -   13%)
OrHighMed              10.70   (7.0%)      10.86   (6.4%)    1.5% ( -11% -   15%)
Fuzzy2                 27.79   (6.1%)      28.31   (5.1%)    1.9% (  -8% -   13%)
Respell                71.72   (6.8%)      73.39   (3.7%)    2.3% (  -7% -   13%)
Term                  209.49   (4.4%)     214.58   (3.7%)    2.4% (  -5% -   11%)
TermBGroup1M1P          7.10   (5.1%)       7.48   (7.8%)    5.3% (  -7% -   19%)
{noformat}
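
For context, a rough sketch of the general shape of such a small-frequency 
score cache (illustrative only, from memory, not the exact Lucene code being 
removed): precompute tf(freq) * weight for small term frequencies and compute 
directly above the cache size.

{code}
class ScoreCacheSketch {
  private static final int SCORE_CACHE_SIZE = 32;
  private final float[] scoreCache = new float[SCORE_CACHE_SIZE];
  private final float weightValue;

  ScoreCacheSketch(float weightValue) {
    this.weightValue = weightValue;
    // Precompute scores for freq = 0..31 once per scorer.
    for (int freq = 0; freq < SCORE_CACHE_SIZE; freq++) {
      scoreCache[freq] = tf(freq) * weightValue;
    }
  }

  float score(int freq) {
    // Cache hit for small frequencies, direct computation otherwise.
    return freq < SCORE_CACHE_SIZE ? scoreCache[freq] : tf(freq) * weightValue;
  }

  private static float tf(float freq) {
    return (float) Math.sqrt(freq); // DefaultSimilarity-style tf
  }
}
{code}

As the benchmark above shows, dropping it is at worst noise and at best a 
small win.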




[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1350#comment-1350
 ] 

Eks Dev commented on SOLR-4117:
---

fsync of course, fsck was intended for my terminal window :) 




[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505556#comment-13505556
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. Personally, I'm not really sold on this auto re balancing idea. I'd prefer 
the user had to explicitly make these moves.

Me neither - and I can say that ES sometimes f's it up. At least it did when I 
was working with it, but that was mainly because of bad re-balancing 
algorithms. But I do like moving shards manually from an admin-console!




[jira] [Updated] (LUCENE-4577) Nuke TFIDFSim's cache

2012-11-28 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4577:


Attachment: LUCENE-4577.patch




[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505558#comment-13505558
 ] 

Mark Miller commented on SOLR-4114:
---

I've committed the shared params issue under SOLR-4055 and added Per to the 
Changes entry.




[jira] [Commented] (SOLR-4055) Remove/Reload the collection has the thread safe issue.

2012-11-28 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505559#comment-13505559
 ] 

Commit Tag Bot commented on SOLR-4055:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1414760

SOLR-4055: clone params for create calls






[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505562#comment-13505562
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. When grabbing the params fix I noticed you set the data dir to something 
like shardname+_data - that's not strictly necessary, right? Since each core 
should have its own instance dir

Well, I use the same instance-dir for all shards but a different data-dir - 
this is just how we used to do it in my project, and it can be changed. As 
long as the code uses the same instance-dir, different data-dirs are 
necessary though.




[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505564#comment-13505564
 ] 

Yonik Seeley commented on SOLR-4114:


bq. I would expect replication-factor to say something about how many times 
the data is REPLICATED.

I would too, but we would still disagree on what that meant, since I would 
interpret "the number of times the data is replicated" to mean the total 
number of copies that exist after a write operation to the cluster.  That 
seems to be the much more common interpretation in this context since there is 
no original... everyone has stored/indexed a copy.

$ echo hello > file1.txt
$ cp file1.txt file2.txt

How many copies of the file are there? If you look at the state (and not the 
mechanism by which you arrived there) most would say there are 2 copies.
In one interpretation, there is only one copy, but that's too literal and 
assigns some special category to the original.


http://hadoop.apache.org/docs/r0.20.2/hdfs_design.html
"The number of copies of a file is called the replication factor of that file."

http://www.datastax.com/docs/1.0/cluster_architecture/replication
"The total number of replicas across the cluster is referred to as the 
replication factor. A replication factor of 1 means that there is only one copy 
of each row on one node."

Oracle NoSQL store:
http://docs.oracle.com/cd/NOSQL/html/AdminGuide/introduction.html#replicationfactor
http://docs.oracle.com/cd/NOSQL/html/AdminGuide/store-config.html
"A Replication Factor of 3 gives you shards with one master plus two replicas."

Riak:
http://wiki.basho.com/What-is-Riak%3F.html
"An n value of 3 (default) means that each object is replicated 3 times. When 
an object’s key is mapped onto a given partition, Riak won’t stop there – it 
automatically replicates the data onto the next two partitions as well."

Splunk:
http://docs.splunk.com/Documentation/Splunk/latest/Indexer/Thereplicationfactor
"The number of data/bucket copies is called the cluster's replication factor.
The cluster can tolerate a failure of (replication factor - 1) peer nodes. So, 
for example, to ensure that your system can tolerate a failure of two peers, 
you must configure a replication factor of 3, which means that the cluster 
stores three identical copies of each bucket on separate nodes. With a 
replication factor of 3, you can be certain that all your data will be 
available if no more than two peer nodes in the cluster fail. With two nodes 
down, you still have one complete copy of your data available on the remaining 
peer(s)."

It's clear that "3 copies" means 3 total instances of the same data, not 4 (an 
original plus 3 more copies of it).





[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505565#comment-13505565
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. I've committed the shared params issue under SOLR-4055 and added Per to the 
Changes entry.

On which branch are you committing, Mark?




[jira] [Resolved] (SOLR-4055) Remove/Reload the collection has the thread safe issue.

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4055.
---

Resolution: Fixed




[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505570#comment-13505570
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. I would too, but we would still disagree on what that meant since I would 
interpret the number of times the data is replicated...

I actually agree with you. I just don't like replica being part of the name for 
it then. If we rename replication-factor to number-of-copies or something I 
would be much happier changing the semantics of it :-) But really, this is 
another issue.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505571#comment-13505571
 ] 

Mark Miller commented on SOLR-4114:
---

bq. On which branch are you committing, Mark?

5x and then merged to 4x - just that small fix though - have not had a chance 
to review this patch fully yet.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505575#comment-13505575
 ] 

Per Steffensen commented on SOLR-4114:
--

Well, I'm off for today. I will probably (if my PO's head does not turn green) be 
making the spread-shards-according-to-provided-list feature tomorrow. If you 
commit the entire patch for SOLR-4114 it will be easier for me to provide a new 
patch for this new feature and attach it to a new issue :-)

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4118) fix replicationFactor to align with industry usage

2012-11-28 Thread Yonik Seeley (JIRA)
Yonik Seeley created SOLR-4118:
--

 Summary: fix replicationFactor to align with industry usage 
 Key: SOLR-4118
 URL: https://issues.apache.org/jira/browse/SOLR-4118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 4.1


replicationFactor should be the number of different nodes that have a document.
See discussion in SOLR-4114

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505578#comment-13505578
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. 5x and then merged to 4x - just that small fix though - have not had a 
chance to review this patch fully yet.

But is it also going to be backported to lucene_solr_4_0, which is actually the 
branch I am working on top of?

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505581#comment-13505581
 ] 

Mark Miller commented on SOLR-4117:
---

Markus, I'm about to commit a fix to this issue - but I doubt it's the same as 
the issue you then mention in a comment.

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 a ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505582#comment-13505582
 ] 

David Smiley commented on LUCENE-4574:
--

I don't have any conviction on what the right answer should be; this area of 
Lucene is not one I've explored before.  If scorer.score() is cheap in general 
(is it?), then I can see your reservations.  Perhaps the solution is to only 
cache specific Scorers that are or could be expensive.  So for me this means 
adding the cache at FunctionQuery$AllScorer.  This cache is as lightweight as a 
cache can possibly be, remember; no hashtable lookup, just a docid comparison 
with a branch.

bq. Also: can we speed up this particular query? why is its score so costly?

It's a FunctionQuery tied to a ValueSource doing spatial distance.  Applying 
this very simple cache on my custom ValueSource cut my response time nearly 
in half!
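
A minimal sketch of that last-value cache, assuming a delegating 
FunctionValues (the class name here is made up):

{code}
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.docvalues.DoubleDocValues;

// Memoize the most recently computed value keyed by docid: one int compare
// and a branch, no hashtable lookup.
class LastValueCachingValues extends DoubleDocValues {
  private final FunctionValues delegate;
  private int lastDoc = -1;
  private double lastVal;

  LastValueCachingValues(ValueSource vs, FunctionValues delegate) {
    super(vs);
    this.delegate = delegate;
  }

  @Override
  public double doubleVal(int doc) {
    if (doc != lastDoc) {   // recompute only when the docid changes
      lastVal = delegate.doubleVal(doc);
      lastDoc = doc;
    }
    return lastVal;
  }
}
{code}

Since DoubleDocValues.floatVal() delegates to doubleVal(), the cache also 
covers the floatVal() calls in the stack traces below.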


 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505584#comment-13505584
 ] 

Commit Tag Bot commented on SOLR-4117:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1414773

SOLR-4117: Retrieving the size of the index may use the wrong index dir if you 
are replicating.



 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 a ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4118) fix replicationFactor to align with industry usage

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4118:
--

Fix Version/s: 5.0

 fix replicationFactor to align with industry usage 
 ---

 Key: SOLR-4118
 URL: https://issues.apache.org/jira/browse/SOLR-4118
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
Priority: Minor
 Fix For: 4.1, 5.0


 replicationFactor should be the number of different nodes that have a 
 document.
 See discussion in SOLR-4114

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505570#comment-13505570
 ] 

Per Steffensen edited comment on SOLR-4114 at 11/28/12 3:59 PM:


bq. I would too, but we would still disagree on what that meant since I would 
interpret the number of times the data is replicated...

I actually agree with you. I just don't like replication being part of the name 
for it then. If we rename replication-factor to number-of-copies or even 
number-of-replica or something I would be much happier changing the semantics 
of it :-) But really, this is another issue.

  was (Author: steff1193):
bq. I would too, but we would still disagree on what that meant since I 
would interpret the number of times the data is replicated...

I actually agree with you. I just don't like replica being part of the name for 
it then. If we rename replication-factor to number-of-copies or something I 
would be much happier changing the semantics of it :-) But really, this is 
another issue.
  
 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505590#comment-13505590
 ] 

Mark Miller commented on SOLR-4114:
---

bq. But is it also going to be backported to lucene_solr_4_0

Given past discussion, it's very unlikely that we will release a 4.0.1 (I was 
for it FWIW) and will just do a 4.1 - so no, generally nothing is being 
backported to the 4.0 branch.

If we did end up deciding to do a 4.0.1, then we would select which issues 
should go in and then do those backports later.


 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505594#comment-13505594
 ] 

Commit Tag Bot commented on SOLR-4117:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1414774

SOLR-4117: Retrieving the size of the index may use the wrong index dir if you 
are replicating.



 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 a ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505597#comment-13505597
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. so no, generally nothing is being backported to the 4.0 branch

Well, I guess the essence of my question is whether it is OK that I keep 
providing patches relative to lucene_solr_4_0 - at least for this issue and the 
spread-shards-across-provided-list-of-solrs one?

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505600#comment-13505600
 ] 

Markus Jelsma commented on SOLR-4117:
-

Likely indeed. I'll check on this issue tomorrow and try to reproduce the other 
one; I will open a new issue if I can.

Thanks

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely via tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for a shard but due to 
 recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and 
 a ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3793) Use ReferenceManager in DirectoryTaxonomyReader

2012-11-28 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-3793.


Resolution: Implemented

This issue was taken care of as part of LUCENE-3441.

 Use ReferenceManager in DirectoryTaxonomyReader
 ---

 Key: LUCENE-3793
 URL: https://issues.apache.org/jira/browse/LUCENE-3793
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 4.1


 DirTaxoReader uses hairy code to protect its indexReader instance from 
 being modified while threads use it. It maintains a ReentrantLock 
 (indexReaderLock) which is obtained on every 'read' access, while 
 refresh() locks it for 'write' operations (refreshing the IndexReader). 
 Instead of all that, now that we have ReferenceManager in place, I think 
 that we can write a ReaderManager<IndexReader> which will be used by 
 DirTR. Every method that requires access to the indexReader will 
 acquire/release (not too different than obtaining/releasing the read 
 lock), and refresh() will call ReaderManager.maybeRefresh(). It will 
 simplify the code and remove some rather long comments that go into 
 great length explaining why the code looks like that. 
 This ReaderManager cannot be used for every IndexReader, because DirTR's
 refresh() logic is special -- it reopens the indexReader, and then
 verifies that the createTime still matches on the reopened reader as
 well. Otherwise, it closes the reopened reader and fails with an exception.
 Therefore, this ReaderManager.refreshIfNeeded will need to take the
 createTime into consideration and fail if they do not match.
 And while we're at it ... I wonder if we should have a manager for an
 IndexReader/ParentArray pair? I think that it makes sense because we
 don't want DirTR to use a ParentArray that does not match the IndexReader.
 Today this can happen in refresh() if e.g. after the indexReader instance
 has been replaced, parentArray.refresh(indexReader) fails. DirTR will be
 left with a newer IndexReader instance, but old (or worse, corrupt?)
 ParentArray ... I think it'll be good if we introduce clone() on ParentArray,
 or a new ctor which takes an int[].
 I'll work on a patch once I finish with LUCENE-3786.
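 For reference, the acquire/release pattern that would replace the read lock 
 looks roughly like this (a minimal sketch, not the final design):
 {code}
 import java.io.IOException;
 import org.apache.lucene.index.DirectoryReader;
 import org.apache.lucene.index.ReaderManager;
 import org.apache.lucene.store.Directory;

 class TaxoReadSketch {
   static void readTaxonomy(Directory directory) throws IOException {
     ReaderManager mgr = new ReaderManager(directory);
     DirectoryReader reader = mgr.acquire();   // like taking the read lock
     try {
       // ... read-only access to the taxonomy's underlying index ...
     } finally {
       mgr.release(reader);                    // like releasing the read lock
     }
     mgr.maybeRefresh(); // what refresh() would delegate to, plus the createTime check
   }
 }
 {code}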

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4157) Improve Spatial Testing

2012-11-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4157?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved LUCENE-4157.
--

   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0

Marking as fixed. The title is a bit general and there have indeed been testing 
improvements that made it into 4.0.  If there's something in particular that 
needs testing then an issue should be created for it, and there are already 
such issues.

 Improve Spatial Testing
 ---

 Key: LUCENE-4157
 URL: https://issues.apache.org/jira/browse/LUCENE-4157
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
Priority: Critical
 Fix For: 4.0

 Attachments: LUCENE-4157_Improve_Lucene_Spatial_testing_p1.patch, 
 LUCENE-4157_Improve_TermQueryPrefixTreeStrategy_and_move_makeQuery_impl_to_SpatialStrategy.patch


 Looking back at the tests for the Lucene Spatial Module, they seem 
 half-baked.  (At least Spatial4j is well tested).  I've started working on 
 some improvements:
 * Some tests are in an abstract base class which has a subclass that 
 provides a SpatialContext. The idea was that the same tests could test other 
 contexts (such as geo vs not, or different distance calculators (haversine vs 
 vincenty)), but this can be done using RandomizedTesting's nifty parameterized 
 test feature, once there is a need to do this.
 * Port the complex geohash recursive prefix tree test that was developed on 
 the Solr side to the Lucene side where it belongs.
 And some things are not tested or aren't well tested:
 * Distance order as the query score
 * Indexing shapes other than points (i.e. shapes with area / regions)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505612#comment-13505612
 ] 

Mark Miller commented on SOLR-4114:
---

Well, it makes things a little more painful in that I have to merge it to 
4x/5x, but I can do that. It's probably not too difficult.

 Collection API: Allow multiple shards from one collection on the same Solr 
 server
 -

 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: collection-api, multicore, shard, shard-allocation
 Attachments: SOLR-4114.patch, SOLR-4114.patch


 We should support running multiple shards from one collection on the same 
 Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
 (each Solr server running 2 shards).
 Performance tests on our side have shown that this is a good idea, and it is 
 also a good idea for easy elasticity later on - it is much easier to move an 
 entire existing shard from one Solr server to another one that just joined 
 the cluster than it is to split an existing shard among the Solrs that used to 
 run it and the new Solr.
 See dev mailing list discussion Multiple shards for one collection on the 
 same Solr server

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3942) Cannot use geodist() function with edismax

2012-11-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-3942.


Resolution: Cannot Reproduce

 Cannot use geodist() function with edismax
 --

 Key: SOLR-3942
 URL: https://issues.apache.org/jira/browse/SOLR-3942
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
 Environment: Windows Server 2008 R2, Windows 7
Reporter: Shane Andrade
Assignee: David Smiley
Priority: Critical

 Using the spatial example from the wiki when boosting with edismax:
 http://localhost:8983/solr/select?defType=edismax&q.alt=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=50&boost=recip(geodist(),2,200,20)&sort=score%20desc
 Produces the following error:
 <lst name="error">
 <str name="msg">
 org.apache.lucene.queryparser.classic.ParseException: Spatial field must 
 implement 
 MultiValueSource:store{type=geohash,properties=indexed,stored,omitTermFreqAndPositions}
 </str>
 <int name="code">400</int>
 </lst>
 When the defType is changed to dismax, the query works as expected.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-28 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505619#comment-13505619
 ] 

Robert Muir commented on LUCENE-4574:
-

I think it's generally cheap. Today it's already cached in BooleanScorer2 
(which Solr always gets for a BooleanQuery), and for 
a term query it's typically just a multiply and so on. So I think caching in 
general would be useless and would hurt here. In these 
silly cases (sorting by relevance but also asking to fill scores, etc.), it's 
cheaper to just call it twice rather than try to do 
something funkier in the collector; we would have to benchmark this.

{quote}
So for me this means adding the cache at FunctionQuery$AllScorer. 
{quote}

I think I like this idea better than adding caching in general to these 
collectors. Is the score() method typically expensive
for function queries?

Yet another possibility is, instead of asking to track scores when sorting by 
relevance, to ask to fill sort fields (the default anyway, right?). 
It's sort of redundant to ask for both. If you do this, I don't think it calls 
score() twice.

Finally, we could also consider something like your patch, except honed in on 
these particular silly situations. So that's something like 
setting a boolean up front in these collectors' constructors if one of the 
comparators is relevance and it is also asked to track scores/max scores. 
Then in setScorer, we could do like your patch only if this boolean is set. I 
feel like we wouldn't have to add 87 more specialized collectors to do this. I 
just haven't looked at the code to figure out what all the situations can 
be (all those booleans etc. passed to IndexSearcher) where 
score() can currently be called twice.
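
For context, the flag combination being discussed maps to the 4.x 
TopFieldCollector.create() parameters roughly like this (a sketch; the values 
are illustrative):

{code}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.TopFieldCollector;

class SillyCaseSketch {
  // Sorting by relevance while *also* asking to track scores is the redundant
  // case where score() can currently be called twice per document.
  static void search(IndexSearcher searcher, Query query) throws IOException {
    TopFieldCollector c = TopFieldCollector.create(
        Sort.RELEVANCE, // sorting by relevance...
        10,             // numHits
        true,           // fillFields (the cheaper alternative suggested above)
        true,           // trackDocScores - ...while also tracking scores
        true,           // trackMaxScore
        false);         // docsScoredInOrder
    searcher.search(query, c);
  }
}
{code}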


 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3601) Reconsider Google Guava dependency

2012-11-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-3601.


Resolution: Won't Fix

Marking as Won't Fix, as Guava successfully made it into 4.0 for better or 
worse, and so that about settles it.

 Reconsider Google Guava dependency
 --

 Key: SOLR-3601
 URL: https://issues.apache.org/jira/browse/SOLR-3601
 Project: Solr
  Issue Type: Improvement
Reporter: David Smiley
Assignee: Hoss Man
Priority: Minor

 Google Guava is a cool Java library with lots of useful stuff in it.  But 
 note that the old version r05 that we have is 935kb in size and FWIW the 
 latest v12 is 1.8MB.  Despite its usefulness, Solr (core) is not actually 
 using it aside from a trivial case in org.apache.solr.logging.jul to get a 
 string from a Throwable.  And I'm using it in my uncommitted patch for Solr 
 adapters to the Lucene module.  The Clustering contrib module definitely 
 needs it.  This dependency in Solr core seems half-hearted and I suspect it 
 may have been inadvertent during improvements to the Clustering contrib 
 module at some point.
 Shall we get rid of this dependency in Solr core, and push it back to the 
 contrib module?  I like Guava, I want to use it in my work, but the reality 
 is that Solr core doesn't even touch 1% of it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3377) eDismax: A fielded query wrapped by parens is not recognized

2012-11-28 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505623#comment-13505623
 ] 

Jack Krupansky commented on SOLR-3377:
--

Yes, it looks like a bug, but distinct from this current Jira. Actually, two 
bugs:

1. Fielded terms should not be used in phrase boost except for the specified 
field.
2. Some terms appear to have been skipped for phrase boost.


 eDismax: A fielded query wrapped by parens is not recognized
 

 Key: SOLR-3377
 URL: https://issues.apache.org/jira/browse/SOLR-3377
 Project: Solr
  Issue Type: Bug
  Components: query parsers
Affects Versions: 3.6
Reporter: Jan Høydahl
Assignee: Yonik Seeley
Priority: Critical
 Fix For: 4.0-BETA

 Attachments: SOLR-3377.patch, SOLR-3377.patch, SOLR-3377.patch, 
 SOLR-3377.patch


 As reported by bernd on the user list, a query like this
 {{q=(name:test)}}
 will yield 0 hits in 3.6 while it worked in 3.5. It works without the parens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (LUCENE-4197) Small improvements to Lucene Spatial Module for v4

2012-11-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley closed LUCENE-4197.


   Resolution: Fixed
Fix Version/s: (was: 4.1)
   4.0
 Assignee: David Smiley

Closing against 4.0; this issue was for small improvements to 4.0 which already 
shipped.

 Small improvements to Lucene Spatial Module for v4
 --

 Key: LUCENE-4197
 URL: https://issues.apache.org/jira/browse/LUCENE-4197
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0

 Attachments: LUCENE-4197_rename_CachedDistanceValueSource.patch, 
 LUCENE-4197_SpatialArgs_doesn_t_need_overloaded_toString()_with_a_ctx_param_.patch,
  LUCENE-4413_better_spatial_exception_handling.patch, 
 SpatialArgs-_remove_unused_min_and_max_params.patch


 This issue is to capture small changes to the Lucene spatial module that 
 don't deserve their own issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-28 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505650#comment-13505650
 ] 

David Smiley commented on LUCENE-4574:
--

Rob, FunctionQuery$AllScorer.score() is pretty simple and innocent enough, so 
perhaps that is not the right place to add the cache either.  Some ValueSources 
might have a trivial value, e.g. a constant; some might be expensive.

[~yo...@apache.org], your first comment was:
bq. FunctionValues isn't the right place to solve this... that would cause 
caching/checking at every level of a function.

Do you mean it's wrong for a custom ValueSource I wrote to have its 
FunctionValues, which I know to be expensive because I wrote it, cache its 
previous value?  That's hard to believe so perhaps you don't mean that.

Here's a proposal.  Add a ValueSource method boolean nonTrivial(), defaulting 
to true to be safe but overridden in many subclasses to return false as 
appropriate.  Then FunctionQuery$AllScorer's constructor (called only 
per-segment) can check it and wrap the values in a to-be-developed caching 
FunctionValues wrapper for floatVal().  Unlike my previous proposal in the 
collector, this proposal targets cases that declare themselves to have 
non-trivial implementations and so are worth caching.
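
As a rough illustration of that to-be-developed wrapper (nothing here is 
committed API; the class name is invented), the last-doc memoization could 
look like:

{code}
import org.apache.lucene.queries.function.FunctionValues;

// Sketch only: memoize floatVal() for the most recent docid, so a second
// call for the same document (e.g. from the comparator after the collector)
// reuses the value instead of recomputing it.
class LastDocCachingFunctionValues extends FunctionValues {
  private final FunctionValues delegate;
  private int lastDoc = -1;
  private float lastVal;

  LastDocCachingFunctionValues(FunctionValues delegate) {
    this.delegate = delegate;
  }

  @Override
  public float floatVal(int doc) {
    if (doc != lastDoc) {          // recompute only when the docid changes
      lastVal = delegate.floatVal(doc);
      lastDoc = doc;
    }
    return lastVal;
  }

  @Override
  public String toString(int doc) {
    return "cached(" + delegate.toString(doc) + ")";
  }
}
{code}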


 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work-around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain

2012-11-28 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505658#comment-13505658
 ] 

Simon Willnauer commented on LUCENE-4569:
-

Sorry John, busy times over here... I will look into this later this week 
though. It seems pretty straightforward to me at first glance, i.e. it doesn't 
hurt anyone.

 Allow customization of column stride field and norms via indexing chain
 ---

 Key: LUCENE-4569
 URL: https://issues.apache.org/jira/browse/LUCENE-4569
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: John Wang
 Attachments: patch.diff


 We are building an in-memory indexing format and managing our own segments. 
 We are doing this by implementing a custom IndexingChain. We would like to 
 support column-stride-fields and norms without having to wire in a codec 
 (since we are managing our postings differently)
 Suggested change is consistent with the api support for passing in a custom 
 InvertedDocConsumer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain

2012-11-28 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-4569:
---

Assignee: Simon Willnauer

 Allow customization of column stride field and norms via indexing chain
 ---

 Key: LUCENE-4569
 URL: https://issues.apache.org/jira/browse/LUCENE-4569
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: John Wang
Assignee: Simon Willnauer
 Attachments: patch.diff


 We are building an in-memory indexing format and managing our own segments. 
 We are doing this by implementing a custom IndexingChain. We would like to 
 support column-stride-fields and norms without having to wire in a codec 
 (since we are managing our postings differently)
 Suggested change is consistent with the api support for passing in a custom 
 InvertedDocConsumer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4419) Test RecursivePrefixTree indexing non-point data

2012-11-28 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley reassigned LUCENE-4419:


Assignee: David Smiley

 Test RecursivePrefixTree indexing non-point data
 

 Key: LUCENE-4419
 URL: https://issues.apache.org/jira/browse/LUCENE-4419
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley

 RecursivePrefixTreeFilter was modified in ~July 2011 to support spatial 
 filtering of non-point indexed shapes.  It seems to work when playing with 
 the capability, but it isn't tested.  It really needs to be, as this is a 
 major feature.
 I imagine an approach in which some randomly generated rectangles are indexed 
 and then a randomly generated rectangle is queried.  The right answer can be 
 calculated brute-force and then compared with the filter's result.  In order 
 to deal with shape imprecision, the randomly generated shapes could be 
 generated to fit a coarse grid (e.g. round everything to a 1-degree interval).
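
A framework-independent sketch of that brute-force comparison (the int[] 
rectangle encoding and intersects() check below are illustrative stand-ins for 
the spatial module's Rectangle and intersection predicate; only the testing 
idea comes from the issue):

{code}
import java.util.*;

// Sketch: snap random rectangles to a coarse 1-degree grid, compute the
// expected matches for a random query rectangle by brute force, and (in
// the real test) compare that set against the filter's results.
public class RandomRectSketch {
  static final Random RND = new Random(42);

  static int[] randomRect() { // {minX, maxX, minY, maxY}, whole degrees
    int x1 = RND.nextInt(361) - 180, x2 = RND.nextInt(361) - 180;
    int y1 = RND.nextInt(181) - 90,  y2 = RND.nextInt(181) - 90;
    return new int[] {Math.min(x1, x2), Math.max(x1, x2),
                      Math.min(y1, y2), Math.max(y1, y2)};
  }

  static boolean intersects(int[] a, int[] b) {
    return a[0] <= b[1] && b[0] <= a[1] && a[2] <= b[3] && b[2] <= a[3];
  }

  public static void main(String[] args) {
    List<int[]> indexed = new ArrayList<>();
    for (int i = 0; i < 100; i++) indexed.add(randomRect());
    int[] query = randomRect();
    Set<Integer> expected = new TreeSet<>();   // brute-force ground truth
    for (int i = 0; i < indexed.size(); i++)
      if (intersects(indexed.get(i), query)) expected.add(i);
    System.out.println("expected matches: " + expected);
    // The real test would assert this set equals the docids returned by
    // RecursivePrefixTreeFilter for the same query rectangle.
  }
}
{code}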

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-28 Thread Mark Miller

On Nov 27, 2012, at 8:39 PM, Mark Miller markrmil...@gmail.com wrote:

 40 committers perhaps on paper but precious few are active

I'd like to toss some numbers against that statement after taking a look at 
Ohloh.

According to Ohloh, the stats for Lucene and Solr are:

Lucene
Committers active within the past month: 13
Committers active within the past year: 29

Solr
Committers active within the past month: 9
Committers active within the past year: 29

For just under 40 committers, given the volunteer nature of open source, those 
numbers look pretty good!

Not everyone who is active has the same amount of time to volunteer.

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-28 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505676#comment-13505676
 ] 

Adrien Grand commented on LUCENE-4574:
--

bq. Add a ValueSource method boolean nonTrivial()

Could we move this logic to an upper level and expect callers of 
{{FunctionQuery(ValueSource)}} to provide a ValueSource impl that returns 
FunctionValues impls that cache their values when the computation is expensive?

Then Solr could wrap costly value sources wherever their FunctionValues get* 
methods are likely to be called several times per document.
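
A sketch of what such a caller-side decorator could look like (the class name 
is invented, and it assumes the caller knows the delegate is costly):

{code}
import java.io.IOException;
import java.util.Map;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;

// Sketch: a decorator the caller of FunctionQuery(ValueSource) can apply
// to a costly source so its FunctionValues memoize the last computed value.
public class CachingValueSource extends ValueSource {
  private final ValueSource delegate;

  public CachingValueSource(ValueSource delegate) {
    this.delegate = delegate;
  }

  @Override
  public FunctionValues getValues(Map context, AtomicReaderContext readerContext)
      throws IOException {
    final FunctionValues vals = delegate.getValues(context, readerContext);
    return new FunctionValues() {
      private int lastDoc = -1;
      private float lastVal;

      @Override
      public float floatVal(int doc) {
        if (doc != lastDoc) {        // recompute only on a new docid
          lastVal = vals.floatVal(doc);
          lastDoc = doc;
        }
        return lastVal;
      }

      @Override
      public String toString(int doc) {
        return "cached(" + vals.toString(doc) + ")";
      }
    };
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof CachingValueSource
        && delegate.equals(((CachingValueSource) o).delegate);
  }

  @Override
  public int hashCode() {
    return 31 * delegate.hashCode();
  }

  @Override
  public String description() {
    return "cached(" + delegate.description() + ")";
  }
}
{code}

Solr could then build {{new FunctionQuery(new CachingValueSource(expensiveSource))}} 
at the points where the get* methods are known to be called repeatedly.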

 FunctionQuery ValueSource value computed twice per document
 ---

 Key: LUCENE-4574
 URL: https://issues.apache.org/jira/browse/LUCENE-4574
 Project: Lucene - Core
  Issue Type: Bug
  Components: core/search
Affects Versions: 4.0, 4.1
Reporter: David Smiley
 Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch


 I was working on a custom ValueSource and did some basic profiling and 
 debugging to see if it was being used optimally.  To my surprise, the value 
 was being fetched twice per document in a row.  This computation isn't 
 exactly cheap to calculate so this is a big problem.  I was able to 
 work-around this problem trivially on my end by caching the last value with 
 corresponding docid in my FunctionValues implementation.
 Here is an excerpt of the code path to the first execution:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 And here is the 2nd call:
 {noformat}
 at 
 org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
 at 
 org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
 at 
 org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
 at 
 org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
 at 
 org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
 at org.apache.lucene.search.Scorer.score(Scorer.java:62)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
 at 
 org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
 {noformat}
 The 2nd call appears to use some score caching mechanism, which is all well 
 and good, but that same mechanism wasn't used in the first call so there's no 
 cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4117) IO error while trying to get the size of the Directory

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4117.
---

Resolution: Fixed

 IO error while trying to get the size of the Directory
 --

 Key: SOLR-4117
 URL: https://issues.apache.org/jira/browse/SOLR-4117
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 5.0
 Environment: 5.0.0.2012.11.28.10.42.06
 Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Minor
 Fix For: 5.0


 With SOLR-4032 fixed we see other issues when randomly taking down nodes 
 (nicely, via Tomcat restart) while indexing a few million web pages from 
 Hadoop. We do make sure that at least one node is up for each shard, but due 
 to recovery issues it may not be live.
 One node seems to work but generates IO errors in the log and a 
 ZooKeeperException in the GUI. In the GUI we only see:
 {code}
 SolrCore Initialization Failures
 openindex_f: 
 org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
  
 Please check your logs for more information
 {code}
 and in the log we only see the following exception:
 {code}
 2012-11-28 11:47:26,652 ERROR [solr.handler.ReplicationHandler] - 
 [http-8080-exec-28] - : IO error while trying to get the size of the 
 Directory:org.apache.lucene.store.NoSuchDirectoryException: directory 
 '/opt/solr/cores/shard_f/data/index' does not exist
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:217)
 at org.apache.lucene.store.FSDirectory.listAll(FSDirectory.java:240)
 at 
 org.apache.lucene.store.NRTCachingDirectory.listAll(NRTCachingDirectory.java:132)
 at 
 org.apache.solr.core.DirectoryFactory.sizeOfDirectory(DirectoryFactory.java:146)
 at 
 org.apache.solr.handler.ReplicationHandler.getIndexSize(ReplicationHandler.java:472)
 at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:568)
 at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:213)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:144)
 at 
 org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:240)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1830)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:476)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
 at 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
 at 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
 at 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
 at 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
 at 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
 at 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
 at 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:889)
 at 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:744)
 at 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2274)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4016) Deduplication is broken by partial update

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4016:
--

  Description: 
The SignatureUpdateProcessorFactory used (primarily?) for deduplication does 
not consider partial update semantics.

The below uses the following solrconfig.xml excerpt:

{noformat}
<updateRequestProcessorChain name="text_hash">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">text_hash</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">text</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{noformat}

Firstly, the processor treats {noformat}{"set": value}{noformat} as a string 
and hashes it, instead of the value alone:

{noformat}
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
'{"add":{"doc":{"id": "abcde", "text": {"set": "hello world"}}}}' && curl 
'$URL/select?q=id:abcde'
{"responseHeader":{"status":0,"QTime":30}}
<?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int 
name="status">0</int><int name="QTime">1</int><lst name="params"><str 
name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
start="0"><doc><str name="id">abcde</str><str name="text">hello 
world</str><str name="text_hash">ad48c7ad60ac22cc</str><long 
name="_version_">1417247434224959488</long></doc></result>
</response>
$
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
'{"add":{"doc":{"id": "abcde", "text": "hello world"}}}' && curl 
'$URL/select?q=id:abcde'
{"responseHeader":{"status":0,"QTime":27}}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int 
name="QTime">1</int><lst name="params"><str 
name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
start="0"><doc><str name="id">abcde</str><str name="text">hello 
world</str><str name="text_hash">b169c743d220da8d</str><long 
name="_version_">141724802221564</long></doc></result>
</response>
{noformat}

Note the different text_hash value.

Secondly, when updating a field other than those used to create the signature 
(which I imagine is a more common use-case), the signature is recalculated from 
no values:

{noformat}
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
'{"add":{"doc":{"id": "abcde", "title": {"set": "new title"}}}}' && curl 
'$URL/select?q=id:abcde'
{"responseHeader":{"status":0,"QTime":39}}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int 
name="QTime">1</int><lst name="params"><str 
name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
start="0"><doc><str name="id">abcde</str><str name="text">hello 
world</str><str name="text_hash"/><str name="title">new title</str><long 
name="_version_">1417248120480202752</long></doc></result>
</response>
{noformat}

  was:

The SignatureUpdateProcessorFactory used (primarily?) for deduplication does 
not consider partial update semantics.

The below uses the following solrconfig.xml excerpt:

{noformat}
<updateRequestProcessorChain name="text_hash">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">text_hash</str>
    <bool name="overwriteDupes">false</bool>
    <str name="fields">text</str>
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
{noformat}

Firstly, the processor treats {noformat}{"set": value}{noformat} as a string 
and hashes it, instead of the value alone:

{noformat}
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
'{"add":{"doc":{"id": "abcde", "text": {"set": "hello world"}}}}' && curl 
'$URL/select?q=id:abcde'
{"responseHeader":{"status":0,"QTime":30}}
<?xml version="1.0" encoding="UTF-8"?><response><lst name="responseHeader"><int 
name="status">0</int><int name="QTime">1</int><lst name="params"><str 
name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
start="0"><doc><str name="id">abcde</str><str name="text">hello 
world</str><str name="text_hash">ad48c7ad60ac22cc</str><long 
name="_version_">1417247434224959488</long></doc></result>
</response>
$
$ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
'{"add":{"doc":{"id": "abcde", "text": "hello world"}}}' && curl 
'$URL/select?q=id:abcde'
{"responseHeader":{"status":0,"QTime":27}}
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader"><int name="status">0</int><int 
name="QTime">1</int><lst name="params"><str 
name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
start="0"><doc><str name="id">abcde</str><str name="text">hello 
world</str><str name="text_hash">b169c743d220da8d</str><long 
name="_version_">141724802221564</long></doc></result>
</response>
{noformat}

Note the different text_hash value.

Secondly, when updating a field other than those used to create the signature 
(which I imagine 

[jira] [Updated] (SOLR-4016) Deduplication is broken by partial update

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4016:
--

Labels: 4.0.1_Candidate  (was: )

 Deduplication is broken by partial update
 -

 Key: SOLR-4016
 URL: https://issues.apache.org/jira/browse/SOLR-4016
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 4.0
 Environment: Tomcat6 / Catalina on Ubuntu 12.04 LTS
Reporter: Joel Nothman
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0


 The SignatureUpdateProcessorFactory used (primarily?) for deduplication does 
 not consider partial update semantics.
 The below uses the following solrconfig.xml excerpt:
 {noformat}
 <updateRequestProcessorChain name="text_hash">
   <processor class="solr.processor.SignatureUpdateProcessorFactory">
     <bool name="enabled">true</bool>
     <str name="signatureField">text_hash</str>
     <bool name="overwriteDupes">false</bool>
     <str name="fields">text</str>
     <str name="signatureClass">solr.processor.TextProfileSignature</str>
   </processor>
   <processor class="solr.LogUpdateProcessorFactory" />
   <processor class="solr.RunUpdateProcessorFactory" />
 </updateRequestProcessorChain>
 {noformat}
 Firstly, the processor treats {noformat}{"set": value}{noformat} as a 
 string and hashes it, instead of the value alone:
 {noformat}
 $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
 '{"add":{"doc":{"id": "abcde", "text": {"set": "hello world"}}}}' && curl 
 '$URL/select?q=id:abcde'
 {"responseHeader":{"status":0,"QTime":30}}
 <?xml version="1.0" encoding="UTF-8"?><response><lst 
 name="responseHeader"><int name="status">0</int><int name="QTime">1</int><lst 
 name="params"><str name="q">id:abcde</str></lst></lst><result name="response" 
 numFound="1" start="0"><doc><str name="id">abcde</str><str name="text">hello 
 world</str><str name="text_hash">ad48c7ad60ac22cc</str><long 
 name="_version_">1417247434224959488</long></doc></result>
 </response>
 $
 $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
 '{"add":{"doc":{"id": "abcde", "text": "hello world"}}}' && curl 
 '$URL/select?q=id:abcde'
 {"responseHeader":{"status":0,"QTime":27}}
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int 
 name="QTime">1</int><lst name="params"><str 
 name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
 start="0"><doc><str name="id">abcde</str><str name="text">hello 
 world</str><str name="text_hash">b169c743d220da8d</str><long 
 name="_version_">141724802221564</long></doc></result>
 </response>
 {noformat}
 Note the different text_hash value.
 Secondly, when updating a field other than those used to create the signature 
 (which I imagine is a more common use-case), the signature is recalculated 
 from no values:
 {noformat}
 $ curl '$URL/update?commit=true' -H 'Content-type:application/json' -d 
 '{"add":{"doc":{"id": "abcde", "title": {"set": "new title"}}}}' && curl 
 '$URL/select?q=id:abcde'
 {"responseHeader":{"status":0,"QTime":39}}
 <?xml version="1.0" encoding="UTF-8"?>
 <response>
 <lst name="responseHeader"><int name="status">0</int><int 
 name="QTime">1</int><lst name="params"><str 
 name="q">id:abcde</str></lst></lst><result name="response" numFound="1" 
 start="0"><doc><str name="id">abcde</str><str name="text">hello 
 world</str><str name="text_hash"/><str name="title">new 
 title</str><long name="_version_">1417248120480202752</long></doc></result>
 </response>
 {noformat}
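
A minimal sketch of how the first bug could be addressed (the helper name is 
hypothetical; the real change would live inside SignatureUpdateProcessor): 
unwrap the atomic-update Map before hashing, so {"set": value} contributes the 
value rather than the Map's toString():

{code}
import java.util.Map;

// Hypothetical helper (not the actual Solr fix) illustrating the idea:
// if the field value is an atomic-update wrapper such as {"set": value},
// hash the real value instead of "{set=hello world}".
public final class AtomicUpdateUnwrap {
  private AtomicUpdateUnwrap() {}

  public static Object unwrap(Object fieldValue) {
    if (fieldValue instanceof Map) {
      Object set = ((Map<?, ?>) fieldValue).get("set");
      if (set != null) {
        return set;  // hash the real value, not the wrapper Map
      }
    }
    return fieldValue;
  }
}
{code}

The second bug would additionally require the processor to read the stored 
document's existing field values when the signature fields are untouched by 
the partial update.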

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4110) Configurable Content-Type headers for PHPResponseWriters and PHPSerializedResponseWriter

2012-11-28 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505695#comment-13505695
 ] 

Mark Miller commented on SOLR-4110:
---

What about back compat? How might this affect those upgrading from 4.0?

 Configurable Content-Type headers for PHPResponseWriters and 
 PHPSerializedResponseWriter
 

 Key: SOLR-4110
 URL: https://issues.apache.org/jira/browse/SOLR-4110
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 4.0
Reporter: Dominik Siebel
Priority: Minor
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: SOLR-4110.patch


 The *PHPResponseWriter* and *PHPSerializedResponseWriter* currently send a 
 hard-coded Content-Type header of _text/plain; charset=UTF-8_, although there 
 are constants defining _text/x-php;charset=UTF-8_ and 
 _text/x-php-serialized;charset=UTF-8_ which remain unused. This makes 
 content-type guessing on the client side quite complicated.
 I already created a patch (from the branch_4x github branch) to use the 
 respective constants and also added the possibility to configure the 
 Content-Type header via solrconfig.xml (like in JSONResponseWriter).
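
Along the lines of the JSONResponseWriter configuration the patch mimics, the 
configurable header might look like this in solrconfig.xml (a sketch of the 
proposed option, not shipped configuration):

{code}
<!-- Sketch: override each writer's default Content-Type, as proposed -->
<queryResponseWriter name="php" class="solr.PHPResponseWriter">
  <str name="content-type">text/x-php;charset=UTF-8</str>
</queryResponseWriter>
<queryResponseWriter name="phps" class="solr.PHPSerializedResponseWriter">
  <str name="content-type">text/x-php-serialized;charset=UTF-8</str>
</queryResponseWriter>
{code}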

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4110) Configurable Content-Type headers for PHPResponseWriters and PHPSerializedResponseWriter

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-4110:
-

Assignee: Mark Miller

 Configurable Content-Type headers for PHPResponseWriters and 
 PHPSerializedResponseWriter
 

 Key: SOLR-4110
 URL: https://issues.apache.org/jira/browse/SOLR-4110
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 4.0
Reporter: Dominik Siebel
Assignee: Mark Miller
Priority: Minor
  Labels: 4.0.1_Candidate
 Fix For: 4.1, 5.0

 Attachments: SOLR-4110.patch


 The *PHPResponseWriter* and *PHPSerializedResponseWriter* currently send a 
 hard-coded Content-Type header of _text/plain; charset=UTF-8_, although there 
 are constants defining _text/x-php;charset=UTF-8_ and 
 _text/x-php-serialized;charset=UTF-8_ which remain unused. This makes 
 content-type guessing on the client side quite complicated.
 I already created a patch (from the branch_4x github branch) to use the 
 respective constants and also added the possibility to configure the 
 Content-Type header via solrconfig.xml (like in JSONResponseWriter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4087:
--

 Priority: Minor  (was: Major)
Fix Version/s: (was: 4.0.1)
   5.0

 MoreLikeThis missing MAX_DOC_FREQ option
 

 Key: SOLR-4087
 URL: https://issues.apache.org/jira/browse/SOLR-4087
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Andrew Janowczyk
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: MorelikeThis-maxdocfreq.patch


 The MoreLikeThisHandler supports almost all of the underlying MoreLikeThis 
 options except MAX_DOC_FREQ, which is important for preventing terms that 
 appear in too many documents from being selected for the query.
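
For reference, the underlying Lucene option the handler should expose works 
like this (a usage sketch; the field name and threshold are illustrative):

{code}
import java.io.IOException;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queries.mlt.MoreLikeThis;
import org.apache.lucene.search.Query;

// Sketch: terms occurring in more than maxDocFreq documents are ignored
// when building the "more like this" query.
public class MltMaxDocFreqExample {
  public static Query buildMltQuery(IndexReader reader, int docNum)
      throws IOException {
    MoreLikeThis mlt = new MoreLikeThis(reader);
    mlt.setFieldNames(new String[] {"text"}); // field name illustrative
    mlt.setMaxDocFreq(1000);                  // skip terms in >1000 docs
    return mlt.like(docNum);
  }
}
{code}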

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4569) Allow customization of column stride field and norms via indexing chain

2012-11-28 Thread John Wang (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505718#comment-13505718
 ] 

John Wang commented on LUCENE-4569:
---

Thanks Simon! No rush, I just wanted some feedback. 

Do you think we should do the same for stored fields?

Chris: we are building a custom IndexingChain, which sits at a higher level 
than Codecs. You are definitely right, and currently I am able to get what I 
need via a codec: our custom indexing chain handles indexed documents, and we 
then register a codec to intercept the code path for norms and CSF. But this 
ends up with two customization hooks for the same indexer.

Thanks!

-John

 Allow customization of column stride field and norms via indexing chain
 ---

 Key: LUCENE-4569
 URL: https://issues.apache.org/jira/browse/LUCENE-4569
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Affects Versions: 4.0
Reporter: John Wang
Assignee: Simon Willnauer
 Attachments: patch.diff


 We are building an in-memory indexing format and managing our own segments. 
 We are doing this by implementing a custom IndexingChain. We would like to 
 support column-stride-fields and norms without having to wire in a codec 
 (since we are managing our postings differently)
 Suggested change is consistent with the api support for passing in a custom 
 InvertedDocConsumer.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option

2012-11-28 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505733#comment-13505733
 ] 

Commit Tag Bot commented on SOLR-4087:
--

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revisionrevision=1414846

SOLR-4087: Add MAX_DOC_FREQ option to MoreLikeThis.



 MoreLikeThis missing MAX_DOC_FREQ option
 

 Key: SOLR-4087
 URL: https://issues.apache.org/jira/browse/SOLR-4087
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Andrew Janowczyk
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: MorelikeThis-maxdocfreq.patch


 The MoreLikeThisHandler supports almost all of the underlying MoreLikeThis 
 options except MAX_DOC_FREQ, which is important for preventing terms that 
 appear in too many documents from being selected for the query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option

2012-11-28 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13505734#comment-13505734
 ] 

Commit Tag Bot commented on SOLR-4087:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revisionrevision=1414841

SOLR-4087: Add MAX_DOC_FREQ option to MoreLikeThis.



 MoreLikeThis missing MAX_DOC_FREQ option
 

 Key: SOLR-4087
 URL: https://issues.apache.org/jira/browse/SOLR-4087
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Andrew Janowczyk
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: MorelikeThis-maxdocfreq.patch


 The MoreLikeThisHandler supports almost all of the underlying MoreLikeThis 
 options except MAX_DOC_FREQ, which is important for preventing terms that 
 appear in too many documents from being selected for the query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4087) MoreLikeThis missing MAX_DOC_FREQ option

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4087.
---

Resolution: Fixed

Thanks Andrew!

 MoreLikeThis missing MAX_DOC_FREQ option
 

 Key: SOLR-4087
 URL: https://issues.apache.org/jira/browse/SOLR-4087
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Andrew Janowczyk
Priority: Minor
 Fix For: 4.1, 5.0

 Attachments: MorelikeThis-maxdocfreq.patch


 The MoreLikeThisHandler supports almost all of the underlying MoreLikeThis 
 options except MAX_DOC_FREQ, which is important for preventing terms that 
 appear in too many documents from being selected for the query.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2908) To push the terms.limit parameter from the master core to all the shard cores.

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2908:
--

Fix Version/s: (was: 1.4.1)
   5.0
   4.1

 To push the terms.limit parameter from the master core to all the shard cores.
 --

 Key: SOLR-2908
 URL: https://issues.apache.org/jira/browse/SOLR-2908
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4.1
 Environment: Linux server. 64 bit processor and 16GB Ram.
Reporter: sivaganesh
Priority: Critical
  Labels: patch
 Fix For: 4.1, 5.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 When we pass the terms.limit parameter to the master (which has many shard 
 cores), it is not pushed down to the individual cores. Instead, the default 
 value of -1 is assigned to the terms.limit parameter in the underlying shard 
 cores. The issue is that the time taken by the master core to return the 
 required limit of terms grows with the number of underlying shard cores. 
 This affects the performance of the auto-suggest feature. 
 We think there should be a parameter to explicitly override the -1 being set 
 for terms.limit in the shard cores.
 We examined the source code (TermsComponent.java) and confirmed this 
 behavior. Please help us in pushing the terms.limit parameter down to the 
 shard cores. 
 Please find the code snippet below:
 {noformat}
 private ShardRequest createShardQuery(SolrParams params) {
   ShardRequest sreq = new ShardRequest();
   sreq.purpose = ShardRequest.PURPOSE_GET_TERMS;

   // base shard request on original parameters
   sreq.params = new ModifiableSolrParams(params);

   // remove any limits for shards; we want them to return all possible
   // responses so we can calculate the correct counts
   // (don't sort by count to avoid that unnecessary overhead on the shards)
   sreq.params.remove(TermsParams.TERMS_MAXCOUNT);
   sreq.params.remove(TermsParams.TERMS_MINCOUNT);
   sreq.params.set(TermsParams.TERMS_LIMIT, -1);
   sreq.params.set(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_INDEX);

   return sreq;
 }
 {noformat}
 Solr Version:
 Solr Specification Version: 1.4.0.2010.01.13.08.09.44 
  Solr Implementation Version: 1.5-dev exported - yonik - 2010-01-13 08:09:44 
  Lucene Specification Version: 2.9.1-dev 
  Lucene Implementation Version: 2.9.1-dev 888785 - 2009-12-09 18:03:31 
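
A sketch of the requested change to the snippet above (the terms.shard.limit 
flag is invented here purely for illustration; it is not an existing 
parameter):

{code}
// Hypothetical variant of TermsComponent.createShardQuery() sketching the
// request: let callers opt in to pushing their terms.limit down to shards.
private ShardRequest createShardQuery(SolrParams params) {
  ShardRequest sreq = new ShardRequest();
  sreq.purpose = ShardRequest.PURPOSE_GET_TERMS;
  sreq.params = new ModifiableSolrParams(params);
  sreq.params.remove(TermsParams.TERMS_MAXCOUNT);
  sreq.params.remove(TermsParams.TERMS_MINCOUNT);
  if (params.getBool("terms.shard.limit", false)) {
    // push the caller's limit down to the shards; merged counts may be
    // approximate, but each shard returns far fewer terms
    sreq.params.set(TermsParams.TERMS_LIMIT,
        params.getInt(TermsParams.TERMS_LIMIT, -1));
  } else {
    // old behavior: unlimited terms from each shard for exact counts
    sreq.params.set(TermsParams.TERMS_LIMIT, -1);
  }
  sreq.params.set(TermsParams.TERMS_SORT, TermsParams.TERMS_SORT_INDEX);
  return sreq;
}
{code}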

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4112) Dataimporting with SolrCloud Fails

2012-11-28 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4112:
--

Fix Version/s: 5.0
   4.1

 Dataimporting with SolrCloud Fails
 --

 Key: SOLR-4112
 URL: https://issues.apache.org/jira/browse/SOLR-4112
 Project: Solr
  Issue Type: Bug
Affects Versions: 5.0
Reporter: Deniz Durmus
 Fix For: 4.1, 5.0

 Attachments: SOLR-4112.patch


 While trying to import data from a database on SolrCloud, the logs show:
 SEVERE: Full Import 
 failed:org.apache.solr.handler.dataimport.DataImportHandlerException: Unable 
 to PropertyWriter implementation:ZKPropertiesWriter 
 at 
 org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:336)
  
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:418)
  
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) 
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468) 
 Caused by: org.apache.solr.common.cloud.ZooKeeperException: 
 ZkSolrResourceLoader does not support getConfigDir() - likely, what you are 
 trying to do is not supported in ZooKeeper mode 
 at 
 org.apache.solr.cloud.ZkSolrResourceLoader.getConfigDir(ZkSolrResourceLoader.java:100)
  
 at 
 org.apache.solr.handler.dataimport.SimplePropertiesWriter.init(SimplePropertiesWriter.java:91)
  
 at 
 org.apache.solr.handler.dataimport.ZKPropertiesWriter.init(ZKPropertiesWriter.java:45)
  
 at 
 org.apache.solr.handler.dataimport.DataImporter.createPropertyWriter(DataImporter.java:334)
  
 ... 3 more 
 Exception in thread Thread-306 java.lang.NullPointerException 
 at 
 org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:427)
  
 at 
 org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:487) 
 at 
 org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:468) 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


