[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504698#comment-13504698
 ] 

Per Steffensen commented on SOLR-4114:
--------------------------------------

Well, I see no reason to introduce (in the first step at least) a 
maxShardsPerNode. Your requested numShards and the number of live servers at 
that point in time will decide the number of shards on each node/server - 
basically no limit, but the clever user of Solr will probably not want to 
request 1000 shards for a collection if he only has 2 Solr servers. It should 
really be up to the user of Solr. We have been using SolrCloud for a long time 
now, and we have a very high focus on performance, because we need to end up 
with a Solr cluster supporting "live searches" among 50-100 billion records. 
During numerous performance tests we have, among other things, played with the 
number of shards per Solr server per collection. We have run one-month+ tests 
just pumping data into the cluster to see how loading time, search time etc. 
develop as collections are filled with data. We have run such tests with 1, 4, 
8 and 12 shards per Solr server per collection, and each of them has both good 
and bad properties wrt performance. So until we know (and we should be very 
careful making "good decisions on behalf of every Solr user") that there is an 
always-true best number for maxShardsPerNode, we should be careful putting any 
limit on the user.
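To make the point concrete, here is a minimal sketch (purely illustrative, not Solr's actual assignment code) of the idea described above: numShards shards dealt out round-robin across whatever nodes are live at create time, with no per-node limit. The node names are made up:

```python
# Illustrative sketch only - NOT Solr's actual shard-assignment code.
# Deals num_shards shards round-robin across the live nodes, with no
# per-node cap: 8 shards over 4 nodes gives 2 shards per node;
# 1000 shards over 2 nodes gives 500 per node.

def assign_shards(num_shards, live_nodes):
    assignment = {node: [] for node in live_nodes}
    for i in range(num_shards):
        node = live_nodes[i % len(live_nodes)]  # round-robin pick
        assignment[node].append("shard%d" % (i + 1))
    return assignment

# Hypothetical 4-node cluster, creating a collection with numShards=8:
nodes = ["solr1:8983", "solr2:8983", "solr3:8983", "solr4:8983"]
print(assign_shards(8, nodes))
```

Run with fewer live nodes than shards, each node simply picks up proportionally more shards - which is exactly why the decision can be left to the user rather than capped.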

I know you just want to allow giving a maxShardsPerNode in the create request, 
but the user of Solr really should be able to calculate the number of shards 
going on each Solr server himself, when he controls numShards and knows how 
many Solr servers he is running. The only potential problem is if his create 
request is run when not all Solr servers are running, and in such a case a 
maxShardsPerNode could help stop the creation process.
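For reference, a sketch of what such a create request looks like against the Solr 4.x Collections API (the host, port and collection name here are made up; only `action=CREATE`, `name` and `numShards` are the parameters under discussion):

```shell
# Hypothetical host/port and collection name - adjust for your own cluster.
SOLR_HOST="localhost:8983"
NAME="mycollection"
NUM_SHARDS=8

URL="http://${SOLR_HOST}/solr/admin/collections?action=CREATE&name=${NAME}&numShards=${NUM_SHARDS}"
echo "$URL"

# Issue the request against a running SolrCloud cluster:
# curl "$URL"
```

With numShards=8 on a 4-node cluster, the user can work out for himself that each node ends up with 2 shards - no maxShardsPerNode needed for that arithmetic.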

But a Solr user probably wants to make sure that all the Solr servers that are 
supposed to run are actually running before he issues a collection creation 
request, so that he gets shards distributed across all the Solr servers he 
intends to run. We do that in our project BTW, but outside the Solr code.
                
> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-4114
>                 URL: https://issues.apache.org/jira/browse/SOLR-4114
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore, SolrCloud
>    Affects Versions: 4.0
>         Environment: Solr 4.0.0 release
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster 
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard between the Solr server 
> that used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
