[ https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368 ]

Per Steffensen edited comment on SOLR-4114 at 11/28/12 11:20 AM:
-----------------------------------------------------------------

bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies 
of the data. I also count the leader as a replica of a shard (which is 
logical). It follows from the clusterstate.json, which lists all "replicas" for 
a shard and one of them just has a flag indicating it's the leader. This also 
makes it easier to talk about a shard having 0 replicas (meaning there is not 
even a leader).
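
Just to make that concrete, an entry in clusterstate.json of the kind described 
looks roughly like this (host, core and collection names are made up, and only 
the fields relevant to the point are shown):

{code}
{"collection1":{
    "shard1":{
      "replicas":{
        "host1:8983_solr_collection1":{
          "state":"active",
          "base_url":"http://host1:8983/solr",
          "node_name":"host1:8983_solr",
          "leader":"true"},
        "host2:8983_solr_collection1":{
          "state":"active",
          "base_url":"http://host2:8983/solr",
          "node_name":"host2:8983_solr"}}}}}
{code}

One of the entries under "replicas" carries the "leader" flag; the others are 
plain replicas.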

I understand that you can view all shards under a slice as a "replica", but in 
my mind "replica" is also a "role" that a shard plays at runtime - all shards 
except one under a slice play the "replica role" at runtime, while the 
remaining shard plays the "leader role". To avoid creating too much confusion, 
I suggest you use the term "shards" for all the instances under a slice, and 
that you use the terms "replica" and "leader" only for a role that a shard 
plays at runtime.
But that would of course require changes, e.g. to the Slice class, where 
getReplicas, getReplicasCopy and getReplicasMap would need to be renamed to 
getShardsXXX. It probably shouldn't be done now, but as part of a cross-code 
clean-up of term usage. Today there is a heavy mix-up of terms in the code - 
"replica" and "shard" are sometimes used for a node, "replica" and "shard" are 
used for the same thing, etc.
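
To illustrate what I mean by the rename, here is a rough sketch of how the 
accessors on Slice could look after such a clean-up (the Shard class and all 
names are made up for illustration - this is not the current SolrCloud code):

{code}
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Sketch only: "Shard" stands in for what the code currently calls a replica
class Shard {
  private final String name;
  private final boolean leader;
  Shard(String name, boolean leader) { this.name = name; this.leader = leader; }
  String getName() { return name; }
  boolean isLeader() { return leader; }
}

class Slice {
  private final Map<String, Shard> shards; // keyed by shard name

  Slice(Map<String, Shard> shards) {
    this.shards = new HashMap<String, Shard>(shards);
  }

  // formerly getReplicas(): all physical instances of this slice
  Collection<Shard> getShards() { return shards.values(); }

  // formerly getReplicasMap()
  Map<String, Shard> getShardsMap() { return Collections.unmodifiableMap(shards); }

  // formerly getReplicasCopy()
  Map<String, Shard> getShardsCopy() { return new HashMap<String, Shard>(shards); }

  // "leader" stays a runtime role, not a structural term
  Shard getLeader() {
    for (Shard shard : shards.values()) {
      if (shard.isLeader()) return shard;
    }
    return null;
  }
}
{code}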

Suggested terms:
 * collection: A big logical bucket to fill data into
 * slice: A logical part of a collection. A part of the data going into a 
collection goes into a particular slice. Slices for a particular collection are 
non-overlapping
 * shard: A physical instance of a slice. Running without replicas there is 
one shard per slice. Running with replication-factor X there are X+1 shards 
per slice.
 * replica and leader: Roles played by shards at runtime. As soon as the system 
is not running there are no replica/leader - there are just shards
 * node-base-url: The prefix/base (up to and including the webapp-context) of 
the URL for a specific Solr server
 * node-name: A logical name for the Solr server - the same as node-base-url 
except /'s are replaced by _'s and the protocol part (http(s)://) is removed 
(see the small sketch after this list)
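
The node-base-url/node-name mapping is simple enough to show as a tiny sketch 
(class and method names are mine, purely for illustration):

{code}
// Illustration of the node-base-url -> node-name mapping described above.
// Class and method names are made up; this is not existing Solr code.
public class NodeNames {

  /** e.g. "http://solrserver1:8983/solr" -> "solrserver1:8983_solr" */
  public static String nodeNameFromBaseUrl(String nodeBaseUrl) {
    // remove the protocol part (http:// or https://)
    String withoutProtocol = nodeBaseUrl.replaceFirst("^https?://", "");
    // replace /'s with _'s
    return withoutProtocol.replace('/', '_');
  }

  public static void main(String[] args) {
    System.out.println(nodeNameFromBaseUrl("http://solrserver1:8983/solr"));
    // prints: solrserver1:8983_solr
  }
}
{code}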

                
> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> ---------------------------------------------------------------------------------
>
>                 Key: SOLR-4114
>                 URL: https://issues.apache.org/jira/browse/SOLR-4114
>             Project: Solr
>          Issue Type: New Feature
>          Components: multicore, SolrCloud
>    Affects Versions: 4.0
>         Environment: Solr 4.0.0 release
>            Reporter: Per Steffensen
>            Assignee: Per Steffensen
>              Labels: collection-api, multicore, shard, shard-allocation
>         Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster 
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr servers 
> that used to run it and the new Solr server.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"
