[
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505368#comment-13505368
]
Per Steffensen edited comment on SOLR-4114 at 11/28/12 11:20 AM:
-----------------------------------------------------------------
bq. As far as terminology, when I say replicationFactor of 3, I mean 3 copies
of the data. I also count the leader as a replica of a shard (which is
logical). It follows from the clusterstate.json, which lists all "replicas" for
a shard and one of them just has a flag indicating it's the leader. This also
makes it easier to talk about a shard having 0 replicas (meaning there is not
even a leader).
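For reference, a clusterstate.json fragment of the kind the quote refers to might look roughly like this (host names, ports, core names and field values are illustrative, not taken from a real cluster):

```json
{"collection1": {
  "shards": {
    "shard1": {
      "replicas": {
        "core_node1": {
          "base_url": "http://solr1.example.com:8983/solr",
          "node_name": "solr1.example.com:8983_solr",
          "state": "active",
          "leader": "true"
        },
        "core_node2": {
          "base_url": "http://solr2.example.com:8983/solr",
          "node_name": "solr2.example.com:8983_solr",
          "state": "active"
        }
      }
    }
  }
}}
```

Note that core_node2 is listed under "replicas" just like the leader; only the presence of the "leader" flag on core_node1 distinguishes the two - which is the point being made above.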
I understand that you can view all shards under a slice as "replicas", but in
my mind "replica" is also a "role" that a shard plays at runtime - all shards
except one under a slice play the "replica role" at runtime, and the remaining
shard plays the "leader role" at runtime. To avoid too much confusion, I
suggest you use the term "shards" for all the instances under a slice, and that
you use the terms "replica" and "leader" only for a role that a shard plays at
runtime.
But that would of course require changes, e.g. to the Slice class, where
getReplicas, getReplicasCopy and getReplicasMap need to be renamed to
getShardsXXX. It probably shouldn't be done now, but as part of a cross-code
cleanup of term usage. Today there is a heavy mixup of terms in the code -
"replica" and "shard" are sometimes used for a node, "replica" and "shard" are
used for the same thing, etc.
Suggested terms:
* collection: A big logical bucket to fill data into
* slice: A logical part of a collection. A part of the data going into a
collection goes into a particular slice. Slices for a particular collection are
non-overlapping
* shard: A physical instance of a slice. Running without replicas there is one
shard per slice. Running with replication-factor X there are X+1 shards per
slice.
* replica and leader: Roles played by shards at runtime. When the system is
not running there are no replicas/leaders - there are just shards
* node-base-url: The prefix/base (up to and including the webapp-context) of
the URL for a specific Solr server
* node-name: A logical name for the Solr server - the same as node-base-url
except /'s are replaced by _'s and the protocol part (http(s)://) is removed
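The node-base-url to node-name mapping described above can be sketched as follows. This is an illustrative sketch of the stated rule (strip the protocol, replace /'s with _'s), not Solr's actual implementation; the class and method names are made up for the example:

```java
// Sketch of the suggested node-name convention:
// node-name = node-base-url with the protocol removed and /'s turned into _'s.
public class NodeNames {
    public static String toNodeName(String baseUrl) {
        // Drop the protocol part, e.g. "http://" or "https://"
        String withoutProtocol = baseUrl.replaceFirst("^https?://", "");
        // Replace path separators with underscores
        return withoutProtocol.replace('/', '_');
    }

    public static void main(String[] args) {
        // "http://solr1.example.com:8983/solr" -> "solr1.example.com:8983_solr"
        System.out.println(toNodeName("http://solr1.example.com:8983/solr"));
    }
}
```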
> Collection API: Allow multiple shards from one collection on the same Solr
> server
> ---------------------------------------------------------------------------------
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
> Issue Type: New Feature
> Components: multicore, SolrCloud
> Affects Versions: 4.0
> Environment: Solr 4.0.0 release
> Reporter: Per Steffensen
> Assignee: Per Steffensen
> Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is
> also a good idea for easy elasticity later on - it is much easier to move an
> entire existing shard from one Solr server to another one that just joined
> the cluster than it is to split an existing shard among the Solr servers
> that used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the
> same Solr server"
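The 8-shards-on-4-servers layout the description mentions implies a simple round-robin style placement. The sketch below is only an illustration of that arithmetic (8 shards / 4 servers = 2 shards each); the class, method and node names are hypothetical and not taken from the patch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative round-robin placement of N shards over M Solr servers,
// allowing more than one shard of the same collection per server.
public class ShardPlacement {
    public static Map<String, List<String>> assignShards(int numShards, List<String> nodes) {
        Map<String, List<String>> placement = new LinkedHashMap<>();
        for (String node : nodes) {
            placement.put(node, new ArrayList<>());
        }
        for (int i = 0; i < numShards; i++) {
            // Cycle through the nodes so shards spread evenly
            String node = nodes.get(i % nodes.size());
            placement.get(node).add("shard" + (i + 1));
        }
        return placement;
    }

    public static void main(String[] args) {
        List<String> nodes = Arrays.asList("node1", "node2", "node3", "node4");
        // 8 shards over 4 nodes -> 2 shards per node
        System.out.println(assignShards(8, nodes));
    }
}
```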
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]