[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700329#comment-13700329
 ] 

Otis Gospodnetic edited comment on SOLR-4998 at 7/4/13 8:38 PM:
----------------------------------------------------------------

I am not sure what naming "conventionS" Solr code is using.  I know most people 
are inconsistent and so code (in general, not referring specifically to Solr 
here) is also often inconsistent.  Here we see this inconsistency leads to a 
lot of confusion.  I think it's great Anshum initiated this. My personal 
preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming 
classes and variables, because it's only going to get harder to do that if it's 
not done now.

OK, here is another attempt:

# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's 
"replicationFactor"
# These physical instances are called Replicas
# The number of Replicas in a Collection equals "numShards * replicationFactor" 
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if the previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as 
the Leader
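
The list above can be sketched as a small data model. This is only an illustration, not Solr code: the class names, the `is_leader` flag, and `elect_new_leader` are hypothetical, and "promote the first remaining Replica" is a stand-in for the real election.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    # One physical instance of a Shard; it contains one Core,
    # i.e. a single physical Lucene index.
    name: str
    is_leader: bool = False


@dataclass
class Shard:
    # A logical subset of the Collection's index.
    name: str
    replicas: List[Replica] = field(default_factory=list)

    def leader(self) -> Replica:
        leaders = [r for r in self.replicas if r.is_leader]
        assert len(leaders) == 1, "each Shard has exactly 1 Leader"
        return leaders[0]

    def elect_new_leader(self) -> Replica:
        # Assumption: the old Leader goes away and the first remaining
        # Replica is promoted. The real election works differently.
        self.replicas.remove(self.leader())
        if not self.replicas:
            raise RuntimeError("no Replica left to elect")
        self.replicas[0].is_leader = True
        return self.replicas[0]


@dataclass
class Collection:
    # A logical index split into num_shards Shards.
    name: str
    num_shards: int
    replication_factor: int
    shards: List[Shard] = field(default_factory=list)

    def __post_init__(self) -> None:
        for s in range(1, self.num_shards + 1):
            replicas = [
                Replica(f"replica {s}.{r}", is_leader=(r == 1))
                for r in range(1, self.replication_factor + 1)
            ]
            self.shards.append(Shard(f"shard {s}", replicas))

    def replica_count(self) -> int:
        return sum(len(s.replicas) for s in self.shards)


collection = Collection("example", num_shards=3, replication_factor=5)
print(collection.replica_count())  # 15 = numShards * replicationFactor
```

Note that Leader is modeled as a flag on a Replica rather than as a separate type, which matches the idea that any Replica can take over that role.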

I think this is it, no?

Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas per Shard (15 Replicas in total, i.e. "numShards * replicationFactor")
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical 
structure:

|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?
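
To make the point that both tables describe the same physical Replicas, here is a rough sketch that renders either view from the same two numbers. The `render` function and its `show_roles` parameter are made up for this illustration, and it assumes the first Replica of each Shard is currently the Leader.

```python
def render(num_shards: int, replication_factor: int, show_roles: bool) -> str:
    """Render the Replica grid as a Jira-style table, one column per Shard."""
    rows = []
    if show_roles:
        # Header row naming the logical Shards.
        header = "||".join(f"shard {s}" for s in range(1, num_shards + 1))
        rows.append(f"||{header}||")
    for r in range(1, replication_factor + 1):
        cells = []
        for s in range(1, num_shards + 1):
            # Assumption: Replica 1 of each Shard acts as the Leader.
            label = "leader" if (show_roles and r == 1) else "replica"
            cells.append(f"{label} {s}.{r}")
        rows.append("|" + "|".join(cells) + "|")
    return "\n".join(rows)


print(render(3, 5, show_roles=True))
print()
print(render(3, 5, show_roles=False))
```

The only difference between the two outputs is the label on the first row and the Shard header; the cells themselves are the same Replicas.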

So I agree, there is really no need for "Slice" here. I already forgot about 
that term.
Problems we'll have:
* People will refer to physical copies, those Replicas, as Shards.  When they 
say "Shard" they'll often refer to a specific Replica.  I know I always think 
of each cell in the above table as "Shard", but that's not how we (should) use 
that term. Shards are just logical. Those cells are Replicas.
* We use "Replica" to refer to a physical index, but also use it to describe a 
non-Leader role.  Confusing.  If there is a Leader, where are Followers?  Would 
introducing the term "Follower" help?  Then we could say/teach people the 
following:
** When you say "Shard" it just means the logical Collection subset. It's not 
physical at all.
** If you want to talk about physical indices in a Collection use the term 
"Replica". They are all Replicas.
** If you want to refer to a Replica by its role, then you've got to say either 
Leader or Follower.  Because if you say "Replica" we won't know whether you are 
referring to the special Replica that acts as a Leader or all the other ones.
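
A small sketch of the proposed naming, assuming "Follower" were adopted: every physical index is a Replica, and its role is either Leader or Follower. `Role` and `roles_in_shard` are hypothetical names used only for this illustration.

```python
from enum import Enum
from typing import Dict, List


class Role(Enum):
    LEADER = "Leader"
    FOLLOWER = "Follower"  # proposed term, not established terminology


def roles_in_shard(replica_names: List[str], leader_name: str) -> Dict[str, Role]:
    """Map every Replica of one Shard to its role."""
    if leader_name not in replica_names:
        raise ValueError(f"{leader_name} is not a Replica of this Shard")
    return {
        name: Role.LEADER if name == leader_name else Role.FOLLOWER
        for name in replica_names
    }


shard_1 = [f"replica 1.{i}" for i in range(1, 6)]
roles = roles_in_shard(shard_1, leader_name="replica 1.1")
for name, role in roles.items():
    print(f"{name}: {role.value}")
```

With this split, "Replica" never has to double as a role name.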

I think we'll need to correct this in any docs and will need to correct people 
on the ML until we get everyone in sync.  Any books or articles that have been 
written with different terminology will be wrong/out of date and will confuse 
people.

Yes/no?

                
> Make the use of Slice and Shard consistent across the code and document base
> ----------------------------------------------------------------------------
>
>                 Key: SOLR-4998
>                 URL: https://issues.apache.org/jira/browse/SOLR-4998
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.3.1
>            Reporter: Anshum Gupta
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We 
> should define each separately and use the apt term whenever we do so.
