[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-04 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700329#comment-13700329
 ] 

Otis Gospodnetic edited comment on SOLR-4998 at 7/4/13 8:27 PM:


I am not sure what naming "conventionS" Solr code is using.  I know most people 
are inconsistent and so code (in general, not referring specifically to Solr 
here) is also often inconsistent.  Here we see this inconsistency leads to a 
lot of confusion.  I think it's great Anshum initiated this. My personal 
preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming 
classes and variables, because it's only going to get harder to do that if it's 
not done now.

OK, here is another attempt:

# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's 
"replicationFactor"
# These physical instances are called Replicas
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as 
the Leader

I think this is it, no?

Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas per Shard
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical 
structure:

|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?


  was (Author: otis):
I am not sure what naming "conventionS" Solr code is using.  I know most 
people are inconsistent and so the code is also often inconsistent.  Here we 
see this inconsistency leads to a lot of confusion.  I think it's great Anshum 
initiated this. My personal preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming 
classes and variables, because it's only going to get harder to do that if it's 
not done now.

OK, here is another attempt:

# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's 
"replicationFactor"
# These physical instances are called Replicas
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as 
the Leader

I think this is it, no?

Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical 
structure:

|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?

  
> Make the use of Slice and Shard consistent across the code and document base
> 
>
> Key: SOLR-4998
> URL: https://issues.apache.org/jira/browse/SOLR-4998
> Project: Solr
> Issue Type: Improvement
> Components: SolrCloud
> Affects Versions: 4.3, 4.3.1
> Reporter: Anshum Gupta
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We 
> should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-04 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700329#comment-13700329
 ] 

Otis Gospodnetic edited comment on SOLR-4998 at 7/4/13 8:38 PM:


I am not sure what naming "conventionS" Solr code is using.  I know most people 
are inconsistent and so code (in general, not referring specifically to Solr 
here) is also often inconsistent.  Here we see this inconsistency leads to a 
lot of confusion.  I think it's great Anshum initiated this. My personal 
preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming 
classes and variables, because it's only going to get harder to do that if it's 
not done now.

OK, here is another attempt:

# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's 
"replicationFactor"
# These physical instances are called Replicas
# The number of Replicas in a Collection equals "numShards * replicationFactor" 
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as 
the Leader

I think this is it, no?

Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas per Shard (15 Replicas in total)
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical 
structure:

|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?
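
To make the proposed terminology concrete, here is a minimal Java sketch of it. The class and method names (TerminologySketch, Shard, Replica, buildCollection) are hypothetical and are not Solr classes. It assumes, as in the list above, that replicationFactor counts every physical copy of a Shard, the Leader included, and it arbitrarily marks the first Replica as Leader where SolrCloud would run an election.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the terminology proposed above; these are NOT Solr classes.
final class TerminologySketch {

    /** A physical copy of a Shard; in this proposal each Replica holds one Core (one Lucene index). */
    static final class Replica {
        final String name;
        final boolean leader;

        Replica(String name, boolean leader) {
            this.name = name;
            this.leader = leader;
        }
    }

    /** A logical subset of a Collection; it has no physical index of its own. */
    static final class Shard {
        final String name;
        final List<Replica> replicas = new ArrayList<>();

        Shard(String name) {
            this.name = name;
        }
    }

    /** Builds numShards Shards, each with replicationFactor Replicas, the first one acting as Leader. */
    static List<Shard> buildCollection(int numShards, int replicationFactor) {
        List<Shard> shards = new ArrayList<>();
        for (int s = 1; s <= numShards; s++) {
            Shard shard = new Shard("shard " + s);
            for (int r = 1; r <= replicationFactor; r++) {
                // Assumption: replica 1 is the Leader; a real cluster elects it.
                shard.replicas.add(new Replica("replica " + s + "." + r, r == 1));
            }
            shards.add(shard);
        }
        return shards;
    }

    /** Counts every physical Replica across all Shards of the Collection. */
    static int totalReplicas(List<Shard> shards) {
        int total = 0;
        for (Shard shard : shards) {
            total += shard.replicas.size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Same shape as the tables above: numShards=3, replicationFactor=5.
        List<Shard> shards = buildCollection(3, 5);
        System.out.println(shards.size() + " Shards, " + totalReplicas(shards) + " Replicas in total");
    }
}
```

For the tables above (numShards=3, replicationFactor=5) this gives 3 Shards and 3 * 5 = 15 Replicas, with exactly one Leader per Shard.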

So I agree, there is really no need for "Slice" here. I already forgot about 
that term.
Problems we'll have:
* People will refer to physical copies, those Replicas, as Shards.  When they 
say "Shard" they'll often refer to a specific Replica.  I know I always think 
of each cell in the above table as "Shard", but that's not how we (should) use 
that term. Shards are just logical. Those cells are Replicas.
* We use "Replica" to refer to a physical index, but also use it to describe a 
non-Leader role.  Confusing.  If there is a Leader, where are Followers?  Would 
introducing the term "Follower" help?  Then we could say/teach people the 
following:
** When you say "Shard" it just means the logical Collection subset. It's not 
physical at all.
** If you want to talk about physical indices in a Collection use the term 
"Replica". They are all Replicas.
** If you want to refer to a Replica by its role, then you've got to say either 
Leader or Follower.  Because if you say "Replica" we won't know whether you are 
referring to the special Replica that acts as a Leader or all the other ones.
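
A small Java sketch of that Leader/Follower idea, assuming "Follower" were adopted. The names RoleSketch, Role and namesWithRole are hypothetical and exist only for illustration; nothing here is an existing Solr API. Every physical copy is a Replica, and the role is a separate attribute of it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of "every copy is a Replica; its role is Leader or Follower".
final class RoleSketch {

    /** "Follower" is only the term floated above, not established Solr terminology. */
    enum Role { LEADER, FOLLOWER }

    static final class Replica {
        final String name;
        final Role role;

        Replica(String name, Role role) {
            this.name = name;
            this.role = role;
        }
    }

    /** One Shard's physical copies: a single Leader plus two Followers. */
    static List<Replica> exampleShard() {
        List<Replica> replicas = new ArrayList<>();
        replicas.add(new Replica("replica 1.1", Role.LEADER));
        replicas.add(new Replica("replica 1.2", Role.FOLLOWER));
        replicas.add(new Replica("replica 1.3", Role.FOLLOWER));
        return replicas;
    }

    /** Returns the names of the Replicas that currently hold the given role. */
    static List<String> namesWithRole(List<Replica> replicas, Role role) {
        List<String> names = new ArrayList<>();
        for (Replica replica : replicas) {
            if (replica.role == role) {
                names.add(replica.name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Replica> shard1 = exampleShard();
        System.out.println("All Replicas: " + shard1.size());
        System.out.println("Leader: " + namesWithRole(shard1, Role.LEADER));
        System.out.println("Followers: " + namesWithRole(shard1, Role.FOLLOWER));
    }
}
```

With the role kept separate, "Replica" always means a physical copy and never "the non-Leader ones".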

I think we'll need to correct this in any docs and will need to correct people 
on the ML until we get everyone in sync.  Any books or articles that have been 
written with different terminology will be wrong/out of date and will confuse 
people.

Yes/no?


  was (Author: otis):
I am not sure what naming "conventionS" Solr code is using.  I know most 
people are inconsistent and so code (in general, not referring specifically to 
Solr here) is also often inconsistent.  Here we see this inconsistency leads to 
a lot of confusion.  I think it's great Anshum initiated this. My personal 
preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming 
classes and variables, because it's only going to get harder to do that if it's 
not done now.

OK, here is another attempt:

# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's 
"replicationFactor"
# These physical instances are called Replicas
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as 
the Leader

I think this is it, no?

Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica

[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-04 Thread Otis Gospodnetic (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700443#comment-13700443
 ] 

Otis Gospodnetic edited comment on SOLR-4998 at 7/5/13 3:57 AM:


bq. http://www.elasticsearch.org/guide/reference/glossary/

Much simpler and cleaner, IMHO:
* primary shard
* replica shard

So in ES a primary shard is a real physical thing and a replica is a real 
physical thing as well.
I think that's easier than saying a shard is a logical concept and that inside 
a shard there are replicas, but some are called leaders and others are 
called... well, replicas.  So there is no logical vs. physical in ES (see my 
tables above), it's all just physical:

||primary shard 1||primary shard 2||primary shard 3||
|replica shard 1.1|replica shard 2.1|replica shard 3.1|
|replica shard 1.2|replica shard 2.2|replica shard 3.2|

That's what you'd get with number_of_shards=3, number_of_replicas=2

So the Collection (Index in ES terminology) creation API takes number_of_shards 
and number_of_replicas parameters, and they are used as shown above. That is 
less confusing than having to agree on what replicationFactor means: is it the 
total number of replicas in a shard, or the number of non-leader replicas?
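
The arithmetic behind that comparison, as a minimal Java sketch. The class and method names are hypothetical. The ES formula follows the table above (each primary shard plus its replica shards), and the two Solr methods simply spell out the two competing readings of replicationFactor mentioned here, without claiming which one SolrCloud implements.

```java
// Hypothetical helper that only counts physical indices; not an ES or Solr API.
final class CopyCountSketch {

    /** ES style: number_of_replicas counts only the extra copies of each primary shard. */
    static int esPhysicalShards(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    /** Reading 1 of replicationFactor: total copies per shard, leader included. */
    static int solrCoresIfFactorIncludesLeader(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    /** Reading 2 of replicationFactor: only the non-leader copies per shard. */
    static int solrCoresIfFactorExcludesLeader(int numShards, int replicationFactor) {
        return numShards * (1 + replicationFactor);
    }

    public static void main(String[] args) {
        // number_of_shards=3, number_of_replicas=2, as in the table above.
        System.out.println("ES physical shards: " + esPhysicalShards(3, 2));
        // The same replicationFactor=3 yields different totals under the two readings.
        System.out.println("Solr, factor includes leader: " + solrCoresIfFactorIncludesLeader(3, 3));
        System.out.println("Solr, factor excludes leader: " + solrCoresIfFactorExcludesLeader(3, 3));
    }
}
```

With number_of_shards=3 and number_of_replicas=2 the ES count is 3 * (1 + 2) = 9 physical shards, while replicationFactor=3 gives 9 or 12 cores depending on the reading.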

It may be too late to change this in SolrCloud now even if everyone agreed...

  was (Author: otis):
bq. http://www.elasticsearch.org/guide/reference/glossary/

Much simpler and cleaner, IMHO:
* primary shard
* replica shard

So in ES a primary shard is a real physical thing and a replica is a real 
physical thing as well.
I think that's easier than saying a shard is a logical concept and that inside 
a shard there are replicas, but some are called leaders and others are 
called... well, replicas.

This also allows the Collection (Index in ES terminology) creation API to take 
num_shards and num_replicas parameters, which is less confusing than agreeing 
what replicationFactor means - is it the total number of replicas in a shard or 
the number of non-leader replicas.

It may be too late to change this in SolrCloud now even if everyone agreed...

  




[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-04 Thread Anshum Gupta (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700450#comment-13700450
 ] 

Anshum Gupta edited comment on SOLR-4998 at 7/5/13 4:27 AM:


bq. So I agree, there is really no need for "Slice" here. I already forgot 
about that term.
It's just a term that we used for (what we're now calling) Shards. It's deep in 
the code. At the same time, since it has the least exposure to the outside 
world, it is our best candidate for a rename.

bq. People will refer to physical copies, those Replicas, as Shards. 
Exactly what Yonik said. There's confusion in the use of the term Shard but I 
believe it's just a matter of clean documentation.

bq. Would introducing the term "Follower" help?
I wouldn't want that extra element introduced. A 'leader' is just a specific 
non-default role for a Replica in which it does some extra work. Again, we could 
just fix our documentation on that.


bq. Personally, I'm happy with the current slice/replica terminology in the 
code and I don't much care if it matches the external doc terminology.
+1, but people outside of here rarely use/see 'Slice' and so as Yonik 
suggested, it seems better to converge towards 'Shard' and 'Replicas'. That is 
what I'm working on, doing away with 'Slice'.

bq. but I certainly don't think its worth breaking all those api's to change 
the names in the code now - unless it's a couple minor consistency issues.
This certainly would mean breaking back-compat with a few things at least. Maybe 
more. I am almost halfway through and already have a good 250k patch with 
instances where Slice and Shard are used interchangeably. There are other 
places where a Replica is referred to as a Shard. So it's just all mixed up.

bq. It may be too late to change this in SolrCloud now even if everyone 
agreed...
I don't think we can/should make a change that drastic. As long as it's a 
little consistent and documented, Shard and Replicas should work fine for us.


To get an opinion, do you guys think we shouldn't be 'renaming' public APIs?

  was (Author: anshumg):
bq. So I agree, there is really no need for "Slice" here. I already forgot 
about that term.
It's just a term that we used for (what we're now calling) Shards. It's deep in 
the code. At the same time considering that this has the least exposure to the 
outside world, it's our best bet at being changed.

bq. People will refer to physical copies, those Replicas, as Shards. 
Exactly what yonik said. There's confusion in the use of the term Shard but I 
believe it's just a matter of clean documentation.

bq. Would introducing the term "Follower" help?
I wouldn't want that extra element introduced. A 'leader' is just a specific 
non-default role for a Replica wherein it does some extra bit. Again, we could 
just fix our documentation on that.


bq. Personally, I'm happy with the current slice/replica terminology in the 
code and I don't much care if it matches the external doc terminology.
+1, but people outside of here rarely use/see 'Slice' and so as Yonik 
suggested, it seems better to converge towards 'Shard' and 'Replicas'. That is 
what I'm working on, doing away with 'Slice'.

bq. but I certainly don't think its worth breaking all those api's to change 
the names in the code now - unless it's a couple minor consistency issues.
This certainly would mean breaking back-compat with a few things at least. Maybe 
more. I am almost halfway through and already have a good 250k patch with 
instances where Slice and Shard are used interchangeably. There are other 
places where a Replica is referred to as a Shard. So it's just all mixed up.

bq. It may be too late to change this in SolrCloud now even if everyone 
agreed...
I don't think we can/should make a change that drastic. As long as it's a 
little consistent and documented, Shard and Replicas should work fine for us.


To get an opinion, do you guys think we shouldn't be 'touching' public APIs?
  


[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-08 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702892#comment-13702892
 ] 

Mark Miller edited comment on SOLR-4998 at 7/9/13 4:35 AM:
---

Being a coder and being a user are two different things in my opinion.

As it is we would have to define shard in the code - it's ambiguous - and you 
are already fighting with preconceived notions of its definition. In the code, 
slice is not ambiguous and calls for reading the definition of it in javadoc. 
It's been around for some time now, and there has been no large outcry. 

I'm fine with changing these APIs for real gains, but I don't find this a gain 
given the current discussion, so I don't think the disruption in the rename is 
worth it at all.

  was (Author: markrmil...@gmail.com):
Being a coder and being a user are two different things in my opinion.

As it is we would have define shard in the code - it's ambiguous - and you are 
already fighting with preconceived notions of its definition. In the code, 
slice is not ambiguous and calls for reading the definition of it in javadoc. 
It's been around for some time now, and there has been no large outcry. 

I'm fine with changing these APIs for real gains, but I don't find this a gain 
given the current discussion, so I don't think the disruption in the rename is 
worth it at all.
  




[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base

2013-07-23 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717517#comment-13717517
 ] 

Mark Miller edited comment on SOLR-4998 at 7/23/13 7:36 PM:


I think for things like:

{noformat}
-  public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode";
+  public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode";
{noformat}

We have to be really careful. Solr does not error/warn on unknown params - 
existing users might keep using the existing param for a long time, and not 
even notice it no longer has an effect. I think if we make any type of change 
like that, we should be sure to support the old names as aliases or perhaps 
explicitly look for the old key and fail if it's found.
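
Both options can be sketched in a few lines of Java. This is only an illustration over a plain Map of request params: RenamedParamSketch, resolveWithAlias and resolveStrict are hypothetical names, not Solr code, and only the two key strings come from the diff above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical handling of a renamed request parameter; not taken from the patch.
final class RenamedParamSketch {

    static final String OLD_KEY = "maxShardsPerNode";
    static final String NEW_KEY = "maxReplicasPerNode";

    /** Option 1: keep accepting the old key as an alias; the new key wins if both are set. */
    static String resolveWithAlias(Map<String, String> params) {
        String value = params.get(NEW_KEY);
        return value != null ? value : params.get(OLD_KEY);
    }

    /** Option 2: fail loudly when the old key is present so it cannot be silently ignored. */
    static String resolveStrict(Map<String, String> params) {
        if (params.containsKey(OLD_KEY)) {
            throw new IllegalArgumentException(
                "'" + OLD_KEY + "' is no longer supported, use '" + NEW_KEY + "' instead");
        }
        return params.get(NEW_KEY);
    }

    public static void main(String[] args) {
        // A request from an existing user who still sends the old param name.
        Map<String, String> oldStyleRequest = new HashMap<>();
        oldStyleRequest.put(OLD_KEY, "2");

        System.out.println("alias: " + resolveWithAlias(oldStyleRequest));
        try {
            resolveStrict(oldStyleRequest);
        } catch (IllegalArgumentException e) {
            System.out.println("strict: " + e.getMessage());
        }
    }
}
```

Either way, a request that still sends the old key no longer changes behaviour silently: it is either honoured or rejected with a clear message.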

  was (Author: markrmil...@gmail.com):
I think for things like:

-  public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode";
+  public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode";

We have to be really careful. Solr does not error/warn on unknown params - 
existing users might keep using the existing param for a long time, and not 
even notice it no longer has an effect. I think if we make any type of change 
like that, we should be sure to support them as an alias or perhaps explicitly 
look for the old key and fail if it's found.
  
> Make the use of Slice and Shard consistent across the code and document base
> 
>
> Key: SOLR-4998
> URL: https://issues.apache.org/jira/browse/SOLR-4998
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3, 4.3.1
>Reporter: Anshum Gupta
> Attachments: SOLR-4998.patch
>
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We 
> should define each separately and use the apt term whenever we do so.
