[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700329#comment-13700329 ]

Otis Gospodnetic edited comment on SOLR-4998 at 7/4/13 8:27 PM:

I am not sure what naming "conventionS" Solr code is using. I know most people are inconsistent and so code (in general, not referring specifically to Solr here) is also often inconsistent. Here we see this inconsistency leads to a lot of confusion. I think it's great Anshum initiated this.

My personal preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming classes and variables, because it's only going to get harder to do that if it's not done now.

OK, here is another attempt:
# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's "replicationFactor"
# These physical instances are called Replicas
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as the Leader

I think this is it, no?
Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical structure:
|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?

was (Author: otis):
I am not sure what naming "conventionS" Solr code is using. I know most people are inconsistent and so the code is also often inconsistent. Here we see this inconsistency leads to a lot of confusion. I think it's great Anshum initiated this.

My personal preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming classes and variables, because it's only going to get harder to do that if it's not done now.

OK, here is another attempt:
# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's "replicationFactor"
# These physical instances are called Replicas
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as the Leader

I think this is it, no?
Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical structure:
|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?

> Make the use of Slice and Shard consistent across the code and document base
>
>                 Key: SOLR-4998
>                 URL: https://issues.apache.org/jira/browse/SOLR-4998
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.3.1
>            Reporter: Anshum Gupta
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We
> should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additiona
[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700329#comment-13700329 ]

Otis Gospodnetic edited comment on SOLR-4998 at 7/4/13 8:38 PM:

I am not sure what naming "conventionS" Solr code is using. I know most people are inconsistent and so code (in general, not referring specifically to Solr here) is also often inconsistent. Here we see this inconsistency leads to a lot of confusion. I think it's great Anshum initiated this.

My personal preference would be to:
* pick the terminology that makes sense and is easy to explain and understand
* adjust BOTH code and documentation to match that, even if it means renaming classes and variables, because it's only going to get harder to do that if it's not done now.

OK, here is another attempt:
# A Cluster has Collections
# A Collection is a logical index
# A Collection has as many Shards as "numShards"
# A Shard is a logical index subset
# There are as many physical instances of a given Shard as the Collection's "replicationFactor"
# These physical instances are called Replicas
# The number of Replicas in a Collection equals "numShards * replicationFactor"
# Each Replica contains a Core
# A Core is a single physical Lucene index
# One Replica in each Shard is labeled a Leader
# Any Replica can become a Leader through election if previous Leader goes away
# Each Shard has 1 or more Replicas with exactly 1 of those Replicas acting as the Leader

I think this is it, no?
Visually, by logical role:
||shard 1||shard 2||shard 3||
|leader 1.1|leader 2.1|leader 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

So we would say that the above Collection has:
* 3 Shards
* 5 Replicas per Shard (15 Replicas in total)
* in each Shard 1 Replica *acts as* a Leader

If we ignore roles then this same Collection has the following physical structure:
|replica 1.1|replica 2.1|replica 3.1|
|replica 1.2|replica 2.2|replica 3.2|
|replica 1.3|replica 2.3|replica 3.3|
|replica 1.4|replica 2.4|replica 3.4|
|replica 1.5|replica 2.5|replica 3.5|

Yes/no?

So I agree, there is really no need for "Slice" here. I already forgot about that term.

Problems we'll have:
* People will refer to physical copies, those Replicas, as Shards. When they say "Shard" they'll often refer to a specific Replica. I know I always think of each cell in the above table as "Shard", but that's not how we (should) use that term. Shards are just logical. Those cells are Replicas.
* We use "Replica" to refer to a physical index, but also use it to describe a non-Leader role. Confusing. If there is a Leader, where are Followers? Would introducing the term "Follower" help? Then we could say/teach people the following:
** When you say "Shard" it just means the logical Collection subset. It's not physical at all.
** If you want to talk about physical indices in a Collection use the term "Replica". They are all Replicas.
** If you want to refer to a Replica by its role, then you've got to say either Leader or Follower. Because if you say "Replica" we won't know whether you are referring to the special Replica that acts as a Leader or all the other ones.

I think we'll need to correct this in any docs and will need to correct people on the ML until we get everyone in sync. Any books or articles that have been written with different terminology will be wrong/out of date and will confuse people.

Yes/no?
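The terminology proposed above can be sketched as a small data model. This is only an illustration of the comment's definitions, not SolrCloud code: the class `CollectionModel` and its methods are hypothetical, and only `numShards` and `replicationFactor` come from the discussion. Replica labels follow the `shard.replica` numbering used in the tables, and treating the first Replica of each Shard as the initial Leader is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed terminology; not part of Solr. */
final class CollectionModel {
    private final int numShards;          // number of logical Shards
    private final int replicationFactor;  // physical Replicas per Shard, Leader included

    CollectionModel(int numShards, int replicationFactor) {
        if (numShards < 1 || replicationFactor < 1) {
            throw new IllegalArgumentException("numShards and replicationFactor must be >= 1");
        }
        this.numShards = numShards;
        this.replicationFactor = replicationFactor;
    }

    /** Total physical Replicas in the Collection: numShards * replicationFactor. */
    int totalReplicas() {
        return numShards * replicationFactor;
    }

    /** Labels of every Replica of one Shard (1-based), in "shard.replica" form. */
    List<String> replicasOf(int shard) {
        if (shard < 1 || shard > numShards) {
            throw new IllegalArgumentException("no such shard: " + shard);
        }
        List<String> labels = new ArrayList<>();
        for (int r = 1; r <= replicationFactor; r++) {
            labels.add(shard + "." + r);
        }
        return labels;
    }

    /** Assumption: the first Replica starts as Leader; any Replica could be elected later. */
    String initialLeaderOf(int shard) {
        return replicasOf(shard).get(0);
    }

    public static void main(String[] args) {
        // The Collection from the tables: 3 Shards, 5 Replicas in each.
        CollectionModel c = new CollectionModel(3, 5);
        System.out.println("total replicas: " + c.totalReplicas());
        System.out.println("shard 2 replicas: " + c.replicasOf(2));
        System.out.println("shard 2 leader: " + c.initialLeaderOf(2));
    }
}
```

In this model a Shard never appears as an object of its own; it is just the index used to group Replicas, which mirrors the point that Shards are logical and only Replicas are physical.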
[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700443#comment-13700443 ]

Otis Gospodnetic edited comment on SOLR-4998 at 7/5/13 3:57 AM:

bq. http://www.elasticsearch.org/guide/reference/glossary/

Much simpler and cleaner, IMHO:
* primary shard
* replica shard

So in ES a primary shard is a real physical thing and a replica is a real physical thing as well. I think that's easier than saying a shard is a logical concept and that inside a shard there are replicas, but some are called leaders and others are called... well, replicas.

So there is no logical vs. physical in ES (see my tables above), it's all just physical:
||primary shard 1||primary shard 2||primary shard 3||
|replica shard 1.1|replica shard 2.1|replica shard 3.1|
|replica shard 1.2|replica shard 2.2|replica shard 3.2|

That's what you'd get with number_of_shards=3, number_of_replicas=2

So the Collection (Index in ES terminology) creation API takes number_of_shards and number_of_replicas parameters and they are used as shown above, which is less confusing than agreeing on what replicationFactor means: is it the total number of replicas in a shard or the number of non-leader replicas? It may be too late to change this in SolrCloud now even if everyone agreed...

was (Author: otis):
bq. http://www.elasticsearch.org/guide/reference/glossary/

Much simpler and cleaner, IMHO:
* primary shard
* replica shard

So in ES a primary shard is a real physical thing and a replica is a real physical thing as well. I think that's easier than saying a shard is a logical concept and that inside a shard there are replicas, but some are called leaders and others are called... well, replicas.
This also allows the Collection (Index in ES terminology) creation API to take num_shards and num_replicas parameters, which is less confusing than agreeing what replicationFactor means - is it the total number of replicas in a shard or the number of non-leader replicas. It may be too late to change this in SolrCloud now even if everyone agreed...

> Make the use of Slice and Shard consistent across the code and document base
>
>                 Key: SOLR-4998
>                 URL: https://issues.apache.org/jira/browse/SOLR-4998
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.3.1
>            Reporter: Anshum Gupta
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We
> should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
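To make the replicationFactor ambiguity from this comment concrete, the arithmetic can be written out. This is a hedged sketch, not Solr or Elasticsearch code: `ReplicaCounts` and its method names are made up for illustration, and only `number_of_shards`, `number_of_replicas` and `replicationFactor` come from the discussion. It counts physical indices under the ES-style parameters and under the two possible readings of `replicationFactor`.

```java
/** Hypothetical helper contrasting the parameter semantics discussed above. */
final class ReplicaCounts {
    private ReplicaCounts() {}

    /** ES style: every primary shard gets numberOfReplicas additional copies. */
    static int esPhysicalShards(int numberOfShards, int numberOfReplicas) {
        return numberOfShards * (1 + numberOfReplicas);
    }

    /** Reading A: replicationFactor is the total number of Replicas per Shard, Leader included. */
    static int solrTotalIfFactorIncludesLeader(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    /** Reading B: replicationFactor counts only the non-Leader Replicas per Shard. */
    static int solrTotalIfFactorExcludesLeader(int numShards, int replicationFactor) {
        return numShards * (1 + replicationFactor);
    }

    public static void main(String[] args) {
        // Same two numbers, three possible totals depending on the semantics.
        System.out.println("ES (3 shards, 2 replicas): " + esPhysicalShards(3, 2));
        System.out.println("Solr, reading A (3, 2): " + solrTotalIfFactorIncludesLeader(3, 2));
        System.out.println("Solr, reading B (3, 2): " + solrTotalIfFactorExcludesLeader(3, 2));
    }
}
```

With number_of_shards=3 and number_of_replicas=2 the ES-style count is 9 physical shards (3 primaries plus 6 replicas), which matches the table in the comment. The same two numbers passed as numShards and replicationFactor give 6 under reading A and 9 under reading B, which is exactly the disagreement the comment describes.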
[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13700450#comment-13700450 ]

Anshum Gupta edited comment on SOLR-4998 at 7/5/13 4:27 AM:

bq. So I agree, there is really no need for "Slice" here. I already forgot about that term.

It's just a term that we used for (what we're now calling) Shards. It's deep in the code. At the same time, considering that it has the least exposure to the outside world, it's our best candidate for change.

bq. People will refer to physical copies, those Replicas, as Shards.

Exactly what Yonik said. There's confusion in the use of the term Shard but I believe it's just a matter of clean documentation.

bq. Would introducing the term "Follower" help?

I wouldn't want that extra element introduced. A 'leader' is just a specific non-default role for a Replica wherein it does a bit of extra work. Again, we could just fix our documentation on that.

bq. Personally, I'm happy with the current slice/replica terminology in the code and I don't much care if it matches the external doc terminology.

+1, but people outside of here rarely use/see 'Slice' and so, as Yonik suggested, it seems better to converge towards 'Shard' and 'Replicas'. That is what I'm working on, doing away with 'Slice'.

bq. but I certainly don't think its worth breaking all those api's to change the names in the code now - unless it's a couple minor consistency issues.

This certainly would mean breaking back-compat with a few things at least. Maybe more. I am almost halfway through and already have a good 250k patch with instances where Slice and Shard are used interchangeably. There are other places where a Replica is referred to as a Shard. So it's just all mixed up.

bq. It may be too late to change this in SolrCloud now even if everyone agreed...

I don't think we can/should make a change that drastic.
As long as it's a little consistent and documented, Shard and Replicas should work fine for us.

To get an opinion, do you guys think we shouldn't be 'renaming' public APIs?

was (Author: anshumg):
bq. So I agree, there is really no need for "Slice" here. I already forgot about that term.

It's just a term that we used for (what we're now calling) Shards. It's deep in the code. At the same time considering that this has the least exposure to the outside world, it's our best bet at being changed.

bq. People will refer to physical copies, those Replicas, as Shards.

Exactly what yonik said. There's confusion in the use of the term Shard but I believe it's just a matter of clean documentation.

bq. Would introducing the term "Follower" help?

I wouldn't want that extra element introduced. A 'leader' is just a specific non-default role for a Replica wherein it does some extra bit. Again, we could just fix our documentation on that.

bq. Personally, I'm happy with the current slice/replica terminology in the code and I don't much care if it matches the external doc terminology.

+1, but people outside of here rarely use/see 'Slice' and so as Yonik suggested, it seems better to converge towards 'Shard' and 'Replicas'. That is what I'm working on, doing away with 'Slice'.

bq. but I certainly don't think its worth breaking all those api's to change the names in the code now - unless it's a couple minor consistency issues.

This certainly would mean breaking back-compat with a few things at least. May be more. I am almost half way through and already have a good 250k patch with instances where Slice and Shard are used interchangeably. There are other places where a Replica is referred to as a Shard. So it's just all mixed up.

bq. It may be too late to change this in SolrCloud now even if everyone agreed...

I don't think we can/should make a change that drastic. As long as it's a little consistent and documented, Shard and Replicas should work fine for us.
To get an opinion, do you guys think we shouldn't be 'touching' public APIs?

> Make the use of Slice and Shard consistent across the code and document base
>
>                 Key: SOLR-4998
>                 URL: https://issues.apache.org/jira/browse/SOLR-4998
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.3.1
>            Reporter: Anshum Gupta
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We
> should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mai
[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702892#comment-13702892 ]

Mark Miller edited comment on SOLR-4998 at 7/9/13 4:35 AM:

Being a coder and a user are two different things in my opinion. As it is, we would have to define shard in the code - it's ambiguous - and you are already fighting with preconceived notions of its definition. In the code, slice is not ambiguous and calls for reading the definition of it in javadoc. It's been around for some time now, and there has been no large outcry. I'm fine with changing these APIs for real gains, but I don't find this a gain given the current discussion, so I don't think the disruption in the rename is worth it at all.

was (Author: markrmil...@gmail.com):
Being a coder and a user is two different things in my opinion. As it is we would have define shard in the code - it's ambiguous - and you are already fighting with preconceived notions of it's definition. In the code, slice is not ambiguous and calls for reading the definition of it in javadoc. It's been around for some time now, and there has been no large outcry. I'm fine with changing these API's for real gains, but I don't find this a gain given the current discussion, so I don't think the disruption in the rename is worth it at all.

> Make the use of Slice and Shard consistent across the code and document base
>
>                 Key: SOLR-4998
>                 URL: https://issues.apache.org/jira/browse/SOLR-4998
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.3.1
>            Reporter: Anshum Gupta
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We
> should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
[jira] [Comment Edited] (SOLR-4998) Make the use of Slice and Shard consistent across the code and document base
[ https://issues.apache.org/jira/browse/SOLR-4998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13717517#comment-13717517 ]

Mark Miller edited comment on SOLR-4998 at 7/23/13 7:36 PM:

I think for things like:

{noformat}
-  public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode";
+  public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode";
{noformat}

We have to be really careful. Solr does not error/warn on unknown params - existing users might keep using the existing param for a long time, and not even notice it no longer has an effect.

I think if we make any type of change like that, we should be sure to support the old names as aliases or perhaps explicitly look for the old key and fail if it's found.

was (Author: markrmil...@gmail.com):
I think for things like:

-  public static final String MAX_SHARDS_PER_NODE = "maxShardsPerNode";
+  public static final String MAX_REPLICAS_PER_NODE = "maxReplicasPerNode";

We have to be really careful. Solr does not error/warn on unknown params - existing users might keeping using the existing param for a long time, and not even notice it no longer has an affect.

I think if we make any type of change like that, we should be sure to support them as an alias or perhaps explicitly look for the old key and fail if it's found.

> Make the use of Slice and Shard consistent across the code and document base
>
>                 Key: SOLR-4998
>                 URL: https://issues.apache.org/jira/browse/SOLR-4998
>             Project: Solr
>          Issue Type: Improvement
>          Components: SolrCloud
>    Affects Versions: 4.3, 4.3.1
>            Reporter: Anshum Gupta
>         Attachments: SOLR-4998.patch
>
> The interchangeable use of Slice and Shard is pretty confusing at times. We
> should define each separately and use the apt term whenever we do so.

--
This message is automatically generated by JIRA.
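One way to read the alias-or-fail suggestion for the renamed parameter is sketched below. It is not Solr's actual parameter handling: `RenamedParam` and `resolve` are hypothetical, and a plain `Map<String, String>` stands in for request parameters. Only the two key names come from the diff. The `failOnOldKey` flag switches between the two options mentioned: honouring the old key as an alias, or rejecting it outright so that users notice the rename.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of handling a renamed parameter; not Solr code. */
final class RenamedParam {
    static final String OLD_KEY = "maxShardsPerNode";
    static final String NEW_KEY = "maxReplicasPerNode";

    private RenamedParam() {}

    /**
     * Returns the effective value, or null if neither key is present.
     * If the old key is present it is either used as a fallback alias
     * or rejected, depending on failOnOldKey.
     */
    static String resolve(Map<String, String> params, boolean failOnOldKey) {
        String oldValue = params.get(OLD_KEY);
        String newValue = params.get(NEW_KEY);
        if (oldValue != null && failOnOldKey) {
            throw new IllegalArgumentException(
                    "Parameter '" + OLD_KEY + "' was renamed to '" + NEW_KEY + "'");
        }
        // The new key wins when both are given; the old key is only a fallback.
        return newValue != null ? newValue : oldValue;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put(OLD_KEY, "2");
        System.out.println("alias mode: " + resolve(params, false));
        try {
            resolve(params, true);
        } catch (IllegalArgumentException e) {
            System.out.println("strict mode: " + e.getMessage());
        }
    }
}
```

Either mode avoids the silent failure described in the comment, where an old parameter name keeps being sent but no longer has any effect.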