[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738422#comment-13738422 ]

Shalin Shekhar Mangar commented on SOLR-5017:

Shard splitting doesn't support collections configured with a hash router and routeField. I'll put up a test and fix.

> Allow sharding based on the value of a field
>
> Key: SOLR-5017
> URL: https://issues.apache.org/jira/browse/SOLR-5017
> Project: Solr
> Issue Type: Sub-task
> Reporter: Noble Paul
> Assignee: Noble Paul
> Fix For: 4.5, 5.0
> Attachments: SOLR-5017.patch
>
> We should be able to create a collection where sharding is done based on the value of a given field
> collections can be created with shardField=fieldName, which will be persisted in DocCollection in ZK
> implicit DocRouter would look at this field instead of _shard_ field
> CompositeIdDocRouter can also use this field instead of looking at the id field.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738375#comment-13738375 ]

Noble Paul commented on SOLR-5017:

This is only for SolrCloud.

deleteById/getById would expect the param _route_ or shard.keys (deprecated); without it, the request has to fan out as a distributed request. It works without complaining, but it will be inefficient.
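The fan-out behaviour Noble describes for deleteById/getById can be pictured with a small toy model. This is only a sketch and not Solr code: the function `target_shards`, the `resolve` callback and the shard names are hypothetical, and the only names taken from the thread are the `_route_` and `shard.keys` parameters.

```python
from typing import Callable, List, Mapping, Sequence


def target_shards(
    params: Mapping[str, str],
    all_shards: Sequence[str],
    resolve: Callable[[str], str],
) -> List[str]:
    """Return the shards a by-id request must visit (toy model, not Solr code)."""
    # Prefer _route_; fall back to the deprecated shard.keys name.
    route = params.get("_route_") or params.get("shard.keys")
    if not route:
        # No routing hint: every shard has to be asked (the inefficient case).
        return list(all_shards)
    # With a hint, a single shard is enough.
    return [resolve(route)]


shards = ["shard1", "shard2", "shard3"]


def by_name(route: str) -> str:
    """Hypothetical resolver: the route value is taken as the shard name."""
    return route


print(target_shards({"_route_": "shard2"}, shards, by_name))
print(target_shards({"shard.keys": "shard3"}, shards, by_name))
print(target_shards({}, shards, by_name))
```

The last call carries no routing hint, so it returns every shard, which corresponds to the "works without complaining but will be inefficient" case.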
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738321#comment-13738321 ]

Jack Krupansky commented on SOLR-5017:

Is this feature intended for both traditional Solr sharding as well as SolrCloud?

If it is intended for SolrCloud as well, how does delete-by-id work, in the sense that the delete command does not include the field needed to determine routing?
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737869#comment-13737869 ]

ASF subversion and git services commented on SOLR-5017:

Commit 1513357 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513357 ]

SOLR-5017 support for routeField in COmpositeId router also
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737865#comment-13737865 ]

ASF subversion and git services commented on SOLR-5017:

Commit 1513356 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1513356 ]

SOLR-5017 support for routeField in COmpositeId router also
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729367#comment-13729367 ]

ASF subversion and git services commented on SOLR-5017:

Commit 1510421 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1510421 ]

updating CHANGES.txt regarding deprecation of shar.keys' param SOLR-5017
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729366#comment-13729366 ]

ASF subversion and git services commented on SOLR-5017:

Commit 1510420 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1510420 ]

updating CHANGES.txt regarding deprecation of shar.keys' param SOLR-5017
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725615#comment-13725615 ]

Noble Paul commented on SOLR-5017:

It is now possible to create a collection with an extra parameter, 'routeField'. The 'implicit' router looks at that field to route each document; the value of the field is the name of the shard the document belongs to. If the collection is created with 'routeField', other routing params are not honored.

This deprecates the 'shard.keys' parameter for routing queries in favor of a parameter called '_route_'. 'shard.keys' will continue to work for another release, though.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725597#comment-13725597 ]

Jack Krupansky commented on SOLR-5017:

It seems like there was a lot of discussion that was never resolved, and now the issue is marked as "fixed", with no discussion or summary of how the discussion points were addressed or resolved (or ignored!). A short summary would be nice.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725561#comment-13725561 ]

Noble Paul commented on SOLR-5017:

The fix so far covers the 'implicit' router only. I will resolve the issue after the same is done for the compositeId router too.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725553#comment-13725553 ]

ASF subversion and git services commented on SOLR-5017:

Commit 1508981 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1508981 ]

SOLR-4221 SOLR-4808 SOLR-5006 SOLR-5017 SOLR-4222
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725497#comment-13725497 ]

ASF subversion and git services commented on SOLR-5017:

Commit 1508968 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1508968 ]

SOLR-4221 SOLR-4808 SOLR-5006 SOLR-5017 SOLR-4222
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704718#comment-13704718 ]

Jack Krupansky commented on SOLR-5017:

bq. ImplicitDocRouter

I started referring to this as "manual routing", meaning that Solr cannot automatically figure out which shard a document is in unless the user manually/explicitly specifies the shard.

Overall, I would say that we have this menu of routing techniques:

1. Manual URL, specifying the shard URL or directing the request to the shard URL.
2. Manual shard ID, specifying the shard ID/name as a parameter. SolrJ or the receiving node can look up the shard URL in clusterstate.
3. Fully automatic, hashing the full, raw ID key value.
4. Directed automatic or key-directed automatic, hashing the "!" prefix of the composite key value. (I called this "explicit routing" at one point.)
5. Field-directed automatic, the proposal for using a non-ID field's value for the surrogate key to hash.

As far as the atomic update issue for field-directed routing, there are three choices:

1. Update request includes the specified alternative (non-ID) routing field.
2. If not present, a "shard" parameter would be required, specifying either the shard ID or the surrogate key value to be hashed.
3. If neither is present, an error.

That still leaves the update issue of changing the field-directed key value. This is not just an atomic update issue - replacing the full document also has this problem, when the specified routing field value changes, which may mean that the updated document now belongs in another shard.
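The three choices for atomic updates can be read as a small resolution order. The sketch below is an illustration only, not Solr code: the function `routing_key_for_update`, the `RoutingError` class, the `region` field and the `shard_param` argument name are all hypothetical; only the order of the checks comes from the list above.

```python
from typing import Mapping, Optional


class RoutingError(ValueError):
    """Raised when an update carries no usable routing information."""


def routing_key_for_update(
    doc: Mapping[str, object],
    route_field: str,
    shard_param: Optional[str] = None,
) -> str:
    """Resolve the routing key for an update (toy model of the three choices)."""
    # Choice 1: the update itself includes the routing field.
    value = doc.get(route_field)
    if value is not None:
        return str(value)
    # Choice 2: fall back to an explicit "shard" style parameter.
    if shard_param:
        return shard_param
    # Choice 3: neither is present, so the update cannot be routed.
    raise RoutingError(
        f"update has neither field {route_field!r} nor a shard parameter"
    )


# An atomic update that only carries the id needs the explicit parameter.
print(routing_key_for_update({"id": "42", "region": "emea"}, "region"))
print(routing_key_for_update({"id": "42"}, "region", shard_param="emea"))
```

This covers only where an update is sent. It does not address the remaining problem of a routing field whose value changes between versions of the same document.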
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704614#comment-13704614 ]

Yonik Seeley commented on SOLR-5017:

bq. 2) An ImplicitDocRouter (or is it ExplicitRouter)

It's implicit if the target shard is implicitly defined by what shard received the update. It's explicit if you give it an explicit value (which makes the name "implicit" kind of not-so-good at that point). We could change the name of that too if we want (and make it so that "implicit" still works as an alias for back compat).
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704608#comment-13704608 ]

Noble Paul commented on SOLR-5017:

Speaking of the best option, my 2 cents: two routers.

1) A HashDocRouter
2) An ImplicitDocRouter (or is it ExplicitRouter)

Both honor the shardField (or routeField) param. One uses the field value verbatim, whereas the other uses the hash of the field value. HashDocRouter also honors the special id format with "!".

The _route_ param can be used and will always be honored by all routers in add/update/query/getbyid et al. HashDocRouter uses the hash of the value, whereas ImplicitDocRouter uses the value verbatim.
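The difference between the two proposed routers can be shown with a toy model. Everything in it is an assumption made for illustration: the function names, the `tenant` and `region` fields, the use of CRC32 as a stand-in hash, and the modulo mapping from hash to shard (a real hash router would more likely map hash ranges to shards). It is not the Solr implementation.

```python
import zlib
from typing import Mapping, Optional, Sequence


def implicit_route(doc: Mapping[str, object], route_field: str) -> str:
    """Verbatim routing: the field value is itself the shard name."""
    return str(doc[route_field])


def hash_route(
    doc: Mapping[str, object],
    shards: Sequence[str],
    route_field: Optional[str] = None,
    id_field: str = "id",
) -> str:
    """Hash routing: hash the route field, or the "!" prefix of the id."""
    if route_field is not None:
        key = str(doc[route_field])
    else:
        # Composite id such as "acme!1234": only the prefix decides the shard.
        # An id without "!" is hashed as a whole.
        key = str(doc[id_field]).split("!", 1)[0]
    # Stand-in hash and modulo placement, chosen only to keep the sketch short.
    return shards[zlib.crc32(key.encode("utf-8")) % len(shards)]


shards = ["shard1", "shard2", "shard3", "shard4"]

print(implicit_route({"id": "1", "region": "shard2"}, "region"))
print(hash_route({"id": "1", "tenant": "acme"}, shards, route_field="tenant"))
print(hash_route({"id": "acme!1"}, shards))
```

In this sketch the same key lands on the same shard whether it arrives through the route field or as the "!" prefix of a composite id, because both paths hash the same string.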
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704598#comment-13704598 ]

Jack Krupansky commented on SOLR-5017:

bq. If the id no longer contains enough information to tell what shard it's on...

Great point. Automatic routing needs to be able to work when presented with just the ID field. An atomic update is a great example - the shard field may not be available on the client.

Better to just forever say that automatic routing needs to be based solely on the ID key value, and that if the app needs to use the value of another field for routing, they absolutely do need to use a "composite key" with the routing key prepended to the nominal key value.

OTOH, maybe they might want to use some other subset of the key value for routing, such as a product category that is a part of a SKU used as the ID key. I think the idea there is that this would be custom sharding that uses most of the logic of CompositeID routing, but just different logic for how to extract the routing key from the full ID key value.

Manual or custom routing is another story. There, the user can use whatever contrived "rules" they want.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704595#comment-13704595 ]

Yonik Seeley commented on SOLR-5017:

bq. > Perhaps that should be a different router... compositeField rather than compositeId.
bq. Too many routers can be confusing to users.

Heh - my favorite argument. "confusing to users" can be trotted out in any context ;-)
Too many options can be just as confusing... 3 routers with 5 options each vs 5 routers with 3 or whatever. Let's talk about the *best* option. If we have a good default and good documentation, confusion shouldn't enter the equation.

As far as compositeId router goes, I'm not sure I care if we create a new compositeField router or if we add more parameters / functionality to compositeId. Giving the exact same _shard_ parameter should give the exact same hash code though - it shouldn't just be the first part of a composite id.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704592#comment-13704592 ]

Noble Paul commented on SOLR-5017:

bq. Having to specify extra information is what seems odd to me, and greatly complicates clients.

We already pass extra info if the lookup is not by id. Lookup by id is a small feature for Solr.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704586#comment-13704586 ]

Noble Paul commented on SOLR-5017:

bq. I think we should use the same parameter name for query requests too (i.e. deprecate "shard.keys")

That's it. I just wanted one parameter for routing, either _shard_ or something else. Let's use _route_ for all routers and deprecate shard.keys.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704561#comment-13704561 ]

Yonik Seeley commented on SOLR-5017:

bq. It is very confusing to have these names behaving differently in different routers.

Not sure I understand... we should definitely have the same parameters behaving in the same way across all the routers. _shard_ should work across all routers.

I understand the naming issue though... (the fact that _shard_ is just the input to the router, not the actual shard name unless you're using the implicit router). _shard_ hasn't even really been documented yet I don't think... it's possible we could change it to _routing_ or _route_

bq. Should we rather not use the other parameter 'shard.keys across router names , query and update requests .

I think we should use the same parameter name for query requests too (i.e. deprecate "shard.keys")
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704548#comment-13704548 ]

Yonik Seeley commented on SOLR-5017:

bq. Having to manually (and forever) muck up the ID field values for routing always seemed rather odd to me.

Having to specify extra information is what seems odd to me, and greatly complicates clients. Say I have a basic client that wants to do a simple get by id, or a simple delete by id. If the id no longer contains enough information to tell what shard it's on, we need to start broadcasting gets and deletes or something.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704547#comment-13704547 ] Noble Paul commented on SOLR-5017:

bq. Do you have such a system?

Yes, I had. The entire Aol mail system already has billions of documents where the id is immutable and referenced in code. While I was there I hacked Solr to use a field-based sharding scheme. A lot of users will not have that expertise or patience.

bq. Don't really know, I've been meaning to dive into that patch but haven't.

IIRC, SolrJ consults the DocRouter to identify the target slice/leader. If future patches need it, they should too.

bq. But I fear at this point that having two ways of routing things around

We will already have multiple ways of routing things once SOLR-4221 is in place (the next release will have it). Custom sharding does not have a 'mangled id' concept as of now. This is not going to impact anyone who is already using the current scheme with compositeId. You will need to create your cluster explicitly with that option (which means new users). We will solve any problems as we go along.

bq. Premature optimization?

This is not optimization. I'm just trying to be intuitive and user-friendly. AFAIK almost all NoSQL systems do grouping on the basis of some field value.

bq. what happens when a document is updated and the value of this field changes?

Good question. It should be dealt with in exactly the same way 'id' updates are handled today.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704536#comment-13704536 ] Yonik Seeley commented on SOLR-5017:

bq. What if I want to have a clean 'id' value which is devoid of extra information? Should I do id.substring(id.indexOf("!") + 1) every time I use it elsewhere?

Why would you have to do that? If "!" appears in the ID field by accident sometimes, everything still works as expected with the compositeId router - that's why it's the default.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704533#comment-13704533 ] Erick Erickson commented on SOLR-5017:

bq. If I have an already working system where ids cannot be changed, I have no option with the current scheme of things.

_Do_ you have such a system? Theoretically I agree. But it also seems like this change has enough edge cases that it might be better to wait and see whether there's enough pressure to move this forward before trying to anticipate problems. Premature optimization?

bq. If your code is using that API then your code should continue to work right...

Don't really know, I've been meaning to dive into that patch but haven't. It's on the SolrJ side; mostly I'm using it as an example of a place things can get out of synch. I'm sure there are others.

bq. What if I want to have a clean 'id' value which is devoid of extra information? Should I do id.substring(id.indexOf("!") + 1) every time I use it elsewhere?

Yeah, that's a pain. But perhaps not as much as trying to maintain two schemes to route documents and deal with the issues that are sure to come up. Frankly I don't have a firm sense of which is better/worse; my antennae are just quivering based on introducing a feature that'll have repercussions before there's a demonstrated need. I've gotten myself into trouble too often doing that...

bq. what happens when a document is updated and the value of this field changes?

This is exactly what I'm talking about. I'm afraid the edge cases will go on forever (or nearly). An N+1 kind of thing.

All that said, I'm not totally against the idea. In fact I kind of wish a separate "routing field" was the way it was implemented in the first place. But did I think to suggest it when it first started to be implemented? Nooo. But I fear at this point that having two ways of routing things around without a compelling _existing_ use case will generate a lot of work and lots of ongoing maintenance, and the effort could well be spent elsewhere in the near term. But since I'm not volunteering to do the work, I really don't have all that much to say.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704506#comment-13704506 ] Jack Krupansky commented on SOLR-5017:

bq. there's no case made for why this is a better thing than using the current ! syntax

Logically, I think it makes perfect sense to be able to declare what field should be used for "grouping" of documents, and that some apps want more of a functional grouping (e.g., by department or product category.) Having to manually (and forever) muck up the ID field values for routing always seemed rather odd to me. Maybe the latter has some utility on its own, but the former seems more sensible to me.

And, then there is the issue of how to change the shard of an existing document that was, in terms I use, "explicitly routed", using the "!" notation. I mean, if the ID of that document is referenced in other documents, all of those other documents would need to be manually updated as well.

Before the introduction of the "!" notation, key values were completely application controlled, but with "!", suddenly Solr interjects itself into the ID generation process.

Some day... even Data Import Handler users are going to start flooding the Solr-user email list with questions about how to set and change routing and why key values containing "!" seem to be causing SolrCloud to be distributing documents to shards in an unexpected manner (because they didn't know about the "!" notation.)
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704493#comment-13704493 ] Jack Krupansky commented on SOLR-5017:

Hmmm... what happens when a document is updated and the value of this field changes? The update request would need to go to both the "new" shard to add the document, and the "old" shard to delete it, right?

And for atomic update when the shard field is updated to a value that hashes to a different shard? The existing field values need to be read from the "old" shard and then all values written to the new shard?
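The two-shard update described above can be sketched as follows. This is a toy in-memory model, not Solr's update path: the class, the `upsert` method and the `String.hashCode` routing are all assumptions made for illustration, and the caller is assumed to know the document's previous routing value.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RouteFieldUpdateSketch {
    // One in-memory map per shard: document id -> routing-field value.
    final List<Map<String, String>> shards = new ArrayList<>();

    RouteFieldUpdateSketch(int numShards) {
        for (int i = 0; i < numShards; i++) {
            shards.add(new HashMap<>());
        }
    }

    // Stand-in for the router's hash; String.hashCode is an assumption.
    int shardFor(String routeValue) {
        return Math.floorMod(routeValue.hashCode(), shards.size());
    }

    // oldRouteValue is null for a brand-new document.
    void upsert(String id, String oldRouteValue, String newRouteValue) {
        int target = shardFor(newRouteValue);
        if (oldRouteValue != null && shardFor(oldRouteValue) != target) {
            // The document moved: delete the stale copy from the "old" shard.
            shards.get(shardFor(oldRouteValue)).remove(id);
        }
        // Add (or overwrite) the document on the "new" shard.
        shards.get(target).put(id, newRouteValue);
    }
}
```

If the previous routing value is not known, the delete would instead have to be sent to every shard, which is the broadcast cost raised elsewhere in this thread.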
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704491#comment-13704491 ] Noble Paul commented on SOLR-5017:

bq. What can be accomplished by this that cannot be accomplished with the current syntax?

* If I have an already working system where ids cannot be changed, I have no option with the current scheme of things.
* What if I want to have a clean 'id' value which is devoid of extra information? Should I do id.substring(id.indexOf("!") + 1) every time I use it elsewhere?

bq. One place where it'll be easy to get wrong

AFAIK everyone relies on the DocRouter to identify the right shard. If your code is using that API then your code should continue to work right.
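For reference, the id stripping mentioned in the second bullet would look roughly like the sketch below. The class and method names are hypothetical, and it only handles a single prefix before the first "!"; anything more elaborate that the compositeId router may accept is ignored here.

```java
final class CompositeIdSketch {
    // Part before the first "!" (the routing prefix),
    // or the whole id if there is no "!".
    static String routePrefix(String compositeId) {
        int bang = compositeId.indexOf('!');
        return bang < 0 ? compositeId : compositeId.substring(0, bang);
    }

    // Part after the first "!" (the "clean" id),
    // or the whole id if there is no "!".
    static String cleanId(String compositeId) {
        int bang = compositeId.indexOf('!');
        return bang < 0 ? compositeId : compositeId.substring(bang + 1);
    }

    public static void main(String[] args) {
        System.out.println(routePrefix("deptA!doc42"));
        System.out.println(cleanId("deptA!doc42"));
    }
}
```

Every consumer of the id outside Solr would need a helper like `cleanId`, which is the maintenance burden the bullet is pointing at.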
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704485#comment-13704485 ] Erick Erickson commented on SOLR-5017:

What can be accomplished by this that cannot be accomplished with the current syntax?

Weighing in late, but scanning the comments, there's no case made for why this is a better thing than using the current ! syntax. From what I can see, simplistically it looks like putting what's on the left of the ! in its own field (not a nuanced statement).

And I'm neutral-to-negative on it without a compelling use-case that couldn't be handled by the current syntax, mostly from the perspective that I'd rather see "one true way" of accomplishing something than two that can get out of synch. And they will. I can imagine getting shard splitting, routing and all that stuff right in one but not the other.

One place where it'll be easy to get wrong: Joel is working on routing from the client so updates go to the right leader. We'll have to put this logic in that code too.

I'm not sure the functionality is worth the complication, but maybe that's just because routing gives me a headache. All of the complexifications I imagine can be addressed, but is it worth the effort? Without a compelling use-case, I don't think so.

FWIW,
Erick
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704223#comment-13704223 ] Noble Paul commented on SOLR-5017:

bq. I think it should work simpler... _shard_ is used as the whole value to hash on for any hash based router.

Should field-based sharding be any less powerful than compositeId? Or do we want to configure multiple fields, like shardField=primaryShardField,secondaryShardField, instead of separating the values with a '/'?

bq. Perhaps that should be a different router... compositeField rather than compositeId.

Too many routers can be confusing to users. Essentially it is a hash router. The only difference is where the value for hashing is obtained. It could be from the 'id' (which is the default) or it could be from a separate field. We should probably rename the CompositeIdRouter to HashRouter instead of having multiple routers doing slightly different things. In reality, it is not a CompositeFieldRouter, it is just a FieldHashRouter.

bq. For the implicit router. For a hash based router, it should be the value that is hashed to then lookup the shard based on ranges.

I understand that. I'm worried about the name. Should we not rather use the other parameter, 'shard.keys', across router names, query and update requests? It is very confusing to have these names behaving differently in different routers. I'm all for changing the param from _shard_ to 'shard.keys' and keeping it consistent between all routers.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703653#comment-13703653 ] Yonik Seeley commented on SOLR-5017:

bq. the _shard_ parameter is the actual name of the shard.

For the implicit router. For a hash based router, it should be the value that is hashed to then look up the shard based on ranges.

bq. In case of compositeId router, I would like to read the part before the (!) to be read from the 'shardField'.

I think it should work simpler... _shard_ is used as the whole value to hash on for any hash based router. It's simple - if you want to have doc B have the exact same hash as doc A, then you give _shard_=A when adding doc B.

bq. I would like to read the part before the (!) to be read from the 'shardField'.

Perhaps that should be a different router... compositeField rather than compositeId.
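A minimal sketch of the "whole value to hash on" idea from the comment above, under stated assumptions: CRC32 stands in for the router's real hash function, the hash space is cut into equal ranges, and `routeOverride` is a hypothetical name for the value passed as _shard_. None of this is taken from the Solr source.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class HashRouteSketch {
    // Stand-in hash (CRC32), chosen only because it ships with the JDK.
    static long hash(String key) {
        CRC32 crc = new CRC32();
        crc.update(key.getBytes(StandardCharsets.UTF_8));
        return crc.getValue(); // value in 0 .. 2^32 - 1
    }

    // If a route override is given, it replaces the id as the value that is hashed.
    static int shardFor(String id, String routeOverride, int numShards) {
        String key = (routeOverride != null) ? routeOverride : id;
        long rangeSize = (1L << 32) / numShards;
        // Clamp so the tail of the hash space falls into the last shard.
        return (int) Math.min(hash(key) / rangeSize, numShards - 1);
    }

    public static void main(String[] args) {
        int shardOfA = shardFor("A", null, 4);
        int shardOfB = shardFor("B", "A", 4); // doc B added with the override set to "A"
        System.out.println(shardOfA == shardOfB);
    }
}
```

Because the override is hashed in place of the id, doc B lands in the same range, and therefore the same shard, as doc A.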
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703246#comment-13703246 ] Noble Paul commented on SOLR-5017:

If a collection is created with the shardField value, it is a required param for all docs. If the field is null, the document addition fails. There is no lookup for "!" anymore.
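The rule stated above could be sketched like this. The document is modelled as a plain `Map`, and the class name, method name and exception type are illustrative assumptions rather than what the patch does.

```java
import java.util.Map;

final class RequiredShardFieldSketch {
    // Returns the routing value, or rejects the document if the
    // configured shard field is missing or null.
    static String routeValue(Map<String, Object> doc, String shardField) {
        Object value = doc.get(shardField);
        if (value == null) {
            throw new IllegalArgumentException(
                "document is missing required routing field: " + shardField);
        }
        // Note: the id is never inspected for "!" here.
        return value.toString();
    }
}
```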
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703236#comment-13703236 ] Jack Krupansky commented on SOLR-5017:

bq. Not sure what you mean by "explicit routing"

I mean where the user has placed a prefix and "!" in front of a key value. Granted, it isn't explicitly stating the shard, and is really simply a "surrogate" key value to use for sharding. Is there better terminology for the fact that they used the "!" notation?

Question for Noble: If a shard field is specified and there is a "!" on a document key, which takes precedence?
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703039#comment-13703039 ] Noble Paul commented on SOLR-5017:

bq. I could see by default, the compositeId router also paying attention to the _shard_ parameter

The _shard_ parameter is the actual name of the shard. In the case of the compositeId router, the client is agnostic of the shard name and all it cares about is shard.keys. What I mean to say is that the name _shard_ can be a bit confusing.

As of now we don't have a plan for how to do shard splitting with the 'implicit' router. Let's keep it as TBD.

In the case of the compositeId router, I would like the part before the (!) to be read from the 'shardField'. The semantics will be exactly the same as they are now. Reading the value from a request parameter would mean we need to persist it along with the document in some field.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702659#comment-13702659 ] Yonik Seeley commented on SOLR-5017:

bq. CompositeIdDocRouter can also use this field instead of looking at the id field.

Agree - I could see by default, the compositeId router also paying attention to the _shard_ parameter (as the implicit router does). Even if the implicit router is configured to pay attention to a field other than _shard_ in the document, it should still use _shard_ when looking at query parameters. This has some downsides too, though, related to splits and how to calculate the hash (store the _shard_ param when explicitly specified as a column? store the calculated hash as a column?).

bq. Does this proposal eliminate the need to do explicit routing in the key values?

Not sure what you mean by "explicit routing" but if you mean the compositeId stuff, no. That has a lot of benefits and will remain the default.
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702536#comment-13702536 ] Jack Krupansky commented on SOLR-5017:

Will SplitShard preserve the grouping by field value? I imagine it would, but...

In other words, if an app uses a field to preserve grouping of similar documents on the same shard, SplitShard should preserve that grouping on a split, right?

As long as the SplitShard code knows that it is supposed to use the specified alternative sharding field, things should be okay.
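One way to see why grouping survives a split, as long as the split hashes the same field: the sketch below cuts a parent hash range at its midpoint and assigns each document by the hash of its routing field, not its id. It is only an illustration; `String.hashCode`, the midpoint split, the names, and the assumption that every document carries the routing field are mine, not the SplitShard implementation.

```java
import java.util.Map;

final class SplitGroupingSketch {
    // Stand-in hash (String.hashCode); not the router's real hash.
    static int hash(String routeValue) {
        return routeValue.hashCode();
    }

    // Splits a parent hash range [min, max] at its midpoint and returns
    // 0 for the lower sub-shard or 1 for the upper one. The hash comes
    // from the routing field, so the document id plays no part.
    // Assumes the document actually has a value for routeField.
    static int subShard(Map<String, String> doc, String routeField, int min, int max) {
        int mid = (int) Math.floorDiv((long) min + (long) max, 2L);
        return hash(doc.get(routeField)) <= mid ? 0 : 1;
    }
}
```

Two documents with different ids but the same routing value produce the same hash, so they fall on the same side of the midpoint and stay together after the split.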
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702197#comment-13702197 ] Jack Krupansky commented on SOLR-5017:

Does this proposal eliminate the need to do explicit routing in the key values?

So, instead of having to say "my-value!key-value" for the key value when some other field already has "my-value" in it, I can just leave my key as "key-value" and with this proposal Solr would read that other field to get "my-value" and use it for sharding?
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702152#comment-13702152 ] Noble Paul commented on SOLR-5017:

Jack, I think I got you partially. Yes, docs with the same value in a field WILL go to the same shard.

In the case of the 'implicit' router there is a 1:1 mapping between the field value and the shard.
In the case of the compositeId router there will be an n:1 mapping between the field value and the shard.
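The two mappings described above can be contrasted in a few lines. This is a sketch under assumptions: for the implicit case the field value is taken to be the shard name itself, and for the hash case `String.hashCode` modulo the shard count stands in for the real hash-range lookup. Class and method names are hypothetical.

```java
import java.util.List;

final class RouterMappingSketch {
    // Implicit-style routing: the field value names the shard directly (1:1).
    static String implicitShard(String fieldValue) {
        return fieldValue;
    }

    // Hash-style routing: the field value is hashed, so many different
    // values end up sharing each shard (n:1).
    static String hashedShard(String fieldValue, List<String> shardNames) {
        int index = Math.floorMod(fieldValue.hashCode(), shardNames.size());
        return shardNames.get(index);
    }
}
```

In both cases the mapping is a pure function of the field value, which is why documents with the same value always end up on the same shard.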
[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field
[ https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702141#comment-13702141 ] Jack Krupansky commented on SOLR-5017:

Some clarification is needed:

1. Is this simply telling SolrCloud to use a different field for the key to be sharded? With no additional semantics?
2. Or, is this saying that all documents with a particular value in that field will be guaranteed to be in the same shard (e.g., so that grouping works properly)?

I'm hoping it is the latter. Thanks.