[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-13 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738422#comment-13738422
 ] 

Shalin Shekhar Mangar commented on SOLR-5017:
-

Shard splitting doesn't support collections configured with a hash router and 
routeField. I'll put up a test and fix.

> Allow sharding based on the value of a field
> 
>
> Key: SOLR-5017
> URL: https://issues.apache.org/jira/browse/SOLR-5017
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-5017.patch
>
>
> We should be able to create a collection where sharding is done based on the 
> value of a given field
> collections can be created with shardField=fieldName, which will be persisted 
> in DocCollection in ZK
> implicit DocRouter would look at this field instead of _shard_ field
> CompositeIdDocRouter can also use this field instead of looking at the id 
> field. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-13 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738375#comment-13738375
 ] 

Noble Paul commented on SOLR-5017:
--

This is only for SolrCloud.

deleteById/getById expect the param _route_ or shard.keys (deprecated); 
without it, the request has to fan out as a distributed request. It works 
without complaining but will be inefficient.
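
The trade-off described above can be sketched as a toy model. This is not 
Solr code: the shard names and the function are hypothetical, and the route 
value is treated as a shard name, as the 'implicit' router would treat it.

```python
from typing import Optional

# Hypothetical shard names; a real cluster would read these from cluster state.
SHARDS = ["shard1", "shard2", "shard3"]


def shards_to_contact(route: Optional[str] = None) -> list:
    """Return the shards a get-by-id or delete-by-id request must reach."""
    if route is not None:
        if route not in SHARDS:
            raise ValueError(f"unknown shard: {route}")
        # A route value was supplied: one targeted request is enough.
        return [route]
    # No route value: the request has to fan out to every shard.
    return list(SHARDS)
```

With a route value the request touches one shard; without it, the cost grows 
with the number of shards.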




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-13 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13738321#comment-13738321
 ] 

Jack Krupansky commented on SOLR-5017:
--

Is this feature intended for both traditional Solr sharding and 
SolrCloud?

If it is intended for SolrCloud as well, how does delete-by-id work, in the 
sense that the delete command does not include the field needed to determine 
routing?





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737869#comment-13737869
 ] 

ASF subversion and git services commented on SOLR-5017:
---

Commit 1513357 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1513357 ]

SOLR-5017 support for routeField in CompositeId router also




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-12 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13737865#comment-13737865
 ] 

ASF subversion and git services commented on SOLR-5017:
---

Commit 1513356 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1513356 ]

SOLR-5017 support for routeField in CompositeId router also




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729367#comment-13729367
 ] 

ASF subversion and git services commented on SOLR-5017:
---

Commit 1510421 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1510421 ]

updating CHANGES.txt regarding deprecation of 'shard.keys' param SOLR-5017




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-08-05 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729366#comment-13729366
 ] 

ASF subversion and git services commented on SOLR-5017:
---

Commit 1510420 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1510420 ]

updating CHANGES.txt regarding deprecation of 'shard.keys' param SOLR-5017




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-31 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725615#comment-13725615
 ] 

Noble Paul commented on SOLR-5017:
--


It is now possible to create a collection with an extra parameter 
'routeField'. The 'implicit' router looks at that field to route each 
document. The value of the field is the name of the shard the document 
belongs to.

If the collection is created with 'routeField', other routing params are not 
honored.

This deprecates the 'shard.keys' parameter for routing queries in favor of a 
parameter called '_route_'. 'shard.keys' will continue to work for another 
release, though.
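
One way to picture the deprecation is a small resolver over request 
parameters. This is only a sketch, not Solr's implementation: the function 
name is hypothetical, and giving '_route_' precedence when both parameters 
are present is an assumption the comment does not state.

```python
import warnings
from typing import Mapping, Optional


def resolve_route(params: Mapping[str, str]) -> Optional[str]:
    """Pick the routing value from request parameters.

    Assumption: '_route_' wins if both parameters are present.
    """
    if "_route_" in params:
        return params["_route_"]
    if "shard.keys" in params:
        # Still accepted for now, but flagged as deprecated.
        warnings.warn("'shard.keys' is deprecated; use '_route_'",
                      DeprecationWarning)
        return params["shard.keys"]
    # No routing parameter at all: the caller has to query every shard.
    return None
```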

 






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-31 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725597#comment-13725597
 ] 

Jack Krupansky commented on SOLR-5017:
--

It seems like there was a lot of discussion that was never resolved, and now 
the issue is marked as "fixed", with no discussion or summary of how the 
discussion points were addressed or resolved (or ignored!).

A short summary would be nice.





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-31 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725561#comment-13725561
 ] 

Noble Paul commented on SOLR-5017:
--

This fixes only the 'implicit' router case. I will resolve the issue once the 
same is done for the compositeId router too.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-31 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725553#comment-13725553
 ] 

ASF subversion and git services commented on SOLR-5017:
---

Commit 1508981 from [~noble.paul] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1508981 ]

SOLR-4221 SOLR-4808 SOLR-5006 SOLR-5017 SOLR-4222




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-31 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725497#comment-13725497
 ] 

ASF subversion and git services commented on SOLR-5017:
---

Commit 1508968 from [~noble.paul] in branch 'dev/trunk'
[ https://svn.apache.org/r1508968 ]

SOLR-4221 SOLR-4808 SOLR-5006 SOLR-5017 SOLR-4222




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704718#comment-13704718
 ] 

Jack Krupansky commented on SOLR-5017:
--

bq.  ImplicitDocRouter 

I started referring to this as "manual routing", meaning that Solr cannot 
automatically figure out which shard a document is in unless the user 
manually/explicitly specifies the shard.

Overall, I would say that we have this menu of routing techniques:

1. Manual URL, specifying the shard URL or directing the request to the shard 
URL.
2. Manual shard ID, specifying the shard ID/name as a parameter. SolrJ or the 
receiving node can look up the shard URL in clusterstate.
3. Fully automatic, hashing the full, raw ID key value.
4. Directed automatic or key-directed automatic, hashing the "!" prefix of the 
composite key value. (I called this "explicit routing" at one point.)
5. Field-directed automatic, the proposal for using a non-ID field's value for 
the surrogate key to hash.

As far as the atomic update issue for field-directed routing, there are three 
choices:

1. Update request includes the specified alternative (non-ID) routing field.
2. If not present, a "shard" parameter would be required, specifying either the 
shard ID or the surrogate key value to be hashed.
3. If neither is present, an error.

That still leaves the update issue of changing the field-directed key value. 
This is not just an atomic update issue - replacing the full document also has 
this problem, when the specified routing field value changes, which may mean 
that the updated document now belongs in another shard.
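
The three choices listed above for field-directed routing can be sketched as 
a small resolution function. The field name "category", the function name, 
and the error type are all hypothetical; this is an illustration of the 
proposal, not code from the patch.

```python
from typing import Mapping, Optional

ROUTE_FIELD = "category"  # hypothetical routing field configured on the collection


def routing_value(doc: Mapping[str, str],
                  shard_param: Optional[str] = None) -> str:
    """Resolve the value to route an update on, in the order proposed above."""
    # Choice 1: the update itself carries the routing field.
    if ROUTE_FIELD in doc:
        return doc[ROUTE_FIELD]
    # Choice 2: fall back to an explicit "shard" request parameter.
    if shard_param is not None:
        return shard_param
    # Choice 3: nothing to route on, so reject the update.
    raise ValueError("update has neither the routing field nor a shard parameter")
```

Note that this does not address the remaining problem raised above: if the 
routing field's value changes, the old copy of the document may live in a 
different shard from the new one.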







[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704614#comment-13704614
 ] 

Yonik Seeley commented on SOLR-5017:


bq. 2) An ImplicitDocRouter (or is it ExplicitRouter)

It's implicit if the target shard is implicitly defined by what shard received 
the update.
It's explicit if you give it an explicit value (which makes the name "implicit" 
kind of not-so-good at that point).  We could change the name of that too if we 
want (and make it so that "implicit" still works as an alias for back compat).






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704608#comment-13704608
 ] 

Noble Paul commented on SOLR-5017:
--

Speaking of the best option,

my 2 cents:

2 routers

1) A HashDocRouter
2) An ImplicitDocRouter (or is it ExplicitRouter)

Both honor the shardField (or routeField) param. One uses the value verbatim, 
whereas the other uses the hash of the field value.

HashDocRouter honors the special id format with "!".

The _route_ param can be used and will be honored by all routers, always, in 
add/update/query/getbyid et al. HashDocRouter uses the hash of the value, 
whereas ImplicitDocRouter uses the value verbatim.
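
The difference between the two proposed routers can be sketched as follows. 
This is a toy model under stated assumptions: the shard names are made up, 
zlib.crc32 stands in for whatever hash Solr actually uses, and a simple 
modulo replaces per-shard hash ranges.

```python
import zlib

# Hypothetical shard names for illustration.
SHARDS = ["shard1", "shard2", "shard3", "shard4"]


def implicit_route(value: str) -> str:
    """Use the field (or _route_) value verbatim as the shard name."""
    if value not in SHARDS:
        raise ValueError(f"no shard named {value!r}")
    return value


def hash_route(value: str) -> str:
    """Hash the field (or _route_) value and map the hash onto a shard."""
    # Stand-in hash and modulo mapping; both are simplifications.
    h = zlib.crc32(value.encode("utf-8"))
    return SHARDS[h % len(SHARDS)]
```

In both cases the same input always yields the same shard, which is what 
lets a query or get-by-id that carries the same value go to one shard only.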




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704598#comment-13704598
 ] 

Jack Krupansky commented on SOLR-5017:
--

bq. If the id no longer contains enough information to tell what shard it's 
on...

Great point. Automatic routing needs to be able to work when presented with 
just the ID field. An atomic update is a great example - the shard field may 
not be available on the client.

Better to just forever say that automatic routing needs to be based solely on 
the ID key value, and that if the app needs to use the value of another field 
for routing, they absolutely do need to use a "composite key" with the routing 
key prepended to the nominal key value.

OTOH, they might want to use some other subset of the key value for routing, 
such as a product category that is part of a SKU used as the ID key. I think 
the idea there is that this would be custom sharding that uses most of the 
logic of CompositeID routing, but with different logic for extracting the 
routing key from the full ID key value.
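
A sketch of that idea: keep the routing logic fixed and make only the key 
extraction pluggable. The function names and the "CATEGORY-NUMBER" SKU layout 
are hypothetical, and this models only the extraction step, not how the 
CompositeId router hashes the result.

```python
from typing import Callable


def prefix_key(doc_id: str) -> str:
    """Routing key is the part before '!'; the whole id if there is no '!'."""
    prefix, sep, _rest = doc_id.partition("!")
    return prefix if sep else doc_id


def sku_category_key(doc_id: str) -> str:
    """Hypothetical SKU layout 'CATEGORY-NUMBER': route on the category."""
    return doc_id.split("-", 1)[0]


def extract_routing_key(doc_id: str,
                        extract: Callable[[str], str] = prefix_key) -> str:
    """Apply the configured extractor; everything downstream stays the same."""
    return extract(doc_id)
```

Because the key is still derived from the ID alone, get-by-id, delete-by-id 
and atomic updates can be routed without any extra field or parameter.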

Manual or custom routing is another story. There, the user can use whatever 
contrived "rules" they want.





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704595#comment-13704595
 ] 

Yonik Seeley commented on SOLR-5017:


bq. > Perhaps that should be a different router... compositeField rather than 
compositeId.
bq. Too many routers can be confusing to users. 

Heh - my favorite argument.  "confusing to users" can be trotted out in any 
context ;-)
Too many options can be just as confusing... 3 routers with 5 options each vs 5 
routers with 3 or whatever.  Let's talk about the *best* option.  If we have a 
good default and good documentation, confusion shouldn't enter the equation.

As far as compositeId router goes, I'm not sure I care if we create a new 
compositeField router or if we add more parameters / functionality to 
compositeId.  Giving the exact same _shard_ parameter should give the exact 
same hash code though - it shouldn't just be the first part of a composite id.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704592#comment-13704592
 ] 

Noble Paul commented on SOLR-5017:
--

bq.Having to specify extra information is what seems odd to me, and greatly 
complicates clients.

We already pass extra info if the lookup is not by id. Lookup by id is a small 
feature of Solr.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704586#comment-13704586
 ] 

Noble Paul commented on SOLR-5017:
--

bq.I think we should use the same parameter name for query requests too (i.e. 
deprecate "shard.keys")

That's it. I just wanted one parameter for routing, either _shard_ or 
something else. Let's use _route_ for all routers and deprecate shard.keys.






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704561#comment-13704561
 ] 

Yonik Seeley commented on SOLR-5017:


bq. It is very confusing to have these names behaving differently in different 
routers.

Not sure I understand... we should definitely have the same parameters behaving 
in the same way across all the routers.
_shard_ should work across all routers.  I understand the naming issue 
though... (the fact that _shard_ is just the input to the router, not the 
actual shard name unless you're using the implicit router).  I don't think 
_shard_ has even really been documented yet... it's possible we could change 
it to _routing_ or _route_.

bq. Should we rather not use the other parameter 'shard.keys' across router 
names, query and update requests.

I think we should use the same parameter name for query requests too (i.e. 
deprecate "shard.keys")





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704548#comment-13704548
 ] 

Yonik Seeley commented on SOLR-5017:


bq. Having to manually (and forever) muck up the ID field values for routing 
always seemed rather odd to me.

Having to specify extra information is what seems odd to me, and greatly 
complicates clients.
Say I have a basic client that wants to do a simple get by id, or a simple 
delete by id.  If the id no longer contains enough information to tell what 
shard it's on, we need to start broadcasting gets and deletes or something.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704547#comment-13704547
 ] 

Noble Paul commented on SOLR-5017:
--

bq. Do you have such a system?

Yes, I had. The entire AOL mail system already has billions of documents where 
the id is immutable and referenced in code. While I was there I hacked Solr to 
use a field-based sharding scheme. A lot of users will not have that expertise 
or patience.

bq. Don't really know, I've been meaning to dive into that patch but haven't.

IIRC, SolrJ consults the DocRouter to identify the target slice/leader. If 
future patches need it, they should too.

bq. But I fear at this point that having two ways of routing things around

We already have multiple ways of routing things once SOLR-4221 is in place 
(the next release will have it). Custom sharding does not have a 'mangled id' 
concept as of now. This is not going to impact anyone who is already using the 
current scheme with compositeId. You will need to create your cluster 
explicitly with that option (so it will be new users). We will solve any 
problems as we go along.

bq. Premature optimization?

This is not optimization. I'm just trying to be intuitive and user-friendly. 
AFAIK almost all NoSQL systems do grouping on the basis of some field value.

bq. what happens when a document is updated and the value of this field changes?

Good question. It should be dealt with in exactly the same way 'id' updates are 
handled today.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704536#comment-13704536
 ] 

Yonik Seeley commented on SOLR-5017:


bq. What if I want to have a clean 'id' value which is devoid of extra 
information? Should I do id.substring(id.indexOf("!")) every time I use it 
elsewhere?

Why would you have to do that?  If "!" appears in the ID field by accident 
sometimes, everything still works as expected with the compositeId router - 
that's why it's the default.
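
A rough sketch of why a stray "!" is harmless under a compositeId-style 
scheme. The hash function (zlib.crc32) is a stand-in rather than Solr's real 
hash, and the 16/16 bit split between prefix and remainder is an assumption 
made for illustration only.

```python
import zlib


def h32(text):
    # Stand-in 32-bit hash; not the hash function Solr actually uses.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id):
    """Top 16 bits from the part before the first '!', low 16 bits from the rest."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        return h32(doc_id)  # no '!': hash the whole id
    return (h32(prefix) & 0xFFFF0000) | (h32(rest) & 0x0000FFFF)


# Ids that share a prefix share the top bits, so they fall in the same hash region.
same_region = composite_hash("acme!doc1") >> 16 == composite_hash("acme!doc2") >> 16
print(same_region)
```

An id that contains "!" by accident still gets a deterministic hash, so reads 
and writes by id agree on where the document lives. No caller has to strip 
anything from the id.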




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704533#comment-13704533
 ] 

Erick Erickson commented on SOLR-5017:
--

bq: If I have a already working system where ids cannot be changed, I have no 
option with the current scheme of things .

_Do_ you have such a system? Theoretically I agree. But it also seems like this 
change has enough edge cases that it might be better to wait and see whether 
there's enough pressure to move this forward before trying to anticipate 
problems. Premature optimization?

bq: If your code is using that API then your code should continue to work 
right...

Don't really know, I've been meaning to dive into that patch but haven't. It's 
on the SolrJ side, mostly I'm using it as an example of a place things can get 
out of synch. I'm sure there are others.

bq: What if I want to have a clean 'id' value which is devoid of extra 
information? Should I do id.substring(id.indexOf("!")) every time I use it 
elsewhere?

Yeah, that's a pain. But perhaps not as much as trying to maintain two schemes 
to route documents and deal with the issues that are sure to come up. Frankly I 
don't have a firm sense of which is better/worse, my antenna are just quivering 
based on introducing a feature that'll have repercussions before there's a 
demonstrated need. I've gotten myself into trouble too often doing that...

bq: what happens when a document is updated and the value of this field changes?

This is exactly what I'm talking about, I'm afraid the edge cases will go on 
forever (or nearly). An N+1 kind of thing. 


All that said, I'm not totally against the idea. In fact I kind of wish a 
separate "routing field" was the way it was implemented in the first place. But 
did I think to suggest it when it first started to be implemented? Nooo.

But I fear at this point that having two ways of routing things around without 
a compelling _existing_ use case will generate a lot of work, lots of ongoing 
maintenance and the effort could well be spent elsewhere in the near term.

But since I'm not volunteering to do the work, I really don't have all that 
much to say.






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704506#comment-13704506
 ] 

Jack Krupansky commented on SOLR-5017:
--

bq. there's no case made for why this is a better thing than using the current 
! syntax

Logically, I think it makes perfect sense to be able to declare what field 
should be used for "grouping" of documents, and that some apps want more of a 
functional grouping (e.g., by department or product category.) Having to 
manually (and forever) muck up the ID field values for routing always seemed 
rather odd to me. Maybe the latter has some utility on its own, but the former 
seems more sensible to me.

And, then there is the issue of how to change the shard of an existing document 
that was, in terms I use, "explicitly routed", using the "!" notation. I mean, 
if the ID of that document is referenced in other documents, all of those other 
documents would need to be manually updated as well. Before the introduction of 
the "!" notation, key values were completely application controlled, but with 
"!", suddenly Solr interjects itself into the ID generation process. Some 
day... even Data Import Handler users are going to start flooding the Solr-user 
email list with questions about how to set and change routing and why key 
values containing "!" seem to be causing SolrCloud to be distributing documents 
to shards in an unexpected manner (because they didn't know about the "!" 
notation.)




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704493#comment-13704493
 ] 

Jack Krupansky commented on SOLR-5017:
--

Hmmm... what happens when a document is updated and the value of this field 
changes? The update request would need to go to both the "new" shard to add the 
document, and the "old" shard to delete it, right?

And for atomic update when the shard field is updated to a value that hashes to 
a different shard? The existing field values need to be read from the "old" 
shard and then all values written to the new shard?
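
The scenario in this question can be pictured with a small in-memory model. 
This is only a sketch of the add-to-new, delete-from-old behaviour being asked 
about, not a description of what Solr does. The shards are plain dicts and the 
hash (zlib.crc32 modulo the shard count) is a stand-in.

```python
import zlib


def shard_for(route_value, num_shards):
    # Stand-in hash routing; not Solr's real algorithm.
    return zlib.crc32(route_value.encode("utf-8")) % num_shards


def apply_update(shards, doc, route_field, old_route_value=None):
    """Add doc to its target shard and delete it from the old shard if it moved."""
    target = shard_for(doc[route_field], len(shards))
    if old_route_value is not None:
        old = shard_for(old_route_value, len(shards))
        if old != target:
            shards[old].pop(doc["id"], None)  # the delete on the "old" shard
    shards[target][doc["id"]] = doc  # the add on the "new" shard
    return target
```

The caller has to know the previous routing value, which is the crux of the 
concern: without it, the stale copy on the old shard cannot be located without 
asking every shard.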






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704491#comment-13704491
 ] 

Noble Paul commented on SOLR-5017:
--

bq. What can be accomplished by this that cannot be accomplished with the 
current syntax?

* If I have an already working system where ids cannot be changed, I have no 
option with the current scheme of things.
* What if I want to have a clean 'id' value which is devoid of extra 
information? Should I do id.substring(id.indexOf("!")) every time I use it 
elsewhere?

bq. One place where it'll be easy to get wrong

AFAIK everyone relies on the DocRouter to identify the right shard. If your 
code is using that API then your code should continue to work right.






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-10 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704485#comment-13704485
 ] 

Erick Erickson commented on SOLR-5017:
--

What can be accomplished by this that cannot be accomplished with the current 
syntax?

Weighing in late, but scanning the comments, there's no case made for why this 
is a better thing than using the current ! syntax. From what I can see, 
simplistically it looks like putting what's on the left of the ! in its own 
field (not a nuanced statement). 

And I'm neutral-to-negative on it without a compelling use-case that couldn't 
be handled by the current syntax, mostly from the
perspective that I'd rather see "one true way" of accomplishing something than 
two that can get out of synch. And
they will. I can imagine getting shard splitting, routing and all that stuff 
right in one but not the other.

One place where it'll be easy to get wrong: Joel is working on routing from the 
client so updates go to the right leader. We'll
have to put this logic in that code too.

I'm not sure the functionality is worth the complication, but maybe that's just 
because routing gives me a headache.

All of the complexifications I imagine can be addressed, but is it worth the 
effort? Without a compelling use-case, I don't think so.

FWIW,
Erick






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-09 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704223#comment-13704223
 ] 

Noble Paul commented on SOLR-5017:
--

bq. I think it should work simpler... _shard_ is used as the whole value to hash 
on for any hash based router.

Should the field-based sharding be any less powerful than compositeId? Or do we 
want to configure multiple fields, like 
shardField=primaryShardField,secondaryShardField, instead of separating the 
values with a '/'?


bq. Perhaps that should be a different router... compositeField rather than 
compositeId.

Too many routers can be confusing to users. Essentially it is a hash router. 
The only difference is where the value for hashing is obtained. It could be 
from an 'id' (which is the default) or it can be from a separate field. We 
probably should rename the CompositeIdRouter to HashRouter instead of having 
multiple routers doing slightly different things. In reality, it is not a 
CompositeFieldRouter, it is just a FieldHashRouter.

bq. For the implicit router. For a hash based router, it should be the value 
that is hashed to then lookup the shard based on ranges.

I understand that. I'm worried about the name. Should we not rather use the 
other parameter, 'shard.keys', across router names, query and update requests? 
It is very confusing to have these names behaving differently in different 
routers.

I'm all for changing the param from \_shard_ to 'shard.keys' and keeping it 
consistent between all routers.







[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-09 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703653#comment-13703653
 ] 

Yonik Seeley commented on SOLR-5017:


bq. the _shard_ parameter is the actual name of the shard.

For the implicit router.  For a hash based router, it should be the value that 
is hashed to then lookup the shard based on ranges.

bq. In case of compositeId router , I would like to read the part before the 
(!) to be read from the 'shardField'.

I think it should work simpler... _shard_ is used as the whole value to hash on 
for any hash based router.
It's simple - if you want to have doc B have the exact same hash as doc A, then 
you give _shard_=A when adding doc B.

bq. I would like to read the part before the (!) to be read from the 
'shardField'.

Perhaps that should be a different router... compositeField rather than 
compositeId.
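
The "give _shard_=A when adding doc B" rule above can be sketched as follows. 
The two-shard range layout and the crc32 hash are assumptions for illustration. 
The point is only that the routing value, when supplied, replaces the id as the 
input to the hash, and the shard is then found by range lookup.

```python
import zlib


def h32(text):
    # Stand-in 32-bit hash; not the hash function Solr actually uses.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


# Hypothetical layout: two shards that split the 32-bit hash space in half.
RANGES = {
    "shard1": (0x00000000, 0x7FFFFFFF),
    "shard2": (0x80000000, 0xFFFFFFFF),
}


def route(doc_id, shard_param=None):
    """Hash the _shard_ value if given, else the id, then find the owning range."""
    key = shard_param if shard_param is not None else doc_id
    value = h32(key)
    for name, (lo, hi) in RANGES.items():
        if lo <= value <= hi:
            return name
    raise ValueError("hash outside all ranges")


# Doc B added with _shard_=A lands wherever doc A lands.
print(route("B", shard_param="A") == route("A"))
```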





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-09 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703246#comment-13703246
 ] 

Noble Paul commented on SOLR-5017:
--

If a collection is created with the shardField value, it is a required param 
for all docs. If the field is null, the document addition fails. There is no 
more lookup for "!".
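
A minimal sketch of that rule, with documents modelled as plain dicts. The 
function name route_value and the choice of ValueError are illustrative 
assumptions, not Solr's API.

```python
def route_value(doc, shard_field):
    """Return the routing value from shard_field, rejecting docs that lack it."""
    value = doc.get(shard_field)
    if value is None:
        raise ValueError(
            f"document {doc.get('id')!r} is missing required field {shard_field!r}"
        )
    return value  # used as-is; the id is never scanned for '!'


# The '!' in the id is ignored because the collection routes on 'dept'.
print(route_value({"id": "a!1", "dept": "sales"}, "dept"))
```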




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-09 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703236#comment-13703236
 ] 

Jack Krupansky commented on SOLR-5017:
--

bq. Not sure what you mean by "explicit routing"

I mean where the user has placed a prefix and "!" in front of a key value. 
Granted, it isn't explicitly stating the shard, and is really simply a 
"surrogate" key value to use for sharding. Is there better terminology for the 
fact that they used the "!" notation?

Question for Noble: If a shard field is specified and there is a "!" on a 
document key, which takes precedence?





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-09 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703039#comment-13703039
 ] 

Noble Paul commented on SOLR-5017:
--

bq. I could see by default, the compositeId router also paying attention to the 
_shard_ parameter

The _shard_ parameter is the actual name of the shard. In the case of the 
compositeId router, the client is agnostic of the shard name and all that it 
cares about is shard.keys. What I mean to say is, the name _shard_ can be a bit 
confusing.

As of now we don't have a plan for how to do shard splitting for the 'implicit' 
router. Let's keep it as TBD.

In the case of the compositeId router, I would like the part before the (!) to 
be read from the 'shardField'. The semantics will be exactly the same as now. 
Reading the value from a request parameter would mean we will need to persist 
it along with the document in some field.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-08 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702659#comment-13702659
 ] 

Yonik Seeley commented on SOLR-5017:


bq. CompositeIdDocRouter can also use this field instead of looking at the id 
field.

Agree - I could see by default, the compositeId router also paying attention to 
the \_shard\_ parameter (as the implicit router does).
Even if the implicit router is configured to pay attention to a field other 
than \_shard\_ in the document, it should still use _shard_ when looking at 
query parameters.

This has some downsides though too - related to splits and how to calculate 
the hash (store the _shard_ param when explicitly specified as a column?  store 
the calculated hash as a column?)

bq. Does this proposal eliminate the need to do explicit routing in the key 
values?

Not sure what you mean by "explicit routing" but if you mean the compositeId 
stuff, no.  That has a lot of benefits and will remain the default.




[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702536#comment-13702536
 ] 

Jack Krupansky commented on SOLR-5017:
--

Will SplitShard preserve the grouping by field value? I imagine it would, but...

In other words, if an app uses a field to preserve grouping of similar 
documents on the same shard, SplitShard should preserve that grouping on a 
split, right?

As long as the SplitShard code knows that it is supposed to use the specified 
alternative sharding field, things should be okay.
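
A sketch of why grouping would survive a split, assuming the split code hashes 
the sharding field rather than the id. The range-halving scheme and the crc32 
hash are illustrative stand-ins, not the SplitShard implementation.

```python
import zlib


def h32(text):
    # Stand-in 32-bit hash; not the hash function Solr actually uses.
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def split_range(lo, hi):
    """Split an inclusive hash range into two contiguous halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def sub_shard(route_value, halves):
    """Index of the half that owns this routing value's hash."""
    value = h32(route_value)
    for i, (lo, hi) in enumerate(halves):
        if lo <= value <= hi:
            return i
    raise ValueError("hash outside the parent range")


halves = split_range(0x00000000, 0xFFFFFFFF)
docs = [
    {"id": "1", "dept": "sales"},
    {"id": "2", "dept": "sales"},
    {"id": "3", "dept": "hr"},
]
# Route on the sharding field ('dept'), not on the id.
placement = {d["id"]: sub_shard(d["dept"], halves) for d in docs}
print(placement["1"] == placement["2"])
```

Because every document with the same field value produces the same hash, all 
of them fall into the same half after the split. If the split hashed the id 
instead, documents 1 and 2 could be separated.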





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702197#comment-13702197
 ] 

Jack Krupansky commented on SOLR-5017:
--

Does this proposal eliminate the need to do explicit routing in the key values?

So, instead of having to say "my-value!key-value" for the key value when some 
other field already has "my-value" in it, I can just leave my key as 
"key-value" and with this proposal Solr would read that other field to get 
"my-value" and use it for sharding?
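
A small sketch of the two ways of supplying the routing value that the question contrasts. This is an assumption-laden illustration, not the CompositeIdDocRouter implementation: `route_key` is a hypothetical helper, and the field name `group` is made up for the example.

```python
def route_key(doc, route_field=None):
    """Return the value a router would shard on (illustrative only)."""
    if route_field is not None:
        # Proposed behaviour: read the routing value from another field.
        return doc[route_field]
    # Existing convention: the routing value is the prefix before "!" in the id.
    prefix, sep, _rest = doc["id"].partition("!")
    return prefix if sep else doc["id"]


# Routing value embedded in the key itself.
embedded = route_key({"id": "my-value!key-value"})
# Plain key, routing value taken from a separate field.
from_field = route_key({"id": "key-value", "group": "my-value"}, route_field="group")
```

Both calls yield the same routing value, so under this reading the key could stay as plain "key-value" when the collection is configured with a routing field.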






[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-08 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702152#comment-13702152
 ] 

Noble Paul commented on SOLR-5017:
--

Jack, I think I got you partially.

Yes, docs with the same value in a field WILL go to the same shard.

In the case of the 'implicit' router there is a 1:1 mapping between the field 
value and the shard.

In the case of the compositeId router there will be an n:1 mapping between the 
field value and the shard.
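
The two mappings can be sketched as follows. This is a simplified illustration under stated assumptions: `implicit_route` and `hash_route` are hypothetical names, `zlib.crc32` is a stand-in hash, and a modulo over the shard count replaces the per-shard hash ranges a real hash-based router would use.

```python
import zlib


def implicit_route(doc, route_field, shard_names):
    # 1:1 mapping: the field value is itself the name of the target shard.
    value = doc[route_field]
    if value not in shard_names:
        raise ValueError(f"no shard named {value!r}")
    return value


def hash_route(doc, route_field, num_shards):
    # n:1 mapping: many field values hash onto each shard, but a given
    # value always hashes to the same shard.
    value = doc[route_field]
    return zlib.crc32(value.encode("utf-8")) % num_shards
```

In both cases two documents with the same field value end up on the same shard; the difference is only whether a shard can also hold documents with other values.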





[jira] [Commented] (SOLR-5017) Allow sharding based on the value of a field

2013-07-08 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13702141#comment-13702141
 ] 

Jack Krupansky commented on SOLR-5017:
--

Some clarification is needed:

1. Is this simply telling SolrCloud to use a different field for the key to be 
sharded? With no additional semantics?

2. Or, is this saying that all documents with a particular value in that field 
will be guaranteed to be in the same shard (e.g., so that grouping works 
properly)?

I'm hoping it is the latter.

Thanks.


