[ 
https://issues.apache.org/jira/browse/CASSANDRA-9231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14534376#comment-14534376
 ] 

Sylvain Lebresne commented on CASSANDRA-9231:
---------------------------------------------

bq. My point is that from a data modelling perspective, being able to define 
the values on which you distribute is the concept you care about.

Then we agree. But my problem is that this is *exactly* what the partition key
is about: it's its purpose, and it's how we explain and define it. Changing that
purpose now is confusing (and if that's no longer the purpose of the partition
key, I'm not even sure what purpose it actually has, or how you'd define it
simply).

Which is why I'm convinced we'll create less confusion and invalidate less 
documentation and fewer existing assumptions by simply adding an option to 
define the token function. In that case, the fundamental concept stays the same 
and the partition key still defines the values used for distribution. But the 
exact way they are used, which already depends on the partitioner today, gains 
some more flexibility as it can be user-defined. The fact that you can write 
functions that use only some of those values becomes an implementation detail; 
the "concept" of the partition key is preserved. I don't think changing the 
meaning of fundamental concepts, or multiplying them, is a good idea.
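
To make the distinction concrete, here is a minimal Python sketch of the idea. 
It is not Cassandra code: `default_token` and `routing_token` are hypothetical 
names, and MD5 from hashlib is only a stand-in for the partitioner's real hash, 
chosen to keep the example self-contained.

```python
import hashlib
import struct


def _hash64(data: bytes) -> int:
    """Stand-in signed 64-bit hash (assumption: not the partitioner's real hash)."""
    return struct.unpack(">q", hashlib.md5(data).digest()[:8])[0]


def default_token(partition_key: tuple) -> int:
    """Today's behaviour in spirit: every partition key value feeds the token."""
    return _hash64(repr(partition_key).encode())


def routing_token(partition_key: tuple) -> int:
    """Hypothetical user-defined token function: only the first value is used."""
    return _hash64(repr(partition_key[:1]).encode())


# The partition key is still (a, b) in both cases; only the token differs.
k1, k2 = (1, 10), (1, 20)
same_token = routing_token(k1) == routing_token(k2)  # depends on a == 1 only
distinct_partitions = k1 != k2                        # b still identifies the partition
```

With the custom function, both keys get the same token yet remain distinct 
partitions, which is the "implementation detail" described above.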

Besides, that's really only one of my points. Many times, people have wanted 
to do fancy things with the partitioner, but so far the fact that the 
partitioner is cluster-wide, and that making it per-table is pretty annoying, 
has limited what can be done. The use case in the description is really just 
one special case. Assuming that it's the only smart thing we can do when it 
comes to computing the token from the partition key feels a bit short-sighted 
to me. It's an advanced feature for power users anyway, so let's at least make 
it powerful.


> Support Routing Key as part of Partition Key
> --------------------------------------------
>
>                 Key: CASSANDRA-9231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9231
>             Project: Cassandra
>          Issue Type: Wish
>          Components: Core
>            Reporter: Matthias Broecheler
>             Fix For: 3.x
>
>
> Provide support for sub-dividing the partition key into a routing key and a 
> non-routing key component. Currently, all columns that make up the partition 
> key of the primary key are also routing keys, i.e. they determine which nodes 
> store the data. This proposal would give the data modeler the ability to 
> designate only a subset of the columns that comprise the partition key to be 
> routing keys. The non-routing key columns of the partition key identify the 
> partition but are not used to determine where to store the data.
> Consider the following example table definition:
> CREATE TABLE foo (
>   a int,
>   b int,
>   c int,
>   d int,
>   PRIMARY KEY  (([a], b), c ) );
> (a,b) is the partition key, c is the clustering key, and d is just a column. 
> In addition, the square brackets identify the routing key as column a. This 
> means that only the value of column a is used to determine the node for data 
> placement (i.e. only the value of column a is murmur3 hashed to compute the 
> token). In addition, column b is needed to identify the partition but does 
> not influence the placement.
> This has the benefit that all rows with the same routing key (but potentially 
> different non-routing key columns of the partition key) are stored on the 
> same node and that knowledge of such co-locality can be exploited by 
> applications built on top of Cassandra.
> Currently, the only way to achieve co-locality is within a partition. 
> However, this approach has the limitations that: a) there are theoretical and 
> (more importantly) practical limitations on the size of a partition and b) 
> rows within a partition are ordered and an index is built to exploit such 
> ordering. For large partitions that overhead is significant if ordering isn't 
> needed.
> In other words, routing keys afford a simple means to achieve scalable 
> node-level co-locality without ordering while clustering keys afford 
> page-level co-locality with ordering. As such, they address different 
> co-locality needs giving the data modeler the flexibility to choose what is 
> needed for their application.
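
The co-locality claim in the description can be sketched with a toy token ring. 
This is an illustration only: the node names, the 32-bit ring, the SHA-256 
stand-in hash, and the `place` helper are all assumptions made for the example, 
and replication is ignored entirely.

```python
import bisect
import hashlib

NODES = ["node1", "node2", "node3", "node4"]  # hypothetical cluster
RING_SIZE = 2 ** 32
# Evenly spaced tokens; each node owns the range that ends at its token.
RING = [(i + 1) * RING_SIZE // len(NODES) - 1 for i in range(len(NODES))]


def token(values: tuple) -> int:
    """Stand-in 32-bit hash of the given key values (not the real partitioner hash)."""
    digest = hashlib.sha256(repr(values).encode()).digest()
    return int.from_bytes(digest[:4], "big")


def node_for(tok: int) -> str:
    """Return the node whose token range contains tok."""
    return NODES[bisect.bisect_left(RING, tok)]


def place(a: int, b: int) -> str:
    """Place partition (a, b) using only the routing key a."""
    return node_for(token((a,)))


# Five distinct partitions that share routing key a = 7.
rows = [(7, b) for b in range(5)]
nodes_used = {place(a, b) for a, b in rows}
```

All five partitions end up on a single node because `b` never reaches the 
hash, while each `(a, b)` pair remains a separate partition.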



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
