[ https://issues.apache.org/jira/browse/CASSANDRA-7850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14116170#comment-14116170 ]
Drew Kutcharian edited comment on CASSANDRA-7850 at 8/30/14 2:00 AM:
---------------------------------------------------------------------

Yes, but then I might end up with very wide _thrift_ rows. Basically what I want is {{PRIMARY KEY ((block_id, breed_bucket), breed)}} where records with the same block_id get stored on the same node *regardless* of the value of breed_bucket. But I don't want to use {{PRIMARY KEY (block_id, breed_bucket, breed)}} since in that case all the records for a block_id would end up in a single _thrift_ row.

So, ideally the layout would be:
block_id -> decides the node
(block_id, breed_bucket) -> decides the _thrift_ row. Old school "row key"
breed -> prefix of _thrift_ columns. Old school "column name prefix"


was (Author: drew_kutchar):
Yes, but then I might end up with very wide rows. Basically what I want is {{PRIMARY KEY ((block_id, breed_bucket), breed)}} where records with the same block_id and breed_bucket get stored on the same node, but in different _thrift_ rows so I don't have very wide rows (millions of _thrift_ columns per _thrift_ row).

> Composite Aware Partitioner
> ---------------------------
>
>                 Key: CASSANDRA-7850
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7850
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Drew Kutcharian
>
> Since C* supports composites for partition keys, I think it'd be useful to
> have the ability to use only the first (or first few) components of the key
> to calculate the token hash.
> A naive use case would be multi-tenancy:
> Say we have accounts and accounts have users.
> So we would have the following tables:
> {code}
> CREATE TABLE account (
>   id       timeuuid PRIMARY KEY,
>   company  text
> );
> {code}
> {code}
> CREATE TABLE user (
>   id         timeuuid PRIMARY KEY,
>   accountId  timeuuid,
>   email      text,
>   password   text
> );
> {code}
> {code}
> // Get users by account
> CREATE TABLE user_account_index (
>   accountId  timeuuid,
>   userId     timeuuid,
>   PRIMARY KEY(accountId, userId)
> );
> {code}
> Say we want to get all the users that belong to an account. We would first
> have to get the results from user_account_index and then use a multi-get
> (WHERE IN) to get the records from the user table. Now this multi-get part
> could potentially query a lot of different nodes in the cluster. It’d be
> great if there was a way to limit storage of users of an account to a single
> node so that the multi-get would only need to query a single node.
> With this improvement we would be able to define the user table like so:
> {code}
> CREATE TABLE user (
>   id         timeuuid,
>   accountId  timeuuid,
>   email      text,
>   password   text,
>   PRIMARY KEY(((accountId),id))  //extra parentheses
> );
> {code}
> I'm not too sure about the notation; it could be something like PRIMARY
> KEY(((accountId),id)) where the "(accountId)" means use this part to
> calculate the hash and ((accountId),id) is the actual partition key.
> The main complication I see with this is that we would have to use the table
> definition when calculating hashes so we know what components of the
> partition keys need to be used for hash calculation.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
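
For illustration only, here is a minimal Python sketch of the idea in the description: the token is computed from just the first component(s) of a composite partition key, while the full key still identifies the partition. Everything in it is an assumption made for the example and not Cassandra's partitioner API: the names `token`, `node_for`, and `RING`, the three-node ring, and the use of MD5 as the hash. The `hashed_prefix_len` argument stands in for the information that, per the description, would have to come from the table definition.

```python
import hashlib
from bisect import bisect_left


def token(key_components, hashed_prefix_len):
    """Hash only the first `hashed_prefix_len` components of a composite key.

    Hypothetical helper: MD5 is used purely as a stand-in hash function.
    """
    h = hashlib.md5()
    for part in key_components[:hashed_prefix_len]:
        data = str(part).encode("utf-8")
        # Length-prefix each component so ("ab", "c") and ("a", "bc") differ.
        h.update(len(data).to_bytes(4, "big"))
        h.update(data)
    return int.from_bytes(h.digest(), "big")


def node_for(tok, ring):
    """Return the first node whose ring token is >= tok, wrapping around.

    `ring` is a list of (token, node_name) pairs sorted by token.
    """
    tokens = [t for t, _ in ring]
    return ring[bisect_left(tokens, tok) % len(ring)][1]


# Hypothetical three-node ring with evenly spaced tokens over the MD5 range.
RING = sorted((2**128 * (i + 1) // 3, f"node{i + 1}") for i in range(3))

# Partition key is (accountId, id); only accountId feeds the token.
keys = [("account-1", f"user-{n}") for n in range(5)]
nodes = {node_for(token(k, hashed_prefix_len=1), RING) for k in keys}
print(nodes)  # a single node name: all users of account-1 are co-located
```

With `hashed_prefix_len=2` the same keys hash on the whole partition key, which is today's behaviour and may spread them across the ring.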