[ https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13477109#comment-13477109 ]
Nick Bailey commented on CASSANDRA-4815:
----------------------------------------

Isn't this the main reason behind collections support?

{noformat}
CREATE TABLE movies (
  movie_id int PRIMARY KEY,
  blacklisted int,
  credits map<text, text>,
  description text,
  likes_today int,
  name text,
  tags set<text>
);
{noformat}

> Make CQL work naturally with wide rows
> --------------------------------------
>
>                 Key: CASSANDRA-4815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Edward Capriolo
>
> I find that CQL3 is quite obtuse and does not provide me a language useful
> for accessing my data. First, let's point out how we should design Cassandra
> data:
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
> 4) Optimize for blind writes
>
> So here is a schema that abides by these tried and tested rules that large
> production users are employing today.
>
> Say we have a table of movie objects:
>
> Movie
>   Name
>   Description
>   -< tags (string)
>   -< credits composite(role string, name string)
>   -1 likesToday
>   -1 blacklisted
>
> The above structure is a movie. Notice it holds a mix of static and dynamic
> columns, but the overall number of columns is not very large (even if it
> were larger, this would be OK as well). Notice this table is not just a
> single one-to-many relationship: it has one-to-one data and two sets of
> one-to-many data.
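Both the movie structure above and the collections schema in the comment come down to one question: do the set and map elements end up as cells inside the same storage row? A minimal sketch under a simplifying assumption: a storage row is modeled as a sorted list of (cell name, value) pairs, where a plain column becomes the cell `(column,)`, each set element becomes `(column, element)` with an empty value, and each map entry becomes `(column, key)`. This roughly mirrors the idea behind CQL3's composite cell names but is not the real byte encoding; the helper `to_cells` and the sample data are hypothetical, and the partition key (`movie_id`) is left out because it is not stored as a cell.

```python
from typing import Any, Dict, List, Tuple

# A cell is (name components, value); simplified model, not the on-disk format.
Cell = Tuple[Tuple[str, ...], Any]


def to_cells(row: Dict[str, Any]) -> List[Cell]:
    """Flatten the regular columns of one CQL3-style row into sorted cells.

    Hypothetical helper: sets and maps expand to one cell per element or
    entry, named (column name, element or map key).
    """
    cells: List[Cell] = []
    for column, value in row.items():
        if isinstance(value, (set, frozenset)):
            # Each set element becomes its own cell with an empty value,
            # much like the 'tags-action' = '' columns in the issue.
            cells.extend(((column, elem), "") for elem in value)
        elif isinstance(value, dict):
            # Each map entry becomes a cell named by its key.
            cells.extend(((column, key), val) for key, val in value.items())
        else:
            # Scalar columns map to a single cell.
            cells.append(((column,), value))
    # Cells within one storage row are kept sorted by name.
    return sorted(cells, key=lambda cell: cell[0])


movie = {
    "name": "Cassandra Database, not looking for a seQL",
    "blacklisted": 1,
    "likes_today": 34,
    "credits": {"director": "asf", "jiraguy": "bob"},
    "tags": {"action", "adventure", "romance", "programming"},
}
for cell_name, cell_value in to_cells(movie):
    print(cell_name, repr(cell_value))
```

In this model all `credits` cells and all `tags` cells come out as contiguous runs inside a single row, so reading the whole movie still touches one partition; whether CQL3 then lets you query that row the way the reporter wants is the separate question the issue raises.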
>
> The schema today is declared something like this:
>
> create column family movies
>   with default_comparator = UTF8Type and
>   column_metadata = [
>     {column_name: blacklisted, validation_class: int},
>     {column_name: likesToday, validation_class: long},
>     {column_name: description, validation_class: UTF8Type}
>   ];
>
> We should be able to insert data like this:
>
> set movies['Cassandra Database, not looking for a seQL']['blacklisted']=1;
> set movies['Cassandra Database, not looking for a seQL']['likesToday']=34;
> set movies['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
> set movies['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
> set movies['Cassandra Database, not looking for a seQL']['tags-action']='';
> set movies['Cassandra Database, not looking for a seQL']['tags-adventure']='';
> set movies['Cassandra Database, not looking for a seQL']['tags-romance']='';
> set movies['Cassandra Database, not looking for a seQL']['tags-programming']='';
>
> This is the correct way to do it: one seek finds all the information related
> to a movie. As long as this row does not get "large", there is no reason to
> optimize by breaking the data into other column families. (Notice you cannot
> transpose this, because movies contains two one-to-many relationships of
> potentially different types.)
>
> Let's look at the CQL3 way to do this design. First, contrary to the original
> design of Cassandra, CQL does not like wide rows. It also does not have a good
> way of dealing with dynamic columns together with static columns.
>
> You have two options:
>
> Option 1: lose all schema
>
> create table movies (name text, column blob, value blob,
>   primary key (name, column)) with compact storage;
>
> This method is not so hot: we have now lost all our validators. And by the
> way, you have to physically shut down everything, rename files and recreate
> your schema if you want to inform Cassandra that a current table should be
> compact. At the very least this could be just a metadata change.
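The "one seek" argument relies on the comparator keeping column names sorted, so every `tags-*` column sits in one contiguous run that a single column slice can return. A minimal sketch, assuming a row is modeled as a Python dict whose names are sorted with plain string ordering (code point order, which matches UTF8Type's byte order for valid UTF-8); `slice_prefix` is a hypothetical helper, not a Cassandra API, and it only illustrates the kind of range the requested slice operator would express.

```python
import bisect
from typing import Dict, List, Tuple


def slice_prefix(row: Dict[str, object], prefix: str) -> List[Tuple[str, object]]:
    """Return the (name, value) pairs whose column names start with `prefix`.

    Models a column slice over a sorted wide row: binary-search the first
    name >= prefix, then scan forward while names still share the prefix.
    """
    names = sorted(row)  # comparator order for str names
    start = bisect.bisect_left(names, prefix)
    result: List[Tuple[str, object]] = []
    for name in names[start:]:
        if not name.startswith(prefix):
            # Names sharing a prefix are contiguous in sorted order,
            # so no later name can match once the run ends.
            break
        result.append((name, row[name]))
    return result


# Hypothetical row mirroring the cassandra-cli inserts in the issue.
movie_row = {
    "blacklisted": 1,
    "likesToday": 34,
    "credits-dir": "director:asf",
    "credits-jir": "jiraguy:bob",
    "tags-action": "",
    "tags-adventure": "",
    "tags-romance": "",
    "tags-programming": "",
}
print(slice_prefix(movie_row, "tags-"))
print(slice_prefix(movie_row, "credits-"))
```

A real slice would take explicit start and end bounds rather than a prefix; the prefix form is used here only because the issue's column names encode the relationship as a `tags-` or `credits-` prefix.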
> Also, you cannot add column schema either.
>
> Option 2: Normalize (this is even worse)
>
> create table movie (name text primary key, description text, likestoday int,
>   blacklisted int);
> create table moviecredits (name text, role text, personname text,
>   primary key (name, role));
> create table movietags (name text, tag text, primary key (name, tag));
>
> This is a terrible design. Of the four key characteristics of how Cassandra
> data should be designed, it fails three. It does not:
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
>
> Why is Cassandra steering toward this course by making a language that does
> not understand wide rows?
>
> So what can be done? My suggestions:
>
> Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a
> "virtual view" that is compact storage, with no work to migrate data and
> recreate schemas. Every table should have a compact view for schemaless
> access, or a simple query hint like /*transposed*/ should make this change.
>
> Metadata should be definable by regex. For example, all columns named "tag*"
> are of type string.
>
> CQL should have the column[slice_start] .. column[slice_end] operator from
> CQL2.
>
> CQL should support current users: users should not have to switch between
> CQL versions, and possibly Thrift, to work with wide rows. The language should
> work for them even if it is not expressly designed for them. Some of these
> features are already part of CQL2, so they should be carried over.
>
> Also, what needs to not happen is someone making a hand-waving statement like
> "Once we have collection types we will not need wide rows". This request is to
> satisfy current users of Cassandra, not future or theoretical ones. Solutions
> should not involve physically migrating data in any way, and they should not
> involve telling someone to do something much differently from what they are
> already doing. The suggestions should revolve around making the query
> language work well with existing data.
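Cassandra has no regex-based column metadata, so the following is only a hypothetical sketch of how the "metadata definable by regex" proposal could resolve a type for a dynamic column name. It assumes rules are checked in order and the first full match wins, with the glob-style `tag*` written as the regex `tags-.*` to match the column names from the insert example; `RULES`, `validator_for`, `validate` and the untyped `bytes` fallback are all invented for illustration.

```python
import re
from typing import Callable, List, Tuple

# A validator checks a value supplied as a string.
Validator = Callable[[str], bool]


def is_text(value: str) -> bool:
    # Any Python str counts as valid text in this simplified model.
    return isinstance(value, str)


def is_int(value: str) -> bool:
    # Accept anything int() can parse, e.g. "34".
    try:
        int(value)
    except ValueError:
        return False
    return True


# Hypothetical rules: (regex over column names, type name, validator).
RULES: List[Tuple[re.Pattern, str, Validator]] = [
    (re.compile(r"tags-.*"), "string", is_text),
    (re.compile(r"credits-.*"), "string", is_text),
    (re.compile(r"blacklisted|likesToday"), "int", is_int),
]


def validator_for(column: str) -> Tuple[str, Validator]:
    """Return (type name, validator) for the first rule matching the whole name."""
    for pattern, type_name, check in RULES:
        if pattern.fullmatch(column):
            return type_name, check
    # Unmatched names fall back to an untyped rule that accepts anything.
    return "bytes", lambda value: True


def validate(column: str, value: str) -> bool:
    _, check = validator_for(column)
    return check(value)


print(validator_for("tags-romance")[0], validate("likesToday", "34"))
```

First-match-wins keeps resolution predictable when patterns overlap, at the cost of making rule order significant; a longest-match or most-specific-pattern policy would be an alternative design.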