[jira] [Commented] (CASSANDRA-4815) Make CQL work naturally with wide rows

Sylvain Lebresne (JIRA) Mon, 22 Oct 2012 01:38:16 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-4815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13481264#comment-13481264
 ]


Sylvain Lebresne commented on CASSANDRA-4815:
---------------------------------------------


bq. Can I create a schema less table?

Yes. The following as-schemaless-as-can-possibly-be thrift/cli definition:
{noformat}
create column family schemaless
  with key_validation_class = BytesType
  and comparator = BytesType
  and default_validation_class = BytesType
{noformat}
is *equivalent* to the following CQL3 definition:
{noformat}
CREATE TABLE schemaless (
  key blob,
  column blob,
  value blob,
  PRIMARY KEY (key, column)
) WITH COMPACT STORAGE
{noformat}
And to be clear, when I say equivalent, I mean equivalent. If you create the 
first definition above, you can use the column family in CQL3 as if it had 
been defined by the second one (as in, you don't have to run the CREATE TABLE 
itself), or you can create the table in CQL3 first with the second query and 
query it in thrift exactly as if it had been created by the first definition.

The composite primary key is what tells CQL3 that it's a "transposed" wide CF.  
In other words, in CQL3, 'key' will map to the row key, 'column' will map to 
the internal column name and 'value' will map to the internal column value. 
Note that 'key', 'column' and 'value' are the default names that CQL3 picks for 
you when you haven't explicitly defined friendlier ones (in other words, 
when you upgrade from thrift). CASSANDRA-4822 is open to allow you to rename 
those default names to more user-friendly ones if you so wish (and to be clear, 
doing so has no impact whatsoever on what is stored; it just declares the new 
names as CQL3 metadata).
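
To make the mapping concrete, here is a minimal sketch in Python. It is only 
an in-memory model, not Cassandra code: the thrift_cf dictionary and the 
as_cql3_rows helper are hypothetical names introduced for illustration. It 
shows how each internal column of a thrift row would surface as one CQL3 row 
under the default 'key', 'column' and 'value' names.

```python
# Hypothetical in-memory model of a thrift-style column family:
# row key -> {internal column name -> internal column value}, all raw bytes.
thrift_cf = {
    b"k1": {b"c1": b"v1", b"c2": b"v2"},
    b"k2": {b"c1": b"v3"},
}


def as_cql3_rows(cf):
    """Yield one CQL3-style row per internal column (the "transposed" view)."""
    # Row keys are sorted here only to make the output deterministic; the
    # real ordering across rows depends on the partitioner.
    for row_key in sorted(cf):
        columns = cf[row_key]
        # Within a row, internal columns are ordered by the comparator,
        # approximated here by Python's ordering on bytes.
        for column_name in sorted(columns):
            yield {
                "key": row_key,
                "column": column_name,
                "value": columns[column_name],
            }


rows = list(as_cql3_rows(thrift_cf))
for row in rows:
    print(row)
```

The two internal columns of row k1 come out as two CQL3 rows sharing the same 
'key', which is what the composite PRIMARY KEY (key, column) expresses.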

bq. I guess this is slightly more difficult to express composite slices.

It's possibly nitpicking, but I would rather talk of a difficulty in properly 
paginating composites. But yes, that's one of the very few things that CQL3 is 
not currently very good at. We'll fix it, though (and the good thing about 
having a query language is that it will be trivial to fix without a backward 
incompatible change). That being said, I do believe that once you start 
working through real-life examples, it's not really a blocker. Most of the 
time, when you use composites in real life, you want to slice over one of the 
components, which works fine. That's why it's really more a problem for 
slightly more complex pagination over composite wide rows. There is also 
CASSANDRA-4415, which will remove the need for a good part of the manual 
pagination people do right now.
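
The difference between the two cases can be sketched in Python. This is an 
illustrative model only: the (role, name) composites and both helper 
functions are hypothetical, and nothing here is Cassandra or CQL3 code. 
Slicing constrains a single component, whereas resuming a page has to compare 
against the whole composite name.

```python
from bisect import bisect_left, bisect_right

# Hypothetical composite column names (role, name), kept in sorted order
# the way a composite comparator would keep them within a row.
columns = sorted([
    ("actor", "alice"),
    ("actor", "bob"),
    ("director", "carol"),
    ("writer", "dave"),
])


def slice_first_component(cols, start, end):
    """Columns whose first component lies in [start, end], inclusive."""
    firsts = [c[0] for c in cols]
    return cols[bisect_left(firsts, start):bisect_right(firsts, end)]


def next_page(cols, last_seen, page_size):
    """Columns strictly after the full composite last_seen."""
    # The comparison is on the whole tuple, not on a single component.
    begin = bisect_right(cols, last_seen)
    return cols[begin:begin + page_size]


actors = slice_first_component(columns, "actor", "actor")
page = next_page(columns, ("actor", "bob"), 2)
print(actors)
print(page)
```

In this model the slice needs only a range on the first component, while 
next_page needs an ordering comparison on (role, name) as a unit.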

bq. If we have an old style schema don't we need to be able to alter a current 
table.

As explained above, "thrift" CFs *are* directly accessible from CQL3 (without 
any redefinition, and that's why trying to create the table in CQL3 is not 
legal). However, you won't get nice column names if you do so (but rather the 
generic 'key', 'column' and 'value' names above). Again, CASSANDRA-4822 will 
allow you to declare nice names without having to do complex operations (like 
trashing your thrift schema so that CQL3 allows the redefinition).

bq. What is going to happen if Cassandra and the CQL language actually adds 
true composite row keys?

It does already: CASSANDRA-4179. You just declare
{noformat}
PRIMARY KEY ((id_part1, id_part2), tag_name)
{noformat}
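
As a rough illustration of what that declaration means, here is a Python 
sketch (an in-memory model with made-up sample rows, not Cassandra code; the 
group_by_row_key helper is hypothetical). The inner parentheses group 
id_part1 and id_part2 into the row key, and tag_name orders the entries 
within each row.

```python
from collections import defaultdict

# Made-up sample rows for a table declared with
# PRIMARY KEY ((id_part1, id_part2), tag_name).
rows = [
    {"id_part1": "a", "id_part2": 1, "tag_name": "romance"},
    {"id_part1": "a", "id_part2": 1, "tag_name": "action"},
    {"id_part1": "a", "id_part2": 2, "tag_name": "action"},
]


def group_by_row_key(cql_rows):
    """Group rows by the composite row key (id_part1, id_part2)."""
    partitions = defaultdict(list)
    for row in cql_rows:
        row_key = (row["id_part1"], row["id_part2"])
        partitions[row_key].append(row["tag_name"])
    # Within one row key, entries are ordered by tag_name.
    return {key: sorted(tags) for key, tags in partitions.items()}


partitions = group_by_row_key(rows)
print(partitions)
```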

                
> Make CQL work naturally with wide rows
> --------------------------------------
>
>                 Key: CASSANDRA-4815
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4815
>             Project: Cassandra
>          Issue Type: Wish
>            Reporter: Edward Capriolo
>
> I find that CQL3 is quite obtuse and does not provide me a language useful 
> for accessing my data. First, let's point out how we should design Cassandra 
> data. 
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
> 4) optimize for blind writes
> So here is a schema that abides by these tried and tested rules, which large 
> production users are employing today. 
> Say we have a table of movie objects:
> Movie
> Name
> Description
> -< tags   (string)
> -< credits composite(role string, name string )
> -1 likesToday
> -1 blacklisted
> The above structure is a movie. Notice it holds a mix of static and dynamic 
> columns, but the overall number of columns is not very large (even if it 
> were larger, this would be OK as well). Notice this table is not just 
> a single one-to-many relationship: it has 1-to-1 data and it has two sets of 
> 1-to-many data.
> The schema today is declared something like this:
> create column family movies
> with default_comparator=UTF8Type and
>   column_metadata =
>   [
>     {column_name: blacklisted, validation_class: int},
>     {column_name: likestoday, validation_class: long},
>     {column_name: description, validation_class: UTF8Type}
>   ];
> We should be able to insert data like this:
> set ['Cassandra Database, not looking for a seQL']['blacklisted']=1;
> set ['Cassandra Database, not looking for a seQL']['likesToday']=34;
> set ['Cassandra Database, not looking for a seQL']['credits-dir']='director:asf';
> set ['Cassandra Database, not looking for a seQL']['credits-jir']='jiraguy:bob';
> set ['Cassandra Database, not looking for a seQL']['tags-action']='';
> set ['Cassandra Database, not looking for a seQL']['tags-adventure']='';
> set ['Cassandra Database, not looking for a seQL']['tags-romance']='';
> set ['Cassandra Database, not looking for a seQL']['tags-programming']='';
> This is the correct way to do it. 1 seek to find all the information related 
> to a movie. As long as this row does
> not get "large" there is no reason to optimize by breaking data into other 
> column families. (Notice you can not transpose this
> because movies is two 1-to-many relationships of potentially different types)
> Let's look at the CQL3 way to do this design:
> First, contrary to the original design of Cassandra, CQL does not like wide 
> rows. It also does not have a good way of dealing with dynamic rows together 
> with static rows.
> You have two options:
> Option 1: lose all schema
> create table movies ( name string, column blob, value blob, primary 
> key(name)) with compact storage.
> This method is not so hot: we have now lost all our validators, and by the way 
> you have to physically shut down everything, rename files and recreate your 
> schema if you want to inform Cassandra that a current table should be 
> compact. This could at the very least be just a metadata change. Also, you 
> cannot add column schema either.
> Option 2: normalize (is even worse)
> create table movie (name String, description string, likestoday int, 
> blacklisted int);
> create table movecredits( name string, role string, personname string, 
> primary key(name,role) );
> create table movetags( name string, tag string, primary key (name,tag) );
> This is a terrible design. Of the 4 key characteristics of how Cassandra 
> data should be designed, it fails 3:
> It does not:
> 1) Denormalize
> 2) Eliminate seeks
> 3) Design for read
> Why is Cassandra steering toward this course, by making a language that does 
> not understand wide rows?
> So what can be done? My suggestions: 
> Cassandra needs to lose the COMPACT STORAGE conversions. Each table needs a 
> "virtual view" that is compact storage with no work to migrate data and 
> recreate schemas. Every table should have a compact view for the schemaless, 
> or a simple query hint like /*transposed*/ should make this change.
> Metadata should be definable by regex. For example, all columns named "tag*" 
> are of type string.
> CQL should have the column[slice_start] .. column[slice_end] operator from 
> cql2. 
> CQL should support current users; users should not have to 
> switch between CQL versions, and possibly thrift, to work with wide rows. The 
> language should work for them even if it is not expressly designed for them. 
> Some of these features are already part of cql2, so they should be carried 
> over.
> Also, what must not happen is for someone to make a hand-waving statement 
> like "Once we have collection types we will not need wide rows". This request 
> is to satisfy current users of cassandra not future ones or theoretical ones. 
> Solutions should not involve physically migrating data in any way, they 
> should not involve telling someone to do something they are already doing 
> much differently. The suggestions should revolve around making the query 
> language work well with existing data. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
