Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Cassandra2474" page has been changed by JonathanEllis:
http://wiki.apache.org/cassandra/Cassandra2474?action=diff&rev1=2&rev2=3

Comment:
add Alpha, Beta, and Discussion Summary sections

  
  <<TableOfContents(100)>>
  
+ == Goals ==
+ 
+ Primary: provide a CQL syntax for updating and querying composite column 
families.
+ 
+ Secondary goal: proposed syntax should be implementable by the Hive driver 
with the minimum of changes from mainline Hive.  In particular, changes to the 
Hive parser are too difficult to maintain long-term and are Right Out.  We 
would prefer to avoid changes to the Hive metastore but this is doable if 
necessary.
+ 
+ Tertiary goal: it would be nice to also support supercolumns
+ 
+ == Non-goals ==
+ 
+ Supporting arbitrarily-and-non-uniformly nested "document" data is a 
non-goal.  https://issues.apache.org/jira/browse/CASSANDRA-3647 is created to 
follow up on this related problem.
+ 
  == Alpha ==
  
- Discussion starts 
[[https://issues.apache.org/jira/browse/CASSANDRA-2474?focusedCommentId=13046834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13046834|here]]
+ The short-lived first proposal envisioned adding the "prefix" from which to 
select a resultset to the table name in the FROM clause.  Discussion starts 
Discussion starts 
[[https://issues.apache.org/jira/browse/CASSANDRA-2474?focusedCommentId=13046834&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13046834|here]]
  
- === Goals ===
+ {{{
+ SELECT x, y FROM foo:bar WHERE parent='columnA'
+ }}}
  
-  * FIXME: add goals
-  * FIXME: add goals
-  * FIXME: add goals
+ {{{
+ select a, b FROM foo:bar:columnA where subparent='x'
+ }}}
+ 
+ === Discussion Summary ===
+ 
+ Jonathan was thinking in terms of supercolumns for this early proposal.  It's 
not clear how to generalize this to composites where the "subcolumns" are not 
explicitly named in the CompositeType definition.
+ 
+ This proposal would require a Hive metastore change, but the nail in the 
coffin is that this means you cannot use WHERE clauses with the "parent" parts 
of the column.  So, no range queries (necessary for map/reduce) or even slices 
within the same row.
  
  == Beta ==
  
- Discussion starts 
[[https://issues.apache.org/jira/browse/CASSANDRA-2474?focusedCommentId=13095626&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13095626|here]]
+ This proposal suggests the use of a keyword or hint to indicate that a query 
is transposed. Discussion starts 
[[https://issues.apache.org/jira/browse/CASSANDRA-2474?focusedCommentId=13046937&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13046937|here]]
  
- === Goals ===
+ The first part of the discussion is where to put the transposition marker:
  
-  * FIXME: add goals
-  * FIXME: add goals
-  * FIXME: add goals
+ {{{
+ select /*+TRANSPOSED*/ key, column, subcolumn, value from foo;
+ }}}
+ 
+ {{{
+ select key, column, subcolumn, value from foo TRANSPOSED;
+ }}}
+ 
+ {{{
+ select transposed(key, column, subcolumn, value) from foo;
+ }}}
+ 
+ Settling on "table:transposed" because that requires no Hive changes:
+ 
+ {{{
+ select key, column, subcolumn, value from foo:transposed;
+ }}}
+ 
+ The second part, starting 
[[https://issues.apache.org/jira/browse/CASSANDRA-2474?focusedCommentId=13095626&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13095626|here]],
 digs into how to deal with destructuring the composite column name:
+ 
+ {{{
+ SELECT name AS (tweet_id, username), value AS body
+ FROM timeline:transposed
+ WHERE tweet_id = '95a789a' AND user_id = 'cscotta'
+ }}}
+ 
+ {{{
+ SELECT component1 AS tweet_id, component2 AS username, component3 location, 
value AS body
+ FROM timeline:transposed
+ WHERE user_id = '95a789a'
+ }}}
+ 
+ {{{
+ UPDATE tweets:transposed SET COMPOUND NAME ('2e1c3308', 'cscotta') = 'My 
motocycle...' WHERE KEY = <key>;
+ }}}
+ 
+ {{{
+ UPDATE tweets:transposed SET value = 'my motorcycle' WHERE KEY= <key> AND 
column = COMPOUND_NAME('2e1c3308', 'cscotta');
+ }}}
+ 
+ === Discussion Summary ===
+ 
+ There was general agreement that "FROM foo:transposed" is a reasonable 
syntax, however, neither the "componentX" syntax (where X is in range(1, number 
of components in the compositetype) nor the "name AS (x, y)" syntax met with 
approval: the "name AS" syntax requires patching the Hive parser, and the 
"componentX" syntax is ugly and repetitive to use.  The UPDATE syntaxes were 
also unsatisfactory.
  
  == Gamma ==
  
+ This proposal switches gears to dealing with transposition using DDL instead 
of 
+ 
  Discussion starts 
[[https://issues.apache.org/jira/browse/CASSANDRA-2474?focusedCommentId=13171304&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13171304|here]]
  
- === Goals ===
- 
-  * FIXME: add goals
-  * FIXME: add goals
-  * FIXME: add goals
- 
- === Implementation ===
+ {{{
+ CREATE TABLE timeline (
+     userid int primary key,
+     posted_at uuid,
+     posted_by int,
+     body text
+ ) TRANSPOSED;
+ }}}
  
  {{{
  CREATE TABLE events (
@@ -47, +110 @@

      ts1 int,
      cat text,
      subcat text,
-     id uuid
- ) TRANSPOSED AS (ts1, cat, subcat, id);
+     "1337" uuid,
+     "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint
+ ) TRANSPOSED WITH COLUMN NAMES ("1337" int, 
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" uuid);
  }}}
  
- An example column stored might be (2355234412, 'trucks', 'Ford', 
2165cd4c-4db8-4a8f-a2b2-e8fa157f7697). Each transposed row is represented by 
one column.
+ === Examples ===
  
- {{{
- CREATE TABLE timeline (
-     userid int primary key,
-     posted_at uuid,
-     posted_by int,
-     body text
- ) TRANSPOSED AS (posted_at), SPARSE(posted_by, body);
- }}}
- 
- The `SPARSE` keyword means this is a dynamic composite column. Thus, the 
actual composite columns stored here might be 
{(08ec87a0-2cc3-4982-a0e7-a434884451f8, 'posted_by'): 524342} and 
{(08ec87a0-2cc3-4982-a0e7-a434884451f8, 'body'): 'CQL FTW'}. That is, the 
column name literals 'posted_by' and 'body' are part of the composite column 
name. It will take one composite column per SPARSE column to store a transposed 
row.
- 
- `SELECT`, `INSERT`, and `UPDATE` syntax requires no changes.  Some examples:
+ `SELECT`, `INSERT`, and `UPDATE` syntax require no changes.  Some examples:
  
  {{{
  INSERT INTO timeline (user_id, posted_at, posted_by, body)
@@ -85, +138 @@

  ||'''user_id'''||'''posted_at'''||'''posted_by'''||'''body'''||
  ||tjefferson||1818||jadams||Revolution was effected before the war commenced||
  ||tjefferson||1763||jadams||Democracy will soon degenerate into an anarchy||
+ ||bfranklin||1781||tjefferson||Every government degenerates when trusted to 
the rulers of the people alone||
  ||tjefferson||1790||gwashington||To be prepared for war is one of the most 
effectual means of preserving peace||
- ||bfranklin||1781||tjefferson||Every government degenerates when trusted to 
the rulers of the people alone||
  
  In "raw" form this would look like:
  
- ||tjefferson||(1790, 'body'): To be prepared for war is one of the most 
effectual means of preserving peace||(1790, 'posted_by'): gwashington||(1763, 
'body'): Democracy will soon degenerate into an anarchy||(1763, 'posted_by'): 
jadams||(1818, 'body'): Revolution was effected before the war 
commenced||(1818, 'posted_by'): jadams||
+ ||tjefferson||(1763, 'body'): Democracy will soon degenerate into an 
anarchy||(1763, 'posted_by'): jadams||(1790, 'body'): To be prepared for war is 
one of the most effectual means of preserving peace||(1790, 'posted_by'): 
gwashington||(1818, 'body'): Revolution was effected before the war 
commenced||(1818, 'posted_by'): jadams||
  ||bfranklin||(1781, 'body'): Every government degenerates when trusted to the 
rulers of the people alone||(1781, 'posted_by'): tjefferson||||||||||
  
  And an example `SELECT`:
@@ -103, +156 @@

  ||tjefferson||1790||gwashington||To be prepared for war is one of the most 
effectual means of preserving peace||
  ||tjefferson||1818||jadams||Revolution was effected before the war commenced||
  
- === Advantages ===
+ === Discussion Summary ===
  
+ Only minimal CQL changes are required.  The Hive metastore would need to be 
updated to understand the TRANSPOSED syntax.  Normal SELECTs and UPDATEs are 
supported, including "SELECT *," a weakness of the Beta proposals.
-  * Language changes are DDL-only
-  * FIXME: more advantages
-  * FIXME: more advantages
  
- === Disadvantages ===
+ This proposal originally included a SPARSE option for handling supercolumns 
and supercolumn-like composites, where a component of the composite name is 
itself a column name in the transposed row: instead of component X always 
representing the same column name, the component would contain the column name. 
 Jonathan dropped this extension in the face of objections from Sylvain when it 
became clear that fixed-structure composite types can support ALTER .. ADD 
COLUMN with a minor change 
(https://issues.apache.org/jira/browse/CASSANDRA-3657).
  
+ Without SPARSE, this proposal does not support supercolumns.
-  * FIXME: more disadvantages
-  * FIXME: more disadvantages
-  * FIXME: more disadvantages
  

Reply via email to