[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns and wide rows

2012-01-20 Thread Sylvain Lebresne (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189873#comment-13189873
 ] 

Sylvain Lebresne edited comment on CASSANDRA-2474 at 1/20/12 4:33 PM:
--

This issue is reay long. So I've created CASSANDRA-3761 for CQL 
3.0, which basically will be the changes of the patch and whatever other 
changes I'm claiming are implied. It feels vaguely reasonable to switch to a 
more general ticket anyway now that we've decided on a number of things (the 
main syntax, the fact that we'll keep support for 2.0 for now etc...).

I'll update CASSANDRA-3761 with a patch and more details about that patch soon, 
real soon.

  was (Author: slebresne):
This issue is reay long. So I've created CASSANDRA-3761 for CQL 
3.0, which basically will be the changes of the patch and whatever other 
changes I'm pretending are implied. It feels vaguely reasonable to switch to a 
more general ticket anyway now that we've decided on a number of things (the 
main syntax, the fact that we'll keep support for 2.0 for now etc...).

I'll update CASSANDRA-3761 with a patch and more details about that patch soon, 
real soon.
  
> CQL support for compound columns and wide rows
> --
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Sylvain Lebresne
>Priority: Critical
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 0001-Add-support-for-wide-and-composite-CFs.patch, 
> 0002-thrift-generated-code.patch, 2474-transposed-1.PNG, 
> 2474-transposed-raw.PNG, 2474-transposed-select-no-sparse.PNG, 
> 2474-transposed-select.PNG, cql_tests.py, raw_composite.txt, 
> screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns and wide rows

2012-01-12 Thread T Jake Luciani (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185228#comment-13185228
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 1/12/12 8:51 PM:


bq. My current patch don't really allow creating a secondary index on a non 
static CF, because it's not clear how that would work from the syntax and I'm 
sure I see good use for that. Does that bother someone ? 

CASSANDRA-3680 will add support for this at a later time

  was (Author: tjake):
.bq My current patch don't really allow creating a secondary index on a non 
static CF, because it's not clear how that would work from the syntax and I'm 
sure I see good use for that. Does that bother someone ? 

CASSANDRA-3680 will add support for this at a later time
  
> CQL support for compound columns and wide rows
> --
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Sylvain Lebresne
>Priority: Critical
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, 
> cql_tests.py, raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-30 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177293#comment-13177293
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/30/11 4:57 PM:
-

I've updated the wiki page at http://wiki.apache.org/cassandra/Cassandra2474 
with a more in-depth Background section that hopefully clears up where I'm 
going with this sparse/dense business.

Edit: had some discussion here about making row key explicit in TRANSPOSED WITH 
ROW KEY clause, instead of implicitly being the first element of PRIMARY KEY, 
but consensus seems to be this is a bad idea.

  was (Author: jbellis):
I've updated the wiki page at 
http://wiki.apache.org/cassandra/Cassandra2474 with a more in-depth Background 
section that hopefully clears up where I'm going with this sparse/dense 
business.

Eric mentioned on IRC that he's uneasy about the PRIMARY KEY syntax implicitly 
using the first element of PRIMARY KEY as the row key.  We could make it 
explicit with another WITH option to the TRANSPOSED clause: 

{code}
CREATE TABLE timeline (
user_id int,
posted_at uuid,
column string,
value blob,
PRIMARY KEY(user_id, posted_at)
) TRANSPOSED WITH ROW KEY(user_id)
{code}

This makes things more verbose (this would be a required clause) but I'm okay 
with that if consensus is that being explicit here is better.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, 
> raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-29 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177293#comment-13177293
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/30/11 1:41 AM:
-

I've updated the wiki page at http://wiki.apache.org/cassandra/Cassandra2474 
with a more in-depth Background section that hopefully clears up where I'm 
going with this sparse/dense business.

Eric mentioned on IRC that he's uneasy about the PRIMARY KEY syntax implicitly 
using the first element of PRIMARY KEY as the row key.  We could make it 
explicit with another WITH option to the TRANSPOSED clause: 

{code}
CREATE TABLE timeline (
user_id int,
posted_at uuid,
column string,
value blob,
PRIMARY KEY(user_id, posted_at)
) TRANSPOSED WITH ROW KEY(user_id)
{code}

This makes things more verbose (this would be a required clause) but I'm okay 
with that if consensus is that being explicit here is better.

  was (Author: jbellis):
I've updated the wiki page at 
http://wiki.apache.org/cassandra/Cassandra2474 with a more in-depth Background 
section that hopefully clears up where I'm going with this sparse/dense 
business.

Eric mentioned on IRC that he's uneasy about the PRIMARY KEY syntax implicitly 
using the first element of PRIMARY KEY as the row key.  We could make it 
explicit with another WITH option to the TRANSPOSED clause: 

{{{
CREATE TABLE timeline (
user_id int,
posted_at uuid,
column string,
value blob,
PRIMARY KEY(user_id, posted_at)
) TRANSPOSED WITH ROW KEY(user_id)
}}}

This makes things more verbose (this would be a required clause) but I'm okay 
with that if consensus is that being explicit here is better.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, 
> raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-28 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176762#comment-13176762
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/28/11 8:58 PM:
-

The more I think about it the less happy I am with omitting support for sparse 
columns.  Remember that dense composites may only be inserted and deleted, not 
updated, since they are just a tuple of values with "column names" determined 
by schema and/or convention.

I think we can support sparse columns well in a way that improves the 
conceptual integrity for the dense composites as well:

{code}
-- "column" and "value" are sparse; a transposed row will be stored as
-- two columns of (posted_at, 'column': string) and (posted_at, 'value': blob),
-- with C* row key of user_id
CREATE TABLE timeline (
user_id int,
posted_at uuid,
column string,
value blob,
PRIMARY KEY(user_id, posted_at)
) TRANSPOSED;

-- entire transposed row is stored as a single dense composite column
-- (ts1, cat, subcat, 1337, 92d21d0a-...: []) with a C* row key of series.  
-- Note that the composite column's value is unused in this case.
CREATE TABLE events (
series text,
ts1 int,
cat text,
subcat text,
"1337" uuid,
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint,
PRIMARY KEY(series, ts1, cat, subcat, "1337", 
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f")
) TRANSPOSED WITH COLUMN NAMES ("1337" int, 
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" uuid);
{code}

Thus, columns included in the (transposed) primary key will be "dense," and not 
updateable, which conforms to our existing practice that keys are not 
updateable.  Remaining columns will be updateable since they will each map to a 
separate physical column.

  was (Author: jbellis):
The more I think about it the less happy I am with omitting support for 
sparse columns.  Remember that dense composites may only be inserted and 
deleted, not updated, since they are just a tuple of values with "column names" 
determined by schema and/or convention.

I think we can support sparse columns well in a way that improves the 
conceptual integrity for the dense composites as well:

{code}
-- "column" and "value" are sparse; a transposed row will be stored as
-- two columns of (user_id, posted_at, 'column': string) and (user_id, 
posted_at, 'value': blob)
CREATE TABLE timeline (
user_id int,
posted_at uuid,
column string,
value blob,
PRIMARY KEY(user_id, posted_at)
) TRANSPOSED;

-- entire transposed row is stored as a single dense composite column
-- (series, ts1, cat, subcat, 1337, 92d21d0a-...: []).  Note that the
-- composite column's value is unused in this case.
CREATE TABLE events (
series text,
ts1 int,
cat text,
subcat text,
"1337" uuid,
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint,
PRIMARY KEY(series, ts1, cat, subcat, "1337", 
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f")
) TRANSPOSED WITH COLUMN NAMES ("1337" int, 
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" uuid);
{code}

Thus, columns included in the (transposed) primary key will be "dense," and not 
updateable, which conforms to our existing practice that keys are not 
updateable.  Remaining columns will be updateable since they will each map to a 
separate physical column.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, 
> raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-20 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173246#comment-13173246
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/20/11 3:29 PM:
-

bq. I would say that 'sucking much less' doesn't necessarily make it 'the right 
solution'.

This is more than sucking less. This is adapting exactly to the relational 
philosophy that SQL is about sets of records and predicates that deal with 
them, and how things are stored is an implementation detail.  I don't see how 
we can do better than this or something like it.

bq. Typically, you can do {{UPDATE test1 SET prop2 = 42 WHERE key = 'someKey'}} 
but you cannot do the same [statement] on test2. 

That's a good point, although I think the limitations are fairly easy to 
explain, along with the effect on ordering.

bq. The SPARSE don't add any new possibility, it just adds a presumably better 
syntax for a specific case. 

Technically that is true, but that is akin to arguing that because PHP is 
Turing complete it's as good as Java to write databases in. :)

Consider my timeline example.  If we defined the table without SPARSE, as
{noformat}
CREATE TABLE timeline (
userid int primary key,
posted_at uuid,
column string,
value blob
) TRANSPOSED
{noformat}

The query {{SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_at > 
1770}} would give us 

!2474-transposed-select-no-sparse.PNG!

I don't see how this is usable in any reasonable sense of the word.  Am I 
missing something?

(Edit: it's actually worse than that, since you can't even use 'posted_at' in 
the query.  I've actually written SQL with "dynamic columns" like this and I 
cannot overemphasize how badly it sucks.)

(And no, "just model it with dense composites" is not an option for the mview 
use case, where adding new columns requires rebuilding the entire CF.  That's a 
big win for us, and I'm not willing to give it up.)

bq. I would actually be in favor of having the first step on this being the 
addition of a very simple 'raw' notation to access composites.

I'm strongly against this or any "row slicing" notation, it's a terrible fit 
for anything that wants to deal with resultsets of rows sharing a common set of 
columns (i.e., CQL and its drivers). Unless this actually returns rows instead 
of columns which is not at all implied by the syntax. Strongly against that 
too. :)

  was (Author: jbellis):
bq. I would say that 'sucking much less' doesn't necessarily make it 'the 
right solution'.

This is more than sucking less. This is adapting exactly to the relational 
philosophy that SQL is about sets of records and predicates that deal with 
them, and how things are stored is an implementation detail.  I don't see how 
we can do better than this or something like it.

bq. Typically, you can do {{UPDATE test1 SET prop2 = 42 WHERE key = 'someKey'}} 
but you cannot do the same [statement] on test2. 

That's a good point, although I think the limitations are fairly easy to 
explain, along with the effect on ordering.

bq. The SPARSE don't add any new possibility, it just adds a presumably better 
syntax for a specific case. 

Technically that is true, but that is akin to arguing that because PHP is 
Turing complete it's as good as Java to write databases in. :)

Consider my timeline example.  If we defined the table without SPARSE, as
{noformat}
CREATE TABLE timeline (
userid int primary key,
posted_at uuid,
column string,
value blob
) TRANSPOSED
{noformat}

The query {{SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_at > 
1770}} would give us 

!2474-transposed-select-no-sparse.PNG!

I don't see how this is usable in any reasonable sense of the word.  Am I 
missing something?

(And no, "just model it with dense composites" is not an option for the mview 
use case, where adding new columns requires rebuilding the entire CF.  That's a 
big win for us, and I'm not willing to give it up.)

bq. I would actually be in favor of having the first step on this being the 
addition of a very simple 'raw' notation to access composites.

I'm strongly against this or any "row slicing" notation, it's a terrible fit 
for anything that wants to deal with resultsets of rows sharing a common set of 
columns (i.e., CQL and its drivers). Unless this actually returns rows instead 
of columns which is not at all implied by the syntax. Strongly against that 
too. :)
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> 

[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-19 Thread Pavel Yaskevich (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172670#comment-13172670
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 12/19/11 10:12 PM:
---

Here is another option for this:

{code}
TRANSPOSED AS ( ?MODIFIER(?), ) 
{code}

where MODIFIER = SPARSE | DENSE and  = utf8, int, uuid, timeuuid, 
...; if no MODIFIER set then MODIFIER = DENSE(default_type)

{code}
CREATE TABLE timeline (
userid int primary key,
posted_at uuid,
posted_by int,
body text,
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint
) TRANSPOSED AS (posted_at, posted_by SPARSE, 
92d21d0a-d6cb-437c-9d3f-b67aa733a19f" DENSE(uuid), body SPARSE);
{code}

  was (Author: xedin):
Here is another option for this:

{code}
TRANSPOSED AS ( ?MODIFIER(), ) 
{code}

where MODIFIER = SPARSE | DENSE and  = utf8, int, uuid, timeuuid, 
...; if no MODIFIER set then MODIFIER = DENSE(default_type)

{code}
CREATE TABLE timeline (
userid int primary key,
posted_at uuid,
posted_by int,
body text,
"92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint
) TRANSPOSED AS (posted_at, posted_by SPARSE, 
92d21d0a-d6cb-437c-9d3f-b67aa733a19f" DENSE(uuid), body SPARSE);
{code}
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select.PNG, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-19 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172604#comment-13172604
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/19/11 9:02 PM:
-

The crucial part of this latest proposal is that it really highlights that 
transposition really is just an implementation detail from the relational 
perspective.  So, to flesh that out:

{code}
INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war 
commenced');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an 
anarchy');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of 
the most effectual means of preserving peace');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when 
trusted to the rulers of the people alone');
{code}

... corresponding to the data in !2474-transposed-1.PNG!

which in "raw" form looks like !2474-transposed-raw.PNG!

Does that make sense?  We're using TRANSPOSED AS similarly to how databases 
have used storage hints like CLUSTERED.  It doesn't affect the relational model 
of the data, but it gives you different performance characteristics.  (The 
analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect 
ordering of results. EDIT: oops, goofed in ordering of the tjefferson row in my 
diagram.)

  was (Author: jbellis):
The crucial part of this latest proposal is that it really highlights that 
transposition really is just an implementation detail from the relational 
perspective.  So, to flesh that out:

{code}
INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war 
commenced');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an 
anarchy');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of 
the most effectual means of preserving peace');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when 
trusted to the rulers of the people alone');
{code}

... corresponding to the data in !2474-transposed-1.PNG!

which in "raw" form looks like !2474-transposed-raw.PNG!

Does that make sense?  We're using TRANSPOSED AS similarly to how databases 
have used storage hints like CLUSTERED.  It doesn't affect the relational model 
of the data, but it gives you different performance characteristics.  (The 
analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect 
ordering of results.)
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select.PNG, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-19 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172608#comment-13172608
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/19/11 9:03 PM:
-

So, you could do queries like

{code}
SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_at > 1770;
{code}

Which would give the resultset shown in !2474-transposed-select.PNG!

  was (Author: jbellis):
So, you could do queries like

{code}
SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_ad > 1770;
{code}

Which would give the resultset shown in !2474-transposed-select.PNG!
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> 2474-transposed-select.PNG, screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-19 Thread Jonathan Ellis (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172604#comment-13172604
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 12/19/11 8:59 PM:
-

The crucial part of this latest proposal is that it really highlights that 
transposition really is just an implementation detail from the relational 
perspective.  So, to flesh that out:

{code}
INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war 
commenced');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an 
anarchy');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of 
the most effectual means of preserving peace');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when 
trusted to the rulers of the people alone');
{code}

... corresponding to the data in !2474-transposed-1.PNG!

which in "raw" form looks like !2474-transposed-raw.PNG!

Does that make sense?  We're using TRANSPOSED AS similarly to how databases 
have used storage hints like CLUSTERED.  It doesn't affect the relational model 
of the data, but it gives you different performance characteristics.  (The 
analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect 
ordering of results.)

  was (Author: jbellis):
The crucial part of this latest proposal is that it really highlights that 
transposition really is just an implementation detail from the relational 
perspective.  So, to flesh that out:

{code}
INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war 
commenced');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an 
anarchy');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of 
the most effectual means of preserving peace');

INSERT INTO timeline (user_id, posted_at, posted_by, body)
VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when 
trusted to the rulers of the people alone');
{code}

... corresponding to the data in !2474-transposed-1.PNG!, which in "raw" form 
looks like !2474-transposed-raw.PNG!

Does that make sense?  We're using TRANSPOSED AS similarly to how databases 
have used storage hints like CLUSTERED.  It doesn't affect the relational model 
of the data, but it gives you different performance characteristics.  (The 
analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect 
ordering of results.)
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, 
> screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-12-19 Thread T Jake Luciani (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172299#comment-13172299
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 12/19/11 2:21 PM:
-

In the non-sparse case you would always ignore the column value? I think we 
need to expose that somehow. (first non-transposed, non-key, non-sparse column?)

Overall I like this because it forces a user to think at schema creation time 
and not access time. This approach makes sense for CQL only access, but for 
users who are coming from thrift they will be asking "how do i access data from 
my current data model?"

On the negative side, this approach feels a bit too restrictive since you 
*MUST* use the same kind of schema across all rows within a CF.  What if a user 
doesn't know what the sparse columns will be ahead of time?  

Also, I know that's best practice but want to make the point, what if a user 
wants to access data in composite form and "raw" mode, should we support 
multiple "views" on the CF?



  was (Author: tjake):
In the non-sparse case you always would always ignore the column value? I 
think we need to expose that somehow. (first non-transposed, non-key, 
non-sparse column?)

Overall I like this because it forces a user to think at schema creation time 
and not access time. This approach makes sense for CQL only access, but for 
users who are coming from thrift they will be asking "how do i access data from 
my current data model?"

On the negative side, this approach feels a bit too restrictive since you 
*MUST* use the same kind of schema across all rows within a CF.  What if a user 
doesn't know what the sparse columns will be ahead of time?  

Also, I know that's best practice but want to make the point, what if a user 
wants to access data in composite form and "raw" mode, should we support 
multiple "views" on the CF?


  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: New Feature
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.1
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099103#comment-13099103
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/7/11 5:09 PM:


To make it Hive friendly

{noformat}
-- sparse select example
SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = 
 AND username = 'xedin';

-- dense select example
SELECT tweet_id; username; location FROM tweets:transposed WHERE key =  
AND username = 'xedin';

-- insert
INSERT INTO tweets (KEY, COMPOUND_NAME('2e1c3308', username))  VALUES (, 
'cscotta');
INSERT INTO tweets (KEY, COMPOUND_NAME('2e1c3308', body))  VALUES (, 
'Brother...');
INSERT INTO tweets (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) VALUES (, 
'Brother...');

-- update
UPDATE tweets SET value = 'My motocycle...' WHERE KEY =  AND 
COMPOUND_NAME('2e1c3308', 'cscotta');
{noformat}

  was (Author: xedin):
To make it Hive friendly

{noformat}
-- sparse select example
SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = 
 AND username = 'xedin';

-- dense select example
SELECT tweet_id; username; location FROM tweets:transposed WHERE key =  
AND username = 'xedin';

-- insert
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', username))  
VALUES (, 'cscotta');
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', body))  
VALUES (, 'Brother...');
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) 
VALUES (, 'Brother...');

-- update
UPDATE tweets:transposed SET value = 'My motocycle...' WHERE KEY =  AND 
COMPOUND_NAME('2e1c3308', 'cscotta');
{noformat}
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-07 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099103#comment-13099103
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/7/11 5:08 PM:


To make it Hive friendly

{noformat}
-- sparse select example
SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = 
 AND username = 'xedin';

-- dense select example
SELECT tweet_id; username; location FROM tweets:transposed WHERE key =  
AND username = 'xedin';

-- insert
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', username))  
VALUES (, 'cscotta');
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', body))  
VALUES (, 'Brother...');
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) 
VALUES (, 'Brother...');

-- update
UPDATE tweets:transposed SET value = 'My motocycle...' WHERE KEY =  AND 
COMPOUND_NAME('2e1c3308', 'cscotta');
{noformat}

  was (Author: xedin):
No make it Hive friendly

{noformat}
-- sparse select example
SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = 
 AND username = 'xedin';

-- dense select example
SELECT tweet_id; username; location FROM tweets:transposed WHERE key =  
AND username = 'xedin';

-- insert
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', username))  
VALUES (, 'cscotta');
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', body))  
VALUES (, 'Brother...');
INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) 
VALUES (, 'Brother...');

-- update
UPDATE tweets:transposed SET value = 'My motocycle...' WHERE KEY =  AND 
COMPOUND_NAME('2e1c3308', 'cscotta');
{noformat}
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-06 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098384#comment-13098384
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 9/6/11 9:38 PM:
---

bq. I think something got lost along the way here

Seems like the confusion is around the component1 syntax? 

This is meant to be used with composite columns. Example: 

{code}
ts1:catA  :subcatA   :id1   :prop1
component1 :component2:component3:component4:component5
{code}





  was (Author: tjake):
bq. I think something got lost along the way here

Seems like the confusion is around the component1 syntax? 

This is meant to be used with composite columns. Example: 

ts1:catA  :subcatA   :id1   :prop1
component1 :component2:component3:component4:component5





  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-06 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098165#comment-13098165
 ] 

Sylvain Lebresne edited comment on CASSANDRA-2474 at 9/6/11 5:07 PM:
-

bq. A more Cassandra-ish way to model this would be to encode this as a series 
of columns: (, 'category', ), (, 'subcategory', 
), (, 'event', ). This is better in the 
general case for the same reason that a sparse top-level set of columns is 
better: I can easily add more data to events (e.g., "source") without rewriting 
existing events.

But my point is: I disagree with that claim.

Maybe sometime your proposal is better, but not always. What if you know that 
you won't add more data to events. Or more precisely, you know that what 
identify an event won't change. What if you decided to model it with a 
(timestamp, category, sub-category, eventId) composite not as a way to feed 
data into the column key, but because this correspond to how you want to query 
the data (which I would say is a very cassandra-ish way to model).

Let's take an example. The data for the (timestamp, category, sub-category, 
eventId) composite (for some key) could look like (on disk):
{noformat}
  ts1:catA:subcatA:id1 -> 
  ts1:catA:subcatA:id2 -> 
  ts1:catA:subcatA:id3 -> 
  ts1:catA:subcatA:id4 -> 
  ts1:catA:subcatB:id5 -> 
  ts1:catA:subcatB:id6 -> 
  ts1:catB:subcatA:id7 -> 
  ts1:catB:subcatA:id8 -> 
  
{noformat}
And say that value is some opaque bytes representing some event data.

Now, I'm not even sure how you model the same thing with your proposal, but I'm 
pretty sure it will involve indirections (or duplication), I doubt it will be 
more user friendly and you will need more than one query (I would have said 3 
queries at first but after trying to see how it would look like I'm not even 
sure I see where you would put the value in your proposal) to do query like:
  * give me all the events (eventid and value) for (ts1, catA, subcatA)
  * give me all the events (eventid and value) for (ts1, catA)
  * give me all the events (eventid and value) for ts1
because the events would not be ordered correctly.

The kind of modeling you propose would make sense if the  for an event 
above was not opaque but composed of a number of property. They yes, I may 
would want to model things as:
{noformat}
  ts1:catA:subcatA:id1:prop1 -> 
  ts1:catA:subcatA:id1:prop2 -> 
  ts1:catA:subcatA:id1:prop3 -> 
  ts1:catA:subcatA:id2:prop1 -> 
  ts1:catA:subcatA:id2:prop2 -> 
  ts1:catA:subcatA:id2:prop3 -> 
  ts1:catA:subcatA:id3:prop1 -> 
  ts1:catA:subcatA:id3:prop2 -> 
  ts1:catA:subcatA:id3:prop3 -> 
  ...
{noformat}
because that doesn't screw up with the sorting I'm trying to impose (and that 
correspond to my queries). And btw, prop1 could 'category' (though that would 
be redundant in that case). But there is two different thing:
  # the first part of the key (ts1:catA:subcatA:id1) is the key to my object. 
It is what makes the ordering corresponding to my queries.
  # the last component (prop1, ...) is just the way to express the different 
properties of my object (and just a way to emulate super columns after all).

So I guess what I'm arguing here is just to not forget the case where you use 
CompositeType because your column key do is intrinsically composed of multiple 
parts. Because it *is* very useful.

  was (Author: slebresne):
bq. A more Cassandra-ish way to model this would be to encode this as a 
series of columns: (, 'category', ), (, 
'subcategory', ), (, 'event', ). This is 
better in the general case for the same reason that a sparse top-level set of 
columns is better: I can easily add more data to events (e.g., "source") 
without rewriting existing events.

But my point is: I disagree with that claim.

Maybe sometime your proposal is better, but not always. What if you know that 
you won't add more data to events. Or more precisely, you know that what 
identify an event won't change. What if you decided to model it with a 
(timestamp, category, sub-category, eventId) composite not as a way to feed 
data into the column key, but because this correspond to how you want to query 
the data (which I would say is a very cassandra-ish way to model).

Let's take an example. The data for the (timestamp, category, sub-category, 
eventId) composite (for some key) could look like (on disk):
{noformat}
  ts1:catA:subcatA:id1 -> 
  ts1:catA:subcatA:id2 -> 
  ts1:catA:subcatA:id3 -> 
  ts1:catA:subcatA:id4 -> 
  ts1:catA:subcatB:id5 -> 
  ts1:catA:subcatB:id6 -> 
  ts1:catB:subcatA:id7 -> 
  ts1:catB:subcatA:id8 -> 
  
{noformat}
And say that value is some opaque bytes representing some event data.

Now, I'm not even sure how you model the same thing with your proposal, but I'm 
pretty sure it will involve indirections (or duplication), I doubt it will be 
more user friendly and you will need more tha

[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 9/5/11 6:50 PM:
---

bq. composite/super columns won't originally play nice with SQL syntax because 
it wasn't designed to query hierarchical data

That's exactly what "transposition" solves -- taking a horizontal slice and 
"turning it sideways" into a resultset with the same set of columns.

We are NOT trying to solve a more generic hierarchical data problem -- all the 
("leaf") data we select has to be at the same "level" in the hierarchy. 

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the "dense" format.  And let's be clear: this is NOT 
the recommended way to do things, because it is fragile, as described above.  
We want to support it [dense], but making it beautiful is not our goal.  Sparse 
encoding will be the recommended practice.

bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. 
"SELECT component1 AS tweet_id, component2 AS username, body, location, age, 
value"

I don't understand, that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment.  I'm 
happy to entertain other alternatives to the component syntax but there's no 
need to spend further time on this one.

  was (Author: jbellis):
bq. composite/super columns won't originally play nice with SQL syntax 
because it wasn't designed to query hierarchical data

That's exactly what "transposition" solves -- taking a horizontal slice and 
"turning it sideways" into a resultset with the same set of columns.

We are NOT trying to solve a more generic hierarchical data problem -- all the 
("leaf") data we select has to be at the same "level" in the hierarchy. 

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the "dense" format.  And let's be clear: this is NOT 
the recommended way to do things, because it is fragile, as described above.  
We want to support it, but making it beautiful is not our goal.

bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. 
"SELECT component1 AS tweet_id, component2 AS username, body, location, age, 
value"

I don't understand, that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment.  I'm 
happy to entertain other alternatives to the component syntax but there's no 
need to spend further time on this one.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-05 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 9/5/11 6:50 PM:
---

bq. composite/super columns won't originally play nice with SQL syntax because 
it wasn't designed to query hierarchical data

That's exactly what "transposition" solves -- taking a horizontal slice and 
"turning it sideways" into a resultset with the same set of columns.

We are NOT trying to solve a more generic hierarchical data problem -- all the 
("leaf") data we select has to be at the same "level" in the hierarchy. 

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the "dense" format.  And let's be clear: this is NOT 
the recommended way to do things, because it is fragile, as described above.  
We want to support it, but making it beautiful is not our goal.

bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. 
"SELECT component1 AS tweet_id, component2 AS username, body, location, age, 
value"

I don't understand, that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment.  I'm 
happy to entertain other alternatives to the component syntax but there's no 
need to spend further time on this one.

  was (Author: jbellis):
bq. composite/super columns won't originally play nice with SQL syntax 
because it wasn't designed to query hierarchical data

That's exactly what "transposition" solves -- taking a horizontal slice and 
"turning it sideways" into a resultset with the same set of columns.

We are NOT trying to solve a more generic hierarchical data problem -- all the 
("leaf") data we select has to be at the same "level" in the hierarchy. 

bq. if we have 10 subcolumns do I need to list them all using component syntax

You will if you are using the "dense" format.  And let's be clear: this is NOT 
the recommended way to do things, because it is fragile, as described above.  
We want to support it, but making it beautiful is not our goal.

bq. will potentially be hard to put into grammar because it can have ambiguous 
rules again because lack of scoping

bq. it lacks scoping therefore on the big queries it will be hard to read
e.g. "SELECT component1 AS tweet_id, component2 AS username, body, location, 
age, value"

I don't understand, that seems perfectly readable to me.

bq. SELECT name AS (tweet_id, username | body | location | age), value AS body

This syntax is not viable for the reasons given in my previous comment.  I'm 
happy to entertain other alternatives to the component syntax but there's no 
need to spend further time on this one.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-04 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096842#comment-13096842
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/4/11 9:57 AM:


bq. Remember that the ideal for CQL is to have "SELECT x, y, z" and get back 
exactly columns x, y, and z.

composite/super columns won't originally play nice with SQL syntax because it 
wasn't designed to query hierarchical data.

Few problems I have with componentX syntax:

 - if we have 10 subcolumns do I need to list them all using component syntax 
(which would be totally unreadable)?
 - it lacks scoping therefore on the big queries it will be hard to read
   e.g.
{noformat}
SELECT component1 AS tweet_id, component2 AS username, body, location, age, 
value AS body
{noformat}
 - will potentially be hard to put into grammar because it can have ambiguous 
rules again because lack of scoping
 - why should we force users to actually give each component a number? 

And I don't get why do you think that (..,..,..) is a "rocket science" syntax:

If we presume that user should be familiar with composite type columns before 
start using the syntax then he will know what does each section (separated by 
",") mean:

{noformat}
SELECT name AS (tweet_id, username, location), value AS body
{noformat}

means that we have three sections as column name which we are aliasing to 
tweet_id, username, location

{noformat}
SELECT name AS (tweet_id, username | body | location | age), value AS body
{noformat}

means that we have two components in the name: first one - tweet_id, and second 
component that has multiple meanings but we only want to get username, body, 
location

{noformat}
SELECT name AS (tweet_id, *), value AS body
{noformat}

means that we still have two components in the column name but we don't care 
what holds component #2 and we expect result set to return all of the possible 
values.
  

  was (Author: xedin):
bq. Remember that the ideal for CQL is to have "SELECT x, y, z" and get 
back exactly columns x, y, and z.

composite/super columns won't originally play nice with SQL syntax because it 
wasn't designed to query hierarchical data.

Few problems I have with componentX syntax:

 - if we have 10 subcolumns do I need to list them all using component syntax 
(which would be totally unreadable)?
 - it lacks scoping therefore on the big queries it will be hard to read
   e.g.
   {noformat}
  SELECT component1 AS tweet_id, component2 AS username, body, location, 
age, value AS body
   {noformat}
 - will potentially be hard to put into grammar because it can have ambiguous 
rules again because lack of scoping
 - why should we force users to actually give each component a number? 

And I don't get why do you think that (..,..,..) is a "rocket science" syntax:

If we presume that user should be familiar with composite type columns before 
start using the syntax then he will know what does each section (separated by 
",") mean:

{noformat}
SELECT name AS (tweet_id, username, location), value AS body
{noformat}

means that we have three sections as column name which we are aliasing to 
tweet_id, username, location

{noformat}
SELECT name AS (tweet_id, username | body | location | age), value AS body
{noformat}

means that we have two components in the name: first one - tweet_id, and second 
component that has multiple meanings but we only want to get username, body, 
location

{noformat}
SELECT name AS (tweet_id, *), value AS body

means that we still have two components in the column name but we don't care 
what holds component #2 and we expect result set to return all of the possible 
values.
  
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-02 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095913#comment-13095913
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/2/11 11:16 AM:
-

bq. We already have the transposition as a visual "hint" as to what is going 
on. I don't think it's difficult to say that "you need to use componentX when 
using transposition," and :colon tokens are usually used to represent bind 
variables in SQL drivers.

I didn't use ":transposed" keyword any where but I'm file using it instead of 
:colon tokens. I don't like "componentX" notation because we can simply 
preserve ordering of the arguments to make the same thing which is much clearer 
syntax on my opinion

bq. Maybe it is better if we will support (..,..,..) notation for composite 
columns

Can be used as a replacement for "componentX" 

bq. Sorry, I don't follow.

You wrote:

bq. If I wanted to add a location tweet field, I'd need to delete (e.g.) the 
2e1c3308,cscotta column and replace it with 2e1c3308,cscotta,. So, 
generally not recommended... but the composite column spec implies we should 
support it.) Which brings us to: how to deal with insert/update in composite 
columns.

I mean when we want to extend "2e1c3308,cscotta" with  we can simply 
update old column's name to "2e1c3308,cscotta,", for querying we can 
use "1/0/-1" as <'end-of-component' byte> from CompositeType doc when we want 
to search by specific components of the column name.

e.g. (if I understand CompositeType annotation correctly)

{noformat}
SELECT name AS (tweet_id, username), value AS body
FROM timeline
WHERE tweet_id = '95a789a' AND user_id = 'cscotta'
{noformat} 

start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0>"
end   = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<1>"

Which should give as *all* columns where column name starts with 95a789a,cscotta
 
{noformat}
SELECT name AS (tweet_id, username, location), value AS body
FROM timeline
WHERE tweet_id = '95a789a' AND user_id = 'cscotta' AND location = 'USA'
{noformat}

start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0><3>USA.getBytes()<0>"
end   = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0><3>USA.getBytes()<1>"

Which should give as *all* columns where column name starts with 
95a789a,cscotta,USA




  was (Author: xedin):
bq. We already have the transposition as a visual "hint" as to what is 
going on. I don't think it's difficult to say that "you need to use componentX 
when using transposition," and :colon tokens are usually used to represent bind 
variables in SQL drivers.

I didn't use ":transposed" keyword any where but I'm file using it instead of 
:colon tokens. I don't like "componentX" notation because we can simply 
preserve ordering of the arguments to make the same thing which is much clearer 
syntax on my opinion

bq. Maybe it is better if we will support (..,..,..) notation for composite 
columns

Can be used as a replacement for "componentX" 

bq. Sorry, I don't follow.

You wrote:

bq. If I wanted to add a location tweet field, I'd need to delete (e.g.) the 
2e1c3308,cscotta column and replace it with 2e1c3308,cscotta,. So, 
generally not recommended... but the composite column spec implies we should 
support it.) Which brings us to: how to deal with insert/update in composite 
columns.

I mean when we want to extend "2e1c3308,cscotta" with  we can simply 
update old column's name to "2e1c3308,cscotta,", for querying we can 
use "1/0/-1" as <'end-of-component' byte> from CompositeType doc when we want 
to search by specific components of the column name.

e.g. (if I understand CompositeType annotation correctly)

{noformat}
SELECT name AS (tweet_id, username), value AS body
FROM timeline
WHERE tweet_id = '95a789a' AND user_id = 'cscotta'
{noformat} 

start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0>"
end   = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<1>"

Which should give as *all* columns where column name starts with 95a789a,cscotta
 
{noformat}
SELECT name AS (tweet_id, username, location), value AS body
FROM timeline
WHERE tweet_id = '95a789a' AND user_id = 'cscotta' AND location = 'USA'
{noformat}

start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0><3>USA.getBytes()<0>"
end   = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<1><3>USA.getBytes()<1>"

Which should give as *all* columns where column name starts with 
95a789a,cscotta,USA



  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, scree

[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-01 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095653#comment-13095653
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/1/11 10:47 PM:
-

we can extend that syntax like this to support multiple columns:

for composite columns:

{noformat}
SELECT :column as (date, company), :value as price FROM stockhist WHERE 
='GOOG' AND date >= '2011-01-02' LIMIT 50
{noformat}

for super columns:

{noformat}
SELECT :parent as company, :column as (date, ), :value as (price, 
) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND 
 = '...' LIMIT 50
{noformat}

Also instead of :column we could use just :name, it might be even cleaner.

  was (Author: xedin):
we can extend that syntax like this to support multiple columns:

{noformat}
SELECT :column as (date, company), :value as (price, name) FROM stockhist WHERE 
='GOOG' AND date >= '2011-01-02' LIMIT 50
{noformat}

{noformat}
SELECT :parent as company, :column as (date, ), :value as (price, 
) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND 
 = '...' LIMIT 50
{noformat}

Also instead of :column we could use just :name, it might be even cleaner.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-01 Thread Pavel Yaskevich (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095653#comment-13095653
 ] 

Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/1/11 10:44 PM:
-

we can extend that syntax like this to support multiple columns:

{noformat}
SELECT :column as (date, company), :value as (price, name) FROM stockhist WHERE 
='GOOG' AND date >= '2011-01-02' LIMIT 50
{noformat}

{noformat}
SELECT :parent as company, :column as (date, ), :value as (price, 
) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND 
 = '...' LIMIT 50
{noformat}

Also instead of :column we could use just :name, it might be even cleaner.

  was (Author: xedin):
we can extend that syntax like this to support multiple columns:

{noformat}
SELECT :column as (date, company), :value as (price, name) FROM stockhist WHERE 
='GOOG' AND date >= '2011-01-02' LIMIT 50
{noformat}

{noformat}
SELECT :parent as company, :column as (date, ), :value as (price, 
) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND 
 = '...' LIMIT 50
{noformat}
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-09-01 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095618#comment-13095618
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 9/1/11 9:31 PM:
---

Illustration time.  (See: screenshot-1.jpg)

We have a "wide row" of stock histories.  Just "normal" columns, no composites 
or supercolumns.

We could use transposition syntax to get these as "normal" (not sparse) 
resultset:

{code}
SELECT date, price
FROM stockhist
WHERE symbol='GOOG'
  AND date >= '2011-01-02'
LIMIT 50
{code}

Which would yield
{noformat}
dateprice
2011-01-02  $75.23
2011-01-03  $82.11
{noformat}

As mentioned above, we need to figure out how to allow adding the date/price 
aliases to refer to column name/value, and extend that to the more general 
composite column case.

  was (Author: jbellis):
Illustration time.  (See: screenshot-1.jpg)

We have a "wide row" of stock histories.  Just "normal" columns, no composites 
or supercolumns.

We could use transposition syntax to get these as "normal" (not sparse) 
resultset:

{code}
SELECT date, value
FROM stockhist
WHERE symbol='GOOG'
  AND date >= '2011-01-02'
LIMIT 50
{code}

Which would yield
{noformat}
datevalue
2011-01-02  $75.23
2011-01-03  $82.11
{noformat}
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>Assignee: Pavel Yaskevich
>  Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-22 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088958#comment-13088958
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 8/22/11 7:44 PM:


bq. Rather than trying to support all this flexibility on the fly in Hive, why 
don't we let users create transposed projections in the DML?

I like that idea, as the next best thing to finding a syntax that works well 
for both Hive and CQL.

So for CQL, I think we're close with the "transposed" idea.  As explained 
above, this allows using "'real' WHERE clauses [so] we get all the benefits of 
regularity that implies."

I did note one problem with the transposed(x, y, z) syntax, though: you lose 
the ability to compose with other functions in the SELECT clause.  Currently 
that just means count(), but it's conceivable that we'd add scalar functions 
like sqrt(), or even aggregation like sum().

So I propose adding ".transposed" to the table name instead:

{noformat}
SELECT tweet_id, username, timestamp
FROM timeline.transposed 
WHERE user_id='jbellis'
LIMIT 50
{noformat}


  was (Author: jbellis):
bq. Rather than trying to support all this flexibility on the fly in Hive, 
why don't we let users create transposed projections in the DML?

I like that idea, as the next best thing to finding a syntax that works well 
for both Hive and CQL.

So for CQL, I think we're close with the "transposed" idea.  As explained 
above, "we have "real" WHERE clauses we get all the benefits of regularity that 
implies."

I did note one problem with the transposed(x, y, z) syntax, though: you lose 
the ability to compose with other functions in the SELECT clause.  Currently 
that just means count(), but it's conceivable that we'd add scalar functions 
like sqrt(), or even aggregation like sum().

So I propose adding ".transposed" to the table name instead:

{noformat}
SELECT tweet_id, username, timestamp
FROM timeline.transposed 
WHERE user_id='jbellis'
LIMIT 50
{noformat}

  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088738#comment-13088738
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 8/22/11 2:49 PM:


Rather than trying to support all this flexibility on the fly in Hive, why 
don't we let users create transposed projections in the DML?

I think this is reasonable vs CQL language because CQL is the source of truth 
for the DML where-as Hive refers to the meta info explicitly.

Example for timeseries supercol wide rows:

{code}
CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, 
timestamp long)
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:supercol,:subcol:uname,:subcol:ts")
{code}

:key, :supercol, :subcol are reserved keywords

Example for composite/supercolumn level:

{code}
CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long)  
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,retweet:component1,retweet:component2)
{code}

This means you might have to create many projections on the source data in Hive 
to support your analysis but this feels more OLAP than the CQL transpose on the 
fly approach.

Also, we could still support the transpose() function syntax using a 
passthrough UTF so the same syntax works across CQL and HiveQL


  was (Author: tjake):
Rather than trying to support all this flexibility on the fly in Hive, why 
don't we let users create transposed projections in the DML?

I think this is reasonable vs CQL language because CQL is the source of truth 
for the DML where-as Hive refers to the meta info explicitly.

Example for timeseries supercol wide rows:

{code}
CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, 
timestamp long)
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:supercol,:subcol:uname,:subcol:ts")
{code}

:key, :supercol, :subcol are reserved keywords

Example for composite/supercolumn level:

{code}
CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long)  
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:component1,:component2)
{code}

This means you might have to create many projections on the source data in Hive 
to support your analysis but this feels more OLAP than the CQL transpose on the 
fly approach.

Also, we could still support the transpose() function syntax using a 
passthrough UTF so the same syntax works across CQL and HiveQL

  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-22 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088738#comment-13088738
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 8/22/11 2:48 PM:


Rather than trying to support all this flexibility on the fly in Hive, why 
don't we let users create transposed projections in the DML?

I think this is reasonable vs CQL language because CQL is the source of truth 
for the DML where-as Hive refers to the meta info explicitly.

Example for timeseries supercol wide rows:

{code}
CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, 
timestamp long)
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:supercol,:subcol:uname,:subcol:ts")
{code}

:key, :supercol, :subcol are reserved keywords

Example for composite/supercolumn level:

{code}
CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long)  
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:component1,:component2)
{code}

This means you might have to create many projections on the source data in Hive 
to support your analysis but this feels more OLAP than the CQL transpose on the 
fly approach.

Also, we could still support the transpose() function syntax using a 
passthrough UTF so the same syntax works across CQL and HiveQL


  was (Author: tjake):
Rather than trying to support all this flexibility on the fly in Hive, why 
don't we let users create transposed projections in the DML?

I think this is reasonable vs CQL language because CQL is the source of truth 
for the DDL where-as Hive refers to the meta info explicitly.

Example for timeseries supercol wide rows:

{code}
CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, 
timestamp long)
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:supercol,:subcol:uname,:subcol:ts")
{code}

:key, :supercol, :subcol are reserved keywords

Example for composite/supercolumn level:

{code}
CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long)  
  STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
  WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = 
":key,:component1,:component2)
{code}

This means you might have to create many projections on the source data in Hive 
to support your analysis but this feels more OLAP than the CQL transpose on the 
fly approach.

Also, we could still support the transpose() function syntax using a 
passthrough UTF so the same syntax works across CQL and HiveQL

  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087831#comment-13087831
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 5:48 PM:


That would come from the schema, similar to how we have key_alias.

(Implementation TBD, especially for "dynamic composite columns."  For those we 
might just have to reserve component1, component2, etc. names)

  was (Author: jbellis):
That would come from the schema, similar to how we have key_alias.

(Implementation TBD, especially for "dynamic composite columns.")
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087831#comment-13087831
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 5:47 PM:


That would come from the schema, similar to how we have key_alias.

(Implementation TBD, especially for "dynamic composite columns.")

  was (Author: jbellis):
That would come from the schema.
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087765#comment-13087765
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 3:36 PM:


I need to back up.  Neither transposed() nor transposed(column) gives us the 
power we need, unless we layer subqueries on top.  Let me use a more complete 
example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done 
"properly" in supercolumns.  Then we might want to say, "give me the most 
recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, 
cumbersome "column limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's 
ok.  Or even, as you originally suggested, just leave it out and say "you can 
only do full-CF scans without a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row 
full of supercolumns representing tweets could be "tweet_id" instead of 
"supercolumn."  So it would be nice to support this somehow in "full metastore 
Hive."

This is important because we want to generalize beyond the current "one level 
of nesting" you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite 
columns of (tweet_id, retweet_id).  (You can think of this as simply 
supercolumn/subcolumn, the difference being that composites won't deserialize 
the entire parent container to fetch a single subcolumn or subcolumn slice.)  
And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM timeline 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name 
/ supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}


  was (Author: jbellis):
I need to back up.  Neither transposed() nor transposed(column) gives us 
the power we need, unless we layer subqueries on top.  Let me use a more 
complete example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done 
"properly" in supercolumns.  Then we might want to say, "give me the most 
recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM foo WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, 
cumbersome "column limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM foo WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's 
ok.  Or even, as you originally suggested, just leave it out and say "you can 
only do full-CF scans without a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row 
full of supercolumns representing tweets could be "tweet_id" instead of 
"supercolumn."  So it would be nice to support this somehow in "full metastore 
Hive."

This is important because we want to generalize beyond the current "one level 
of nesting" you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite 
columns of (tweet_id, retweet_id).  (You can think of this as simply 
supercolumn/subcolumn, the difference being that composites won't deserialize 
the entire parent container to fetch a single subcolumn or subcolumn slice.)  
And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM timeline 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name 
/ supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}

  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-tas

[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087765#comment-13087765
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 3:36 PM:


I need to back up.  Neither transposed() nor transposed(column) gives us the 
power we need, unless we layer subqueries on top.  Let me use a more complete 
example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done 
"properly" in supercolumns.  Then we might want to say, "give me the most 
recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, 
cumbersome "column limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's 
ok.  Or even, as you originally suggested, just leave it out and say "you can 
only do full-CF scans without a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row 
full of supercolumns representing tweets could be "tweet_id" instead of 
"supercolumn."  So it would be nice to support this somehow in "full metastore 
Hive."

This is important because we want to generalize beyond the current "one level 
of nesting" you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite 
columns of (tweet_id, retweet_id).  (You can think of this as simply 
supercolumn/subcolumn, the difference being that composites won't deserialize 
the entire parent container to fetch a single subcolumn or subcolumn slice.)  
And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM retweets 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name 
/ supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}


  was (Author: jbellis):
I need to back up.  Neither transposed() nor transposed(column) gives us 
the power we need, unless we layer subqueries on top.  Let me use a more 
complete example to illustrate why.

Suppose we have a Twissandra data model where the timeline storage is done 
"properly" in supercolumns.  Then we might want to say, "give me the most 
recent 50 tweets in user X's timeline:"

{code}
SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50
{code}

So far so good.  (Note again how this obviates the need for the existing, 
cumbersome "column limit" syntax for ".." slices.)

But what if we just want, say, the username and timestamp from each entry?

{code}
SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50
{code}

That's where I was going with the multiple parameters to transposed().

Now, for the WHERE clause components... I guess for metastore-less Hive that's 
ok.  Or even, as you originally suggested, just leave it out and say "you can 
only do full-CF scans without a C* metastore."  I'm okay with that.

But for CQL we need to be able to give meaningful names to things, e.g., a row 
full of supercolumns representing tweets could be "tweet_id" instead of 
"supercolumn."  So it would be nice to support this somehow in "full metastore 
Hive."

This is important because we want to generalize beyond the current "one level 
of nesting" you get with supercolumns, to arbitrary with composite columns.  So,

{code}
SELECT transposed(tweet_id, -- supercolumn
  username, timestamp) -- subcolumns
FROM timeline 
WHERE user_id='jbellis' -- key
LIMIT 50
{code}

Another example.  Suppose I have a "retweets" CF, where I have a composite 
columns of (tweet_id, retweet_id).  (You can think of this as simply 
supercolumn/subcolumn, the difference being that composites won't deserialize 
the entire parent container to fetch a single subcolumn or subcolumn slice.)  
And I want the most recent retweets, for a given tweet:

{code}
SELECT transposed(retweet_id) -- second-level column name / subcolumn
FROM timeline 
WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name 
/ supercolumn
  AND user_id='jbellis' -- key
LIMIT 50
{code}

  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Typ

[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-08-19 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087699#comment-13087699
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 8/19/11 1:27 PM:


I think the best we can do with no major changes to HiveQL is:

{code}
  SELECT transposed(column) FROM foo WHERE key=X
{code}

This will return:
   rowkey, subcol1, subcol2 ...

You could then also do supercol slices like transposed(column, start, end)


Looking at hive you can make functions return many columns:

http://massapi.com/source/hive-0.5.0-dev/src/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFExplode2.java.html

  was (Author: tjake):
I think the best we can do with no major changes to HiveQL is:

{code}
  SELECT transposed(column) FROM foo WHERE key=X
{code}

This will return:
   rowkey, column, subcol, value

You could then also do supercol slices like transposed(column, start, end)


Looking at hive you can make functions return many columns:

http://massapi.com/source/hive-0.5.0-dev/src/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFExplode2.java.html
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-06-10 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047331#comment-13047331
 ] 

Jonathan Ellis edited comment on CASSANDRA-2474 at 6/10/11 6:22 PM:


bq. you can specify the following column mappings in cassandra.columns.mapping

Okay, I get it.  The reason there is no explanation of WHERE and resultsets is 
because these are not supported, for the reason deduced above in the N*M 
explanation.
 
So you are limited to the number of levels of nesting that you have magic names 
for?

bq. you are going to be doing a map reduce on most/all the data anyway

The right way to represent a "select * from foo" with no compound modification 
is to present it with compound keys (in the SQL sense not in the sense of 
"multiple keys crammed into a single value"):

{noformat}
rowkey columnkey subcol1subcol2
key1 col1val1   val2
key1 col2v3 v4
{noformat}


  was (Author: jbellis):
bq. you can specify the following column mappings in 
cassandra.columns.mapping

So you are limited to the number of levels of nesting that you have magic names 
for?

bq. you are going to be doing a map reduce on most/all the data anyway

The right way to represent a "select * from foo" with no compound modification 
is to present it with compound keys (in the SQL sense not in the sense of 
"multiple keys crammed into a single value"):

{noformat}
rowkey columnkey subcol1subcol2
key1 col1val1   val2
key1 col2v3 v4
{noformat}

  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns

2011-06-09 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046937#comment-13046937
 ] 

T Jake Luciani edited comment on CASSANDRA-2474 at 6/10/11 12:33 AM:
-

Doesn't this limit things to one row key?

If I have a CF of wide rows it would be nice to get back a transposed view of 
all rows..

you could use a SQL hint syntax like:

{noformat}
select /*+TRANSPOSED*/ key, column, subcolumn, value from foo;
{noformat}


  was (Author: tjake):
Doesn't this limit things to one row key?

If I have a CF of wide rows it would be nice to get back a transposed view of 
all rows..

you could use a SQL hint syntax like:

select /*+TRANSPOSED*/ key, column, subcolumn, value from foo;
  
> CQL support for compound columns
> 
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: API, Core
>Reporter: Eric Evans
>  Labels: cql
> Fix For: 1.0
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimted terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira