[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns and wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189873#comment-13189873 ] Sylvain Lebresne edited comment on CASSANDRA-2474 at 1/20/12 4:33 PM: -- This issue is reay long. So I've created CASSANDRA-3761 for CQL 3.0, which basically will be the changes of the patch and whatever other changes I'm claiming are implied. It feels vaguely reasonable to switch to a more general ticket anyway now that we've decided on a number of things (the main syntax, the fact that we'll keep support for 2.0 for now etc...). I'll update CASSANDRA-3761 with a patch and more details about that patch soon, real soon. was (Author: slebresne): This issue is reay long. So I've created CASSANDRA-3761 for CQL 3.0, which basically will be the changes of the patch and whatever other changes I'm pretending are implied. It feels vaguely reasonable to switch to a more general ticket anyway now that we've decided on a number of things (the main syntax, the fact that we'll keep support for 2.0 for now etc...). I'll update CASSANDRA-3761 with a patch and more details about that patch soon, real soon. > CQL support for compound columns and wide rows > -- > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Sylvain Lebresne >Priority: Critical > Labels: cql > Fix For: 1.1 > > Attachments: 0001-Add-support-for-wide-and-composite-CFs.patch, > 0002-thrift-generated-code.patch, 2474-transposed-1.PNG, > 2474-transposed-raw.PNG, 2474-transposed-select-no-sparse.PNG, > 2474-transposed-select.PNG, cql_tests.py, raw_composite.txt, > screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns and wide rows
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185228#comment-13185228 ] T Jake Luciani edited comment on CASSANDRA-2474 at 1/12/12 8:51 PM: bq. My current patch don't really allow creating a secondary index on a non static CF, because it's not clear how that would work from the syntax and I'm sure I see good use for that. Does that bother someone ? CASSANDRA-3680 will add support for this at a later time was (Author: tjake): .bq My current patch don't really allow creating a secondary index on a non static CF, because it's not clear how that would work from the syntax and I'm sure I see good use for that. Does that bother someone ? CASSANDRA-3680 will add support for this at a later time > CQL support for compound columns and wide rows > -- > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Sylvain Lebresne >Priority: Critical > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, > cql_tests.py, raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177293#comment-13177293 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/30/11 4:57 PM: - I've updated the wiki page at http://wiki.apache.org/cassandra/Cassandra2474 with a more in-depth Background section that hopefully clears up where I'm going with this sparse/dense business. Edit: had some discussion here about making row key explicit in TRANSPOSED WITH ROW KEY clause, instead of implicitly being the first element of PRIMARY KEY, but consensus seems to be this is a bad idea. was (Author: jbellis): I've updated the wiki page at http://wiki.apache.org/cassandra/Cassandra2474 with a more in-depth Background section that hopefully clears up where I'm going with this sparse/dense business. Eric mentioned on IRC that he's uneasy about the PRIMARY KEY syntax implicitly using the first element of PRIMARY KEY as the row key. We could make it explicit with another WITH option to the TRANSPOSED clause: {code} CREATE TABLE timeline ( user_id int, posted_at uuid, column string, value blob, PRIMARY KEY(user_id, posted_at) ) TRANSPOSED WITH ROW KEY(user_id) {code} This makes things more verbose (this would be a required clause) but I'm okay with that if consensus is that being explicit here is better. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, > raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13177293#comment-13177293 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/30/11 1:41 AM: - I've updated the wiki page at http://wiki.apache.org/cassandra/Cassandra2474 with a more in-depth Background section that hopefully clears up where I'm going with this sparse/dense business. Eric mentioned on IRC that he's uneasy about the PRIMARY KEY syntax implicitly using the first element of PRIMARY KEY as the row key. We could make it explicit with another WITH option to the TRANSPOSED clause: {code} CREATE TABLE timeline ( user_id int, posted_at uuid, column string, value blob, PRIMARY KEY(user_id, posted_at) ) TRANSPOSED WITH ROW KEY(user_id) {code} This makes things more verbose (this would be a required clause) but I'm okay with that if consensus is that being explicit here is better. was (Author: jbellis): I've updated the wiki page at http://wiki.apache.org/cassandra/Cassandra2474 with a more in-depth Background section that hopefully clears up where I'm going with this sparse/dense business. Eric mentioned on IRC that he's uneasy about the PRIMARY KEY syntax implicitly using the first element of PRIMARY KEY as the row key. We could make it explicit with another WITH option to the TRANSPOSED clause: {{{ CREATE TABLE timeline ( user_id int, posted_at uuid, column string, value blob, PRIMARY KEY(user_id, posted_at) ) TRANSPOSED WITH ROW KEY(user_id) }}} This makes things more verbose (this would be a required clause) but I'm okay with that if consensus is that being explicit here is better. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, > raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13176762#comment-13176762 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/28/11 8:58 PM: - The more I think about it the less happy I am with omitting support for sparse columns. Remember that dense composites may only be inserted and deleted, not updated, since they are just a tuple of values with "column names" determined by schema and/or convention. I think we can support sparse columns well in a way that improves the conceptual integrity for the dense composites as well: {code} -- "column" and "value" are sparse; a transposed row will be stored as -- two columns of (posted_at, 'column': string) and (posted_at, 'value': blob), -- with C* row key of user_id CREATE TABLE timeline ( user_id int, posted_at uuid, column string, value blob, PRIMARY KEY(user_id, posted_at) ) TRANSPOSED; -- entire transposed row is stored as a single dense composite column -- (ts1, cat, subcat, 1337, 92d21d0a-...: []) with a C* row key of series. -- Note that the composite column's value is unused in this case. CREATE TABLE events ( series text, ts1 int, cat text, subcat text, "1337" uuid, "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint, PRIMARY KEY(series, ts1, cat, subcat, "1337", "92d21d0a-d6cb-437c-9d3f-b67aa733a19f") ) TRANSPOSED WITH COLUMN NAMES ("1337" int, "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" uuid); {code} Thus, columns included in the (transposed) primary key will be "dense," and not updateable, which conforms to our existing practice that keys are not updateable. Remaining columns will be updateable since they will each map to a separate physical column. was (Author: jbellis): The more I think about it the less happy I am with omitting support for sparse columns. Remember that dense composites may only be inserted and deleted, not updated, since they are just a tuple of values with "column names" determined by schema and/or convention. I think we can support sparse columns well in a way that improves the conceptual integrity for the dense composites as well: {code} -- "column" and "value" are sparse; a transposed row will be stored as -- two columns of (user_id, posted_at, 'column': string) and (user_id, posted_at, 'value': blob) CREATE TABLE timeline ( user_id int, posted_at uuid, column string, value blob, PRIMARY KEY(user_id, posted_at) ) TRANSPOSED; -- entire transposed row is stored as a single dense composite column -- (series, ts1, cat, subcat, 1337, 92d21d0a-...: []). Note that the -- composite column's value is unused in this case. CREATE TABLE events ( series text, ts1 int, cat text, subcat text, "1337" uuid, "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint, PRIMARY KEY(series, ts1, cat, subcat, "1337", "92d21d0a-d6cb-437c-9d3f-b67aa733a19f") ) TRANSPOSED WITH COLUMN NAMES ("1337" int, "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" uuid); {code} Thus, columns included in the (transposed) primary key will be "dense," and not updateable, which conforms to our existing practice that keys are not updateable. Remaining columns will be updateable since they will each map to a separate physical column. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select-no-sparse.PNG, 2474-transposed-select.PNG, > raw_composite.txt, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13173246#comment-13173246 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/20/11 3:29 PM: - bq. I would say that 'sucking much less' doesn't necessarily make it 'the right solution'. This is more than sucking less. This is adapting exactly to the relational philosophy that SQL is about sets of records and predicates that deal with them, and how things are stored is an implementation detail. I don't see how we can do better than this or something like it. bq. Typically, you can do {{UPDATE test1 SET prop2 = 42 WHERE key = 'someKey'}} but you cannot do the same [statement] on test2. That's a good point, although I think the limitations are fairly easy to explain, along with the effect on ordering. bq. The SPARSE don't add any new possibility, it just adds a presumably better syntax for a specific case. Technically that is true, but that is akin to arguing that because PHP is Turing complete it's as good as Java to write databases in. :) Consider my timeline example. If we defined the table without SPARSE, as {noformat} CREATE TABLE timeline ( userid int primary key, posted_at uuid, column string, value blob ) TRANSPOSED {noformat} The query {{SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_at > 1770}} would give us !2474-transposed-select-no-sparse.PNG! I don't see how this is usable in any reasonable sense of the word. Am I missing something? (Edit: it's actually worse than that, since you can't even use 'posted_at' in the query. I've actually written SQL with "dynamic columns" like this and I cannot overemphasize how badly it sucks.) (And no, "just model it with dense composites" is not an option for the mview use case, where adding new columns requires rebuilding the entire CF. That's a big win for us, and I'm not willing to give it up.) bq. I would actually be in favor of having the first step on this being the addition of a very simple 'raw' notation to access composites. I'm strongly against this or any "row slicing" notation, it's a terrible fit for anything that wants to deal with resultsets of rows sharing a common set of columns (i.e., CQL and its drivers). Unless this actually returns rows instead of columns which is not at all implied by the syntax. Strongly against that too. :) was (Author: jbellis): bq. I would say that 'sucking much less' doesn't necessarily make it 'the right solution'. This is more than sucking less. This is adapting exactly to the relational philosophy that SQL is about sets of records and predicates that deal with them, and how things are stored is an implementation detail. I don't see how we can do better than this or something like it. bq. Typically, you can do {{UPDATE test1 SET prop2 = 42 WHERE key = 'someKey'}} but you cannot do the same [statement] on test2. That's a good point, although I think the limitations are fairly easy to explain, along with the effect on ordering. bq. The SPARSE don't add any new possibility, it just adds a presumably better syntax for a specific case. Technically that is true, but that is akin to arguing that because PHP is Turing complete it's as good as Java to write databases in. :) Consider my timeline example. If we defined the table without SPARSE, as {noformat} CREATE TABLE timeline ( userid int primary key, posted_at uuid, column string, value blob ) TRANSPOSED {noformat} The query {{SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_at > 1770}} would give us !2474-transposed-select-no-sparse.PNG! I don't see how this is usable in any reasonable sense of the word. Am I missing something? (And no, "just model it with dense composites" is not an option for the mview use case, where adding new columns requires rebuilding the entire CF. That's a big win for us, and I'm not willing to give it up.) bq. I would actually be in favor of having the first step on this being the addition of a very simple 'raw' notation to access composites. I'm strongly against this or any "row slicing" notation, it's a terrible fit for anything that wants to deal with resultsets of rows sharing a common set of columns (i.e., CQL and its drivers). Unless this actually returns rows instead of columns which is not at all implied by the syntax. Strongly against that too. :) > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql >
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172670#comment-13172670 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 12/19/11 10:12 PM: --- Here is another option for this: {code} TRANSPOSED AS ( ?MODIFIER(?), ) {code} where MODIFIER = SPARSE | DENSE and = utf8, int, uuid, timeuuid, ...; if no MODIFIER set then MODIFIER = DENSE(default_type) {code} CREATE TABLE timeline ( userid int primary key, posted_at uuid, posted_by int, body text, "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint ) TRANSPOSED AS (posted_at, posted_by SPARSE, 92d21d0a-d6cb-437c-9d3f-b67aa733a19f" DENSE(uuid), body SPARSE); {code} was (Author: xedin): Here is another option for this: {code} TRANSPOSED AS ( ?MODIFIER(), ) {code} where MODIFIER = SPARSE | DENSE and = utf8, int, uuid, timeuuid, ...; if no MODIFIER set then MODIFIER = DENSE(default_type) {code} CREATE TABLE timeline ( userid int primary key, posted_at uuid, posted_by int, body text, "92d21d0a-d6cb-437c-9d3f-b67aa733a19f" bigint ) TRANSPOSED AS (posted_at, posted_by SPARSE, 92d21d0a-d6cb-437c-9d3f-b67aa733a19f" DENSE(uuid), body SPARSE); {code} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select.PNG, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172604#comment-13172604 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/19/11 9:02 PM: - The crucial part of this latest proposal is that it really highlights that transposition really is just an implementation detail from the relational perspective. So, to flesh that out: {code} INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war commenced'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an anarchy'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of the most effectual means of preserving peace'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when trusted to the rulers of the people alone'); {code} ... corresponding to the data in !2474-transposed-1.PNG! which in "raw" form looks like !2474-transposed-raw.PNG! Does that make sense? We're using TRANSPOSED AS similarly to how databases have used storage hints like CLUSTERED. It doesn't affect the relational model of the data, but it gives you different performance characteristics. (The analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect ordering of results. EDIT: oops, goofed in ordering of the tjefferson row in my diagram.) was (Author: jbellis): The crucial part of this latest proposal is that it really highlights that transposition really is just an implementation detail from the relational perspective. So, to flesh that out: {code} INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war commenced'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an anarchy'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of the most effectual means of preserving peace'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when trusted to the rulers of the people alone'); {code} ... corresponding to the data in !2474-transposed-1.PNG! which in "raw" form looks like !2474-transposed-raw.PNG! Does that make sense? We're using TRANSPOSED AS similarly to how databases have used storage hints like CLUSTERED. It doesn't affect the relational model of the data, but it gives you different performance characteristics. (The analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect ordering of results.) > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select.PNG, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172608#comment-13172608 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/19/11 9:03 PM: - So, you could do queries like {code} SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_at > 1770; {code} Which would give the resultset shown in !2474-transposed-select.PNG! was (Author: jbellis): So, you could do queries like {code} SELECT * FROM timeline WHERE user_id = 'tjefferson' AND posted_ad > 1770; {code} Which would give the resultset shown in !2474-transposed-select.PNG! > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > 2474-transposed-select.PNG, screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172604#comment-13172604 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 12/19/11 8:59 PM: - The crucial part of this latest proposal is that it really highlights that transposition really is just an implementation detail from the relational perspective. So, to flesh that out: {code} INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war commenced'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an anarchy'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of the most effectual means of preserving peace'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when trusted to the rulers of the people alone'); {code} ... corresponding to the data in !2474-transposed-1.PNG! which in "raw" form looks like !2474-transposed-raw.PNG! Does that make sense? We're using TRANSPOSED AS similarly to how databases have used storage hints like CLUSTERED. It doesn't affect the relational model of the data, but it gives you different performance characteristics. (The analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect ordering of results.) was (Author: jbellis): The crucial part of this latest proposal is that it really highlights that transposition really is just an implementation detail from the relational perspective. So, to flesh that out: {code} INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1818', 'jadams', 'Revolution was effected before the war commenced'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1763', 'jadams', 'Democracy will soon degenerate into an anarchy'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('tjefferson', '1790', 'gwashington', 'To be prepared for war is one of the most effectual means of preserving peace'); INSERT INTO timeline (user_id, posted_at, posted_by, body) VALUES ('bfranklin', '1781', 'tjefferson', 'Every government degenerates when trusted to the rulers of the people alone'); {code} ... corresponding to the data in !2474-transposed-1.PNG!, which in "raw" form looks like !2474-transposed-raw.PNG! Does that make sense? We're using TRANSPOSED AS similarly to how databases have used storage hints like CLUSTERED. It doesn't affect the relational model of the data, but it gives you different performance characteristics. (The analogy is particularly apt in that both CLUSTERED and TRANSPOSED AS affect ordering of results.) > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: 2474-transposed-1.PNG, 2474-transposed-raw.PNG, > screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13172299#comment-13172299 ] T Jake Luciani edited comment on CASSANDRA-2474 at 12/19/11 2:21 PM: - In the non-sparse case you would always ignore the column value? I think we need to expose that somehow. (first non-transposed, non-key, non-sparse column?) Overall I like this because it forces a user to think at schema creation time and not access time. This approach makes sense for CQL only access, but for users who are coming from thrift they will be asking "how do i access data from my current data model?" On the negative side, this approach feels a bit too restrictive since you *MUST* use the same kind of schema across all rows within a CF. What if a user doesn't know what the sparse columns will be ahead of time? Also, I know that's best practice but want to make the point, what if a user wants to access data in composite form and "raw" mode, should we support multiple "views" on the CF? was (Author: tjake): In the non-sparse case you always would always ignore the column value? I think we need to expose that somehow. (first non-transposed, non-key, non-sparse column?) Overall I like this because it forces a user to think at schema creation time and not access time. This approach makes sense for CQL only access, but for users who are coming from thrift they will be asking "how do i access data from my current data model?" On the negative side, this approach feels a bit too restrictive since you *MUST* use the same kind of schema across all rows within a CF. What if a user doesn't know what the sparse columns will be ahead of time? Also, I know that's best practice but want to make the point, what if a user wants to access data in composite form and "raw" mode, should we support multiple "views" on the CF? > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: New Feature > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.1 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099103#comment-13099103 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/7/11 5:09 PM: To make it Hive friendly {noformat} -- sparse select example SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- dense select example SELECT tweet_id; username; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- insert INSERT INTO tweets (KEY, COMPOUND_NAME('2e1c3308', username)) VALUES (, 'cscotta'); INSERT INTO tweets (KEY, COMPOUND_NAME('2e1c3308', body)) VALUES (, 'Brother...'); INSERT INTO tweets (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) VALUES (, 'Brother...'); -- update UPDATE tweets SET value = 'My motocycle...' WHERE KEY = AND COMPOUND_NAME('2e1c3308', 'cscotta'); {noformat} was (Author: xedin): To make it Hive friendly {noformat} -- sparse select example SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- dense select example SELECT tweet_id; username; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- insert INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', username)) VALUES (, 'cscotta'); INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', body)) VALUES (, 'Brother...'); INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) VALUES (, 'Brother...'); -- update UPDATE tweets:transposed SET value = 'My motocycle...' WHERE KEY = AND COMPOUND_NAME('2e1c3308', 'cscotta'); {noformat} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099103#comment-13099103 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/7/11 5:08 PM: To make it Hive friendly {noformat} -- sparse select example SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- dense select example SELECT tweet_id; username; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- insert INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', username)) VALUES (, 'cscotta'); INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', body)) VALUES (, 'Brother...'); INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) VALUES (, 'Brother...'); -- update UPDATE tweets:transposed SET value = 'My motocycle...' WHERE KEY = AND COMPOUND_NAME('2e1c3308', 'cscotta'); {noformat} was (Author: xedin): No make it Hive friendly {noformat} -- sparse select example SELECT tweet_id; username, body; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- dense select example SELECT tweet_id; username; location FROM tweets:transposed WHERE key = AND username = 'xedin'; -- insert INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', username)) VALUES (, 'cscotta'); INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', body)) VALUES (, 'Brother...'); INSERT INTO tweets:transposed (KEY, COMPOUND_NAME('2e1c3308', 'cscotta')) VALUES (, 'Brother...'); -- update UPDATE tweets:transposed SET value = 'My motocycle...' WHERE KEY = AND COMPOUND_NAME('2e1c3308', 'cscotta'); {noformat} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098384#comment-13098384 ] T Jake Luciani edited comment on CASSANDRA-2474 at 9/6/11 9:38 PM: --- bq. I think something got lost along the way here Seems like the confusion is around the component1 syntax? This is meant to be used with composite columns. Example: {code} ts1:catA :subcatA :id1 :prop1 component1 :component2:component3:component4:component5 {code} was (Author: tjake): bq. I think something got lost along the way here Seems like the confusion is around the component1 syntax? This is meant to be used with composite columns. Example: ts1:catA :subcatA :id1 :prop1 component1 :component2:component3:component4:component5 > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098165#comment-13098165 ] Sylvain Lebresne edited comment on CASSANDRA-2474 at 9/6/11 5:07 PM: - bq. A more Cassandra-ish way to model this would be to encode this as a series of columns: (, 'category', ), (, 'subcategory', ), (, 'event', ). This is better in the general case for the same reason that a sparse top-level set of columns is better: I can easily add more data to events (e.g., "source") without rewriting existing events. But my point is: I disagree with that claim. Maybe sometime your proposal is better, but not always. What if you know that you won't add more data to events. Or more precisely, you know that what identify an event won't change. What if you decided to model it with a (timestamp, category, sub-category, eventId) composite not as a way to feed data into the column key, but because this correspond to how you want to query the data (which I would say is a very cassandra-ish way to model). Let's take an example. The data for the (timestamp, category, sub-category, eventId) composite (for some key) could look like (on disk): {noformat} ts1:catA:subcatA:id1 -> ts1:catA:subcatA:id2 -> ts1:catA:subcatA:id3 -> ts1:catA:subcatA:id4 -> ts1:catA:subcatB:id5 -> ts1:catA:subcatB:id6 -> ts1:catB:subcatA:id7 -> ts1:catB:subcatA:id8 -> {noformat} And say that value is some opaque bytes representing some event data. Now, I'm not even sure how you model the same thing with your proposal, but I'm pretty sure it will involve indirections (or duplication), I doubt it will be more user friendly and you will need more than one query (I would have said 3 queries at first but after trying to see how it would look like I'm not even sure I see where you would put the value in your proposal) to do query like: * give me all the events (eventid and value) for (ts1, catA, subcatA) * give me all the events (eventid and value) for (ts1, catA) * give me all the events (eventid and value) for ts1 because the events would not be ordered correctly. The kind of modeling you propose would make sense if the for an event above was not opaque but composed of a number of property. They yes, I may would want to model things as: {noformat} ts1:catA:subcatA:id1:prop1 -> ts1:catA:subcatA:id1:prop2 -> ts1:catA:subcatA:id1:prop3 -> ts1:catA:subcatA:id2:prop1 -> ts1:catA:subcatA:id2:prop2 -> ts1:catA:subcatA:id2:prop3 -> ts1:catA:subcatA:id3:prop1 -> ts1:catA:subcatA:id3:prop2 -> ts1:catA:subcatA:id3:prop3 -> ... {noformat} because that doesn't screw up with the sorting I'm trying to impose (and that correspond to my queries). And btw, prop1 could 'category' (though that would be redundant in that case). But there is two different thing: # the first part of the key (ts1:catA:subcatA:id1) is the key to my object. It is what makes the ordering corresponding to my queries. # the last component (prop1, ...) is just the way to express the different properties of my object (and just a way to emulate super columns after all). So I guess what I'm arguing here is just to not forget the case where you use CompositeType because your column key do is intrinsically composed of multiple parts. Because it *is* very useful. was (Author: slebresne): bq. A more Cassandra-ish way to model this would be to encode this as a series of columns: (, 'category', ), (, 'subcategory', ), (, 'event', ). This is better in the general case for the same reason that a sparse top-level set of columns is better: I can easily add more data to events (e.g., "source") without rewriting existing events. But my point is: I disagree with that claim. Maybe sometime your proposal is better, but not always. What if you know that you won't add more data to events. Or more precisely, you know that what identify an event won't change. What if you decided to model it with a (timestamp, category, sub-category, eventId) composite not as a way to feed data into the column key, but because this correspond to how you want to query the data (which I would say is a very cassandra-ish way to model). Let's take an example. The data for the (timestamp, category, sub-category, eventId) composite (for some key) could look like (on disk): {noformat} ts1:catA:subcatA:id1 -> ts1:catA:subcatA:id2 -> ts1:catA:subcatA:id3 -> ts1:catA:subcatA:id4 -> ts1:catA:subcatB:id5 -> ts1:catA:subcatB:id6 -> ts1:catB:subcatA:id7 -> ts1:catB:subcatA:id8 -> {noformat} And say that value is some opaque bytes representing some event data. Now, I'm not even sure how you model the same thing with your proposal, but I'm pretty sure it will involve indirections (or duplication), I doubt it will be more user friendly and you will need more tha
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 9/5/11 6:50 PM: --- bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data That's exactly what "transposition" solves -- taking a horizontal slice and "turning it sideways" into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the ("leaf") data we select has to be at the same "level" in the hierarchy. bq. if we have 10 subcolumns do I need to list them all using component syntax You will if you are using the "dense" format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it [dense], but making it beautiful is not our goal. Sparse encoding will be the recommended practice. bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. "SELECT component1 AS tweet_id, component2 AS username, body, location, age, value" I don't understand, that seems perfectly readable to me. bq. SELECT name AS (tweet_id, username | body | location | age), value AS body This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax but there's no need to spend further time on this one. was (Author: jbellis): bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data That's exactly what "transposition" solves -- taking a horizontal slice and "turning it sideways" into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the ("leaf") data we select has to be at the same "level" in the hierarchy. bq. if we have 10 subcolumns do I need to list them all using component syntax You will if you are using the "dense" format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal. bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. "SELECT component1 AS tweet_id, component2 AS username, body, location, age, value" I don't understand, that seems perfectly readable to me. bq. SELECT name AS (tweet_id, username | body | location | age), value AS body This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax but there's no need to spend further time on this one. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097242#comment-13097242 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 9/5/11 6:50 PM: --- bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data That's exactly what "transposition" solves -- taking a horizontal slice and "turning it sideways" into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the ("leaf") data we select has to be at the same "level" in the hierarchy. bq. if we have 10 subcolumns do I need to list them all using component syntax You will if you are using the "dense" format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal. bq. it lacks scoping therefore on the big queries it will be hard to read, e.g. "SELECT component1 AS tweet_id, component2 AS username, body, location, age, value" I don't understand, that seems perfectly readable to me. bq. SELECT name AS (tweet_id, username | body | location | age), value AS body This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax but there's no need to spend further time on this one. was (Author: jbellis): bq. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data That's exactly what "transposition" solves -- taking a horizontal slice and "turning it sideways" into a resultset with the same set of columns. We are NOT trying to solve a more generic hierarchical data problem -- all the ("leaf") data we select has to be at the same "level" in the hierarchy. bq. if we have 10 subcolumns do I need to list them all using component syntax You will if you are using the "dense" format. And let's be clear: this is NOT the recommended way to do things, because it is fragile, as described above. We want to support it, but making it beautiful is not our goal. bq. will potentially be hard to put into grammar because it can have ambiguous rules again because lack of scoping bq. it lacks scoping therefore on the big queries it will be hard to read e.g. "SELECT component1 AS tweet_id, component2 AS username, body, location, age, value" I don't understand, that seems perfectly readable to me. bq. SELECT name AS (tweet_id, username | body | location | age), value AS body This syntax is not viable for the reasons given in my previous comment. I'm happy to entertain other alternatives to the component syntax but there's no need to spend further time on this one. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096842#comment-13096842 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/4/11 9:57 AM: bq. Remember that the ideal for CQL is to have "SELECT x, y, z" and get back exactly columns x, y, and z. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data. Few problems I have with componentX syntax: - if we have 10 subcolumns do I need to list them all using component syntax (which would be totally unreadable)? - it lacks scoping therefore on the big queries it will be hard to read e.g. {noformat} SELECT component1 AS tweet_id, component2 AS username, body, location, age, value AS body {noformat} - will potentially be hard to put into grammar because it can have ambiguous rules again because lack of scoping - why should we force users to actually give each component a number? And I don't get why do you think that (..,..,..) is a "rocket science" syntax: If we presume that user should be familiar with composite type columns before start using the syntax then he will know what does each section (separated by ",") mean: {noformat} SELECT name AS (tweet_id, username, location), value AS body {noformat} means that we have three sections as column name which we are aliasing to tweet_id, username, location {noformat} SELECT name AS (tweet_id, username | body | location | age), value AS body {noformat} means that we have two components in the name: first one - tweet_id, and second component that has multiple meanings but we only want to get username, body, location {noformat} SELECT name AS (tweet_id, *), value AS body {noformat} means that we still have two components in the column name but we don't care what holds component #2 and we expect result set to return all of the possible values. was (Author: xedin): bq. Remember that the ideal for CQL is to have "SELECT x, y, z" and get back exactly columns x, y, and z. composite/super columns won't originally play nice with SQL syntax because it wasn't designed to query hierarchical data. Few problems I have with componentX syntax: - if we have 10 subcolumns do I need to list them all using component syntax (which would be totally unreadable)? - it lacks scoping therefore on the big queries it will be hard to read e.g. {noformat} SELECT component1 AS tweet_id, component2 AS username, body, location, age, value AS body {noformat} - will potentially be hard to put into grammar because it can have ambiguous rules again because lack of scoping - why should we force users to actually give each component a number? And I don't get why do you think that (..,..,..) is a "rocket science" syntax: If we presume that user should be familiar with composite type columns before start using the syntax then he will know what does each section (separated by ",") mean: {noformat} SELECT name AS (tweet_id, username, location), value AS body {noformat} means that we have three sections as column name which we are aliasing to tweet_id, username, location {noformat} SELECT name AS (tweet_id, username | body | location | age), value AS body {noformat} means that we have two components in the name: first one - tweet_id, and second component that has multiple meanings but we only want to get username, body, location {noformat} SELECT name AS (tweet_id, *), value AS body means that we still have two components in the column name but we don't care what holds component #2 and we expect result set to return all of the possible values. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095913#comment-13095913 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/2/11 11:16 AM: - bq. We already have the transposition as a visual "hint" as to what is going on. I don't think it's difficult to say that "you need to use componentX when using transposition," and :colon tokens are usually used to represent bind variables in SQL drivers. I didn't use ":transposed" keyword any where but I'm file using it instead of :colon tokens. I don't like "componentX" notation because we can simply preserve ordering of the arguments to make the same thing which is much clearer syntax on my opinion bq. Maybe it is better if we will support (..,..,..) notation for composite columns Can be used as a replacement for "componentX" bq. Sorry, I don't follow. You wrote: bq. If I wanted to add a location tweet field, I'd need to delete (e.g.) the 2e1c3308,cscotta column and replace it with 2e1c3308,cscotta,. So, generally not recommended... but the composite column spec implies we should support it.) Which brings us to: how to deal with insert/update in composite columns. I mean when we want to extend "2e1c3308,cscotta" with we can simply update old column's name to "2e1c3308,cscotta,", for querying we can use "1/0/-1" as <'end-of-component' byte> from CompositeType doc when we want to search by specific components of the column name. e.g. (if I understand CompositeType annotation correctly) {noformat} SELECT name AS (tweet_id, username), value AS body FROM timeline WHERE tweet_id = '95a789a' AND user_id = 'cscotta' {noformat} start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0>" end = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<1>" Which should give as *all* columns where column name starts with 95a789a,cscotta {noformat} SELECT name AS (tweet_id, username, location), value AS body FROM timeline WHERE tweet_id = '95a789a' AND user_id = 'cscotta' AND location = 'USA' {noformat} start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0><3>USA.getBytes()<0>" end = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0><3>USA.getBytes()<1>" Which should give as *all* columns where column name starts with 95a789a,cscotta,USA was (Author: xedin): bq. We already have the transposition as a visual "hint" as to what is going on. I don't think it's difficult to say that "you need to use componentX when using transposition," and :colon tokens are usually used to represent bind variables in SQL drivers. I didn't use ":transposed" keyword any where but I'm file using it instead of :colon tokens. I don't like "componentX" notation because we can simply preserve ordering of the arguments to make the same thing which is much clearer syntax on my opinion bq. Maybe it is better if we will support (..,..,..) notation for composite columns Can be used as a replacement for "componentX" bq. Sorry, I don't follow. You wrote: bq. If I wanted to add a location tweet field, I'd need to delete (e.g.) the 2e1c3308,cscotta column and replace it with 2e1c3308,cscotta,. So, generally not recommended... but the composite column spec implies we should support it.) Which brings us to: how to deal with insert/update in composite columns. I mean when we want to extend "2e1c3308,cscotta" with we can simply update old column's name to "2e1c3308,cscotta,", for querying we can use "1/0/-1" as <'end-of-component' byte> from CompositeType doc when we want to search by specific components of the column name. e.g. (if I understand CompositeType annotation correctly) {noformat} SELECT name AS (tweet_id, username), value AS body FROM timeline WHERE tweet_id = '95a789a' AND user_id = 'cscotta' {noformat} start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0>" end = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<1>" Which should give as *all* columns where column name starts with 95a789a,cscotta {noformat} SELECT name AS (tweet_id, username, location), value AS body FROM timeline WHERE tweet_id = '95a789a' AND user_id = 'cscotta' AND location = 'USA' {noformat} start = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<0><3>USA.getBytes()<0>" end = "<7>95a789a.getBytes()<0><7>cscotta.getBytes()<1><3>USA.getBytes()<1>" Which should give as *all* columns where column name starts with 95a789a,cscotta,USA > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, scree
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095653#comment-13095653 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/1/11 10:47 PM: - we can extend that syntax like this to support multiple columns: for composite columns: {noformat} SELECT :column as (date, company), :value as price FROM stockhist WHERE ='GOOG' AND date >= '2011-01-02' LIMIT 50 {noformat} for super columns: {noformat} SELECT :parent as company, :column as (date, ), :value as (price, ) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND = '...' LIMIT 50 {noformat} Also instead of :column we could use just :name, it might be even cleaner. was (Author: xedin): we can extend that syntax like this to support multiple columns: {noformat} SELECT :column as (date, company), :value as (price, name) FROM stockhist WHERE ='GOOG' AND date >= '2011-01-02' LIMIT 50 {noformat} {noformat} SELECT :parent as company, :column as (date, ), :value as (price, ) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND = '...' LIMIT 50 {noformat} Also instead of :column we could use just :name, it might be even cleaner. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095653#comment-13095653 ] Pavel Yaskevich edited comment on CASSANDRA-2474 at 9/1/11 10:44 PM: - we can extend that syntax like this to support multiple columns: {noformat} SELECT :column as (date, company), :value as (price, name) FROM stockhist WHERE ='GOOG' AND date >= '2011-01-02' LIMIT 50 {noformat} {noformat} SELECT :parent as company, :column as (date, ), :value as (price, ) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND = '...' LIMIT 50 {noformat} Also instead of :column we could use just :name, it might be even cleaner. was (Author: xedin): we can extend that syntax like this to support multiple columns: {noformat} SELECT :column as (date, company), :value as (price, name) FROM stockhist WHERE ='GOOG' AND date >= '2011-01-02' LIMIT 50 {noformat} {noformat} SELECT :parent as company, :column as (date, ), :value as (price, ) FROM stockhist WHERE company='GOOG' AND date >= '2011-01-02' AND = '...' LIMIT 50 {noformat} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg, screenshot-2.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095618#comment-13095618 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 9/1/11 9:31 PM: --- Illustration time. (See: screenshot-1.jpg) We have a "wide row" of stock histories. Just "normal" columns, no composites or supercolumns. We could use transposition syntax to get these as "normal" (not sparse) resultset: {code} SELECT date, price FROM stockhist WHERE symbol='GOOG' AND date >= '2011-01-02' LIMIT 50 {code} Which would yield {noformat} dateprice 2011-01-02 $75.23 2011-01-03 $82.11 {noformat} As mentioned above, we need to figure out how to allow adding the date/price aliases to refer to column name/value, and extend that to the more general composite column case. was (Author: jbellis): Illustration time. (See: screenshot-1.jpg) We have a "wide row" of stock histories. Just "normal" columns, no composites or supercolumns. We could use transposition syntax to get these as "normal" (not sparse) resultset: {code} SELECT date, value FROM stockhist WHERE symbol='GOOG' AND date >= '2011-01-02' LIMIT 50 {code} Which would yield {noformat} datevalue 2011-01-02 $75.23 2011-01-03 $82.11 {noformat} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans >Assignee: Pavel Yaskevich > Labels: cql > Fix For: 1.0 > > Attachments: screenshot-1.jpg > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088958#comment-13088958 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 8/22/11 7:44 PM: bq. Rather than trying to support all this flexibility on the fly in Hive, why don't we let users create transposed projections in the DML? I like that idea, as the next best thing to finding a syntax that works well for both Hive and CQL. So for CQL, I think we're close with the "transposed" idea. As explained above, this allows using "'real' WHERE clauses [so] we get all the benefits of regularity that implies." I did note one problem with the transposed(x, y, z) syntax, though: you lose the ability to compose with other functions in the SELECT clause. Currently that just means count(), but it's conceivable that we'd add scalar functions like sqrt(), or even aggregation like sum(). So I propose adding ".transposed" to the table name instead: {noformat} SELECT tweet_id, username, timestamp FROM timeline.transposed WHERE user_id='jbellis' LIMIT 50 {noformat} was (Author: jbellis): bq. Rather than trying to support all this flexibility on the fly in Hive, why don't we let users create transposed projections in the DML? I like that idea, as the next best thing to finding a syntax that works well for both Hive and CQL. So for CQL, I think we're close with the "transposed" idea. As explained above, "we have "real" WHERE clauses we get all the benefits of regularity that implies." I did note one problem with the transposed(x, y, z) syntax, though: you lose the ability to compose with other functions in the SELECT clause. Currently that just means count(), but it's conceivable that we'd add scalar functions like sqrt(), or even aggregation like sum(). So I propose adding ".transposed" to the table name instead: {noformat} SELECT tweet_id, username, timestamp FROM timeline.transposed WHERE user_id='jbellis' LIMIT 50 {noformat} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088738#comment-13088738 ] T Jake Luciani edited comment on CASSANDRA-2474 at 8/22/11 2:49 PM: Rather than trying to support all this flexibility on the fly in Hive, why don't we let users create transposed projections in the DML? I think this is reasonable vs CQL language because CQL is the source of truth for the DML where-as Hive refers to the meta info explicitly. Example for timeseries supercol wide rows: {code} CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, timestamp long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:supercol,:subcol:uname,:subcol:ts") {code} :key, :supercol, :subcol are reserved keywords Example for composite/supercolumn level: {code} CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,retweet:component1,retweet:component2) {code} This means you might have to create many projections on the source data in Hive to support your analysis but this feels more OLAP than the CQL transpose on the fly approach. Also, we could still support the transpose() function syntax using a passthrough UTF so the same syntax works across CQL and HiveQL was (Author: tjake): Rather than trying to support all this flexibility on the fly in Hive, why don't we let users create transposed projections in the DML? I think this is reasonable vs CQL language because CQL is the source of truth for the DML where-as Hive refers to the meta info explicitly. Example for timeseries supercol wide rows: {code} CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, timestamp long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:supercol,:subcol:uname,:subcol:ts") {code} :key, :supercol, :subcol are reserved keywords Example for composite/supercolumn level: {code} CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:component1,:component2) {code} This means you might have to create many projections on the source data in Hive to support your analysis but this feels more OLAP than the CQL transpose on the fly approach. Also, we could still support the transpose() function syntax using a passthrough UTF so the same syntax works across CQL and HiveQL > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13088738#comment-13088738 ] T Jake Luciani edited comment on CASSANDRA-2474 at 8/22/11 2:48 PM: Rather than trying to support all this flexibility on the fly in Hive, why don't we let users create transposed projections in the DML? I think this is reasonable vs CQL language because CQL is the source of truth for the DML where-as Hive refers to the meta info explicitly. Example for timeseries supercol wide rows: {code} CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, timestamp long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:supercol,:subcol:uname,:subcol:ts") {code} :key, :supercol, :subcol are reserved keywords Example for composite/supercolumn level: {code} CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:component1,:component2) {code} This means you might have to create many projections on the source data in Hive to support your analysis but this feels more OLAP than the CQL transpose on the fly approach. Also, we could still support the transpose() function syntax using a passthrough UTF so the same syntax works across CQL and HiveQL was (Author: tjake): Rather than trying to support all this flexibility on the fly in Hive, why don't we let users create transposed projections in the DML? I think this is reasonable vs CQL language because CQL is the source of truth for the DDL where-as Hive refers to the meta info explicitly. Example for timeseries supercol wide rows: {code} CREATE EXTERNAL TABLE timeline(user_id string, tweet_id long, username string, timestamp long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:supercol,:subcol:uname,:subcol:ts") {code} :key, :supercol, :subcol are reserved keywords Example for composite/supercolumn level: {code} CREATE EXTERNAL TABLE retweets(user_id string, tweet_id long, retweet_id long) STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler' WITH SERDEPROPERTIES ( "cassandra.columns.mapping" = ":key,:component1,:component2) {code} This means you might have to create many projections on the source data in Hive to support your analysis but this feels more OLAP than the CQL transpose on the fly approach. Also, we could still support the transpose() function syntax using a passthrough UTF so the same syntax works across CQL and HiveQL > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087831#comment-13087831 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 5:48 PM: That would come from the schema, similar to how we have key_alias. (Implementation TBD, especially for "dynamic composite columns." For those we might just have to reserve component1, component2, etc. names) was (Author: jbellis): That would come from the schema, similar to how we have key_alias. (Implementation TBD, especially for "dynamic composite columns.") > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087831#comment-13087831 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 5:47 PM: That would come from the schema, similar to how we have key_alias. (Implementation TBD, especially for "dynamic composite columns.") was (Author: jbellis): That would come from the schema. > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087765#comment-13087765 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 3:36 PM: I need to back up. Neither transposed() nor transposed(column) gives us the power we need, unless we layer subqueries on top. Let me use a more complete example to illustrate why. Suppose we have a Twissandra data model where the timeline storage is done "properly" in supercolumns. Then we might want to say, "give me the most recent 50 tweets in user X's timeline:" {code} SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50 {code} So far so good. (Note again how this obviates the need for the existing, cumbersome "column limit" syntax for ".." slices.) But what if we just want, say, the username and timestamp from each entry? {code} SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50 {code} That's where I was going with the multiple parameters to transposed(). Now, for the WHERE clause components... I guess for metastore-less Hive that's ok. Or even, as you originally suggested, just leave it out and say "you can only do full-CF scans without a C* metastore." I'm okay with that. But for CQL we need to be able to give meaningful names to things, e.g., a row full of supercolumns representing tweets could be "tweet_id" instead of "supercolumn." So it would be nice to support this somehow in "full metastore Hive." This is important because we want to generalize beyond the current "one level of nesting" you get with supercolumns, to arbitrary with composite columns. So, {code} SELECT transposed(tweet_id, -- supercolumn username, timestamp) -- subcolumns FROM timeline WHERE user_id='jbellis' -- key LIMIT 50 {code} Another example. Suppose I have a "retweets" CF, where I have a composite columns of (tweet_id, retweet_id). (You can think of this as simply supercolumn/subcolumn, the difference being that composites won't deserialize the entire parent container to fetch a single subcolumn or subcolumn slice.) And I want the most recent retweets, for a given tweet: {code} SELECT transposed(retweet_id) -- second-level column name / subcolumn FROM timeline WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name / supercolumn AND user_id='jbellis' -- key LIMIT 50 {code} was (Author: jbellis): I need to back up. Neither transposed() nor transposed(column) gives us the power we need, unless we layer subqueries on top. Let me use a more complete example to illustrate why. Suppose we have a Twissandra data model where the timeline storage is done "properly" in supercolumns. Then we might want to say, "give me the most recent 50 tweets in user X's timeline:" {code} SELECT transposed(*) FROM foo WHERE key=X LIMIT 50 {code} So far so good. (Note again how this obviates the need for the existing, cumbersome "column limit" syntax for ".." slices.) But what if we just want, say, the username and timestamp from each entry? {code} SELECT transposed(username, timestamp) FROM foo WHERE key=X LIMIT 50 {code} That's where I was going with the multiple parameters to transposed(). Now, for the WHERE clause components... I guess for metastore-less Hive that's ok. Or even, as you originally suggested, just leave it out and say "you can only do full-CF scans without a C* metastore." I'm okay with that. But for CQL we need to be able to give meaningful names to things, e.g., a row full of supercolumns representing tweets could be "tweet_id" instead of "supercolumn." So it would be nice to support this somehow in "full metastore Hive." This is important because we want to generalize beyond the current "one level of nesting" you get with supercolumns, to arbitrary with composite columns. So, {code} SELECT transposed(tweet_id, -- supercolumn username, timestamp) -- subcolumns FROM timeline WHERE user_id='jbellis' -- key LIMIT 50 {code} Another example. Suppose I have a "retweets" CF, where I have a composite columns of (tweet_id, retweet_id). (You can think of this as simply supercolumn/subcolumn, the difference being that composites won't deserialize the entire parent container to fetch a single subcolumn or subcolumn slice.) And I want the most recent retweets, for a given tweet: {code} SELECT transposed(retweet_id) -- second-level column name / subcolumn FROM timeline WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name / supercolumn AND user_id='jbellis' -- key LIMIT 50 {code} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-tas
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087765#comment-13087765 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 8/19/11 3:36 PM: I need to back up. Neither transposed() nor transposed(column) gives us the power we need, unless we layer subqueries on top. Let me use a more complete example to illustrate why. Suppose we have a Twissandra data model where the timeline storage is done "properly" in supercolumns. Then we might want to say, "give me the most recent 50 tweets in user X's timeline:" {code} SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50 {code} So far so good. (Note again how this obviates the need for the existing, cumbersome "column limit" syntax for ".." slices.) But what if we just want, say, the username and timestamp from each entry? {code} SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50 {code} That's where I was going with the multiple parameters to transposed(). Now, for the WHERE clause components... I guess for metastore-less Hive that's ok. Or even, as you originally suggested, just leave it out and say "you can only do full-CF scans without a C* metastore." I'm okay with that. But for CQL we need to be able to give meaningful names to things, e.g., a row full of supercolumns representing tweets could be "tweet_id" instead of "supercolumn." So it would be nice to support this somehow in "full metastore Hive." This is important because we want to generalize beyond the current "one level of nesting" you get with supercolumns, to arbitrary with composite columns. So, {code} SELECT transposed(tweet_id, -- supercolumn username, timestamp) -- subcolumns FROM timeline WHERE user_id='jbellis' -- key LIMIT 50 {code} Another example. Suppose I have a "retweets" CF, where I have a composite columns of (tweet_id, retweet_id). (You can think of this as simply supercolumn/subcolumn, the difference being that composites won't deserialize the entire parent container to fetch a single subcolumn or subcolumn slice.) And I want the most recent retweets, for a given tweet: {code} SELECT transposed(retweet_id) -- second-level column name / subcolumn FROM retweets WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name / supercolumn AND user_id='jbellis' -- key LIMIT 50 {code} was (Author: jbellis): I need to back up. Neither transposed() nor transposed(column) gives us the power we need, unless we layer subqueries on top. Let me use a more complete example to illustrate why. Suppose we have a Twissandra data model where the timeline storage is done "properly" in supercolumns. Then we might want to say, "give me the most recent 50 tweets in user X's timeline:" {code} SELECT transposed(*) FROM timeline WHERE key=X LIMIT 50 {code} So far so good. (Note again how this obviates the need for the existing, cumbersome "column limit" syntax for ".." slices.) But what if we just want, say, the username and timestamp from each entry? {code} SELECT transposed(username, timestamp) FROM timeline WHERE key=X LIMIT 50 {code} That's where I was going with the multiple parameters to transposed(). Now, for the WHERE clause components... I guess for metastore-less Hive that's ok. Or even, as you originally suggested, just leave it out and say "you can only do full-CF scans without a C* metastore." I'm okay with that. But for CQL we need to be able to give meaningful names to things, e.g., a row full of supercolumns representing tweets could be "tweet_id" instead of "supercolumn." So it would be nice to support this somehow in "full metastore Hive." This is important because we want to generalize beyond the current "one level of nesting" you get with supercolumns, to arbitrary with composite columns. So, {code} SELECT transposed(tweet_id, -- supercolumn username, timestamp) -- subcolumns FROM timeline WHERE user_id='jbellis' -- key LIMIT 50 {code} Another example. Suppose I have a "retweets" CF, where I have a composite columns of (tweet_id, retweet_id). (You can think of this as simply supercolumn/subcolumn, the difference being that composites won't deserialize the entire parent container to fetch a single subcolumn or subcolumn slice.) And I want the most recent retweets, for a given tweet: {code} SELECT transposed(retweet_id) -- second-level column name / subcolumn FROM timeline WHERE tweet_id=1b09d795-f038-4cf9-a996-5678fb092c3a -- first-level column name / supercolumn AND user_id='jbellis' -- key LIMIT 50 {code} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Typ
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13087699#comment-13087699 ] T Jake Luciani edited comment on CASSANDRA-2474 at 8/19/11 1:27 PM: I think the best we can do with no major changes to HiveQL is: {code} SELECT transposed(column) FROM foo WHERE key=X {code} This will return: rowkey, subcol1, subcol2 ... You could then also do supercol slices like transposed(column, start, end) Looking at hive you can make functions return many columns: http://massapi.com/source/hive-0.5.0-dev/src/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFExplode2.java.html was (Author: tjake): I think the best we can do with no major changes to HiveQL is: {code} SELECT transposed(column) FROM foo WHERE key=X {code} This will return: rowkey, column, subcol, value You could then also do supercol slices like transposed(column, start, end) Looking at hive you can make functions return many columns: http://massapi.com/source/hive-0.5.0-dev/src/contrib/src/java/org/apache/hadoop/hive/contrib/udtf/example/GenericUDTFExplode2.java.html > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047331#comment-13047331 ] Jonathan Ellis edited comment on CASSANDRA-2474 at 6/10/11 6:22 PM: bq. you can specify the following column mappings in cassandra.columns.mapping Okay, I get it. The reason there is no explanation of WHERE and resultsets is because these are not supported, for the reason deduced above in the N*M explanation. So you are limited to the number of levels of nesting that you have magic names for? bq. you are going to be doing a map reduce on most/all the data anyway The right way to represent a "select * from foo" with no compound modification is to present it with compound keys (in the SQL sense not in the sense of "multiple keys crammed into a single value"): {noformat} rowkey columnkey subcol1subcol2 key1 col1val1 val2 key1 col2v3 v4 {noformat} was (Author: jbellis): bq. you can specify the following column mappings in cassandra.columns.mapping So you are limited to the number of levels of nesting that you have magic names for? bq. you are going to be doing a map reduce on most/all the data anyway The right way to represent a "select * from foo" with no compound modification is to present it with compound keys (in the SQL sense not in the sense of "multiple keys crammed into a single value"): {noformat} rowkey columnkey subcol1subcol2 key1 col1val1 val2 key1 col2v3 v4 {noformat} > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Issue Comment Edited] (CASSANDRA-2474) CQL support for compound columns
[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046937#comment-13046937 ] T Jake Luciani edited comment on CASSANDRA-2474 at 6/10/11 12:33 AM: - Doesn't this limit things to one row key? If I have a CF of wide rows it would be nice to get back a transposed view of all rows.. you could use a SQL hint syntax like: {noformat} select /*+TRANSPOSED*/ key, column, subcolumn, value from foo; {noformat} was (Author: tjake): Doesn't this limit things to one row key? If I have a CF of wide rows it would be nice to get back a transposed view of all rows.. you could use a SQL hint syntax like: select /*+TRANSPOSED*/ key, column, subcolumn, value from foo; > CQL support for compound columns > > > Key: CASSANDRA-2474 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2474 > Project: Cassandra > Issue Type: Sub-task > Components: API, Core >Reporter: Eric Evans > Labels: cql > Fix For: 1.0 > > > For the most part, this boils down to supporting the specification of > compound column names (the CQL syntax is colon-delimted terms), and then > teaching the decoders (drivers) to create structures from the results. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira