[ https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13098165#comment-13098165 ]
Sylvain Lebresne edited comment on CASSANDRA-2474 at 9/6/11 5:07 PM:
---------------------------------------------------------------------

bq. A more Cassandra-ish way to model this would be to encode this as a series of columns: (<timestamp>, 'category', <category>), (<timestamp>, 'subcategory', <subcategory>), (<timestamp>, 'event', <eventId>). This is better in the general case for the same reason that a sparse top-level set of columns is better: I can easily add more data to events (e.g., "source") without rewriting existing events.

But my point is: I disagree with that claim. Maybe sometimes your proposal is better, but not always. What if you know that you won't add more data to events? Or more precisely, what if you know that what identifies an event won't change? What if you decided to model it with a (timestamp, category, sub-category, eventId) composite not as a way to feed data into the column key, but because this corresponds to how you want to query the data (which I would say is a very Cassandra-ish way to model)?

Let's take an example. The data for the (timestamp, category, sub-category, eventId) composite (for some key) could look like this (on disk):
{noformat}
ts1:catA:subcatA:id1 -> <value>
ts1:catA:subcatA:id2 -> <value>
ts1:catA:subcatA:id3 -> <value>
ts1:catA:subcatA:id4 -> <value>
ts1:catA:subcatB:id5 -> <value>
ts1:catA:subcatB:id6 -> <value>
ts1:catB:subcatA:id7 -> <value>
ts1:catB:subcatA:id8 -> <value>
....
{noformat}
And say that value is some opaque bytes representing some event data.
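To make the layout above concrete, here is a minimal Python sketch. It is not Cassandra code: the sorted `columns` list stands in for one row, and `prefix_slice` is a hypothetical helper. It only illustrates that when composite names are compared component by component, every name sharing a leading prefix sits in one contiguous run.

```python
from bisect import bisect_left

# Hypothetical stand-in for one row: composite column names mapped to
# opaque values, kept in component-wise sorted order.
columns = sorted([
    (("ts1", "catA", "subcatA", "id1"), b"event-1"),
    (("ts1", "catA", "subcatA", "id2"), b"event-2"),
    (("ts1", "catA", "subcatA", "id3"), b"event-3"),
    (("ts1", "catA", "subcatA", "id4"), b"event-4"),
    (("ts1", "catA", "subcatB", "id5"), b"event-5"),
    (("ts1", "catA", "subcatB", "id6"), b"event-6"),
    (("ts1", "catB", "subcatA", "id7"), b"event-7"),
    (("ts1", "catB", "subcatA", "id8"), b"event-8"),
])
names = [name for name, _ in columns]


def prefix_slice(prefix):
    """Return the contiguous run of (name, value) pairs whose name starts with prefix."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first matching name (if any).
    lo = bisect_left(names, prefix)
    hi = lo
    while hi < len(names) and names[hi][:len(prefix)] == prefix:
        hi += 1
    return columns[lo:hi]
```

With this stand-in, `prefix_slice(("ts1", "catA", "subcatA"))`, `prefix_slice(("ts1", "catA"))` and `prefix_slice(("ts1",))` each come back as a single slice of the sorted list.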
Now, I'm not even sure how you would model the same thing with your proposal, but I'm pretty sure it will involve indirections (or duplication), I doubt it will be more user friendly, and you will need more than one query (I would have said 3 queries at first, but after trying to see what it would look like, I'm not even sure where you would put the value in your proposal) to do queries like:
* give me all the events (eventid and value) for (ts1, catA, subcatA)
* give me all the events (eventid and value) for (ts1, catA)
* give me all the events (eventid and value) for ts1

because the events would not be ordered correctly.

The kind of modeling you propose would make sense if the <value> for an event above was not opaque but composed of a number of properties. Then yes, I might want to model things as:
{noformat}
ts1:catA:subcatA:id1:prop1 -> <value_prop1>
ts1:catA:subcatA:id1:prop2 -> <value_prop2>
ts1:catA:subcatA:id1:prop3 -> <value_prop3>
ts1:catA:subcatA:id2:prop1 -> <value_prop1>
ts1:catA:subcatA:id2:prop2 -> <value_prop2>
ts1:catA:subcatA:id2:prop3 -> <value_prop3>
ts1:catA:subcatA:id3:prop1 -> <value_prop1>
ts1:catA:subcatA:id3:prop2 -> <value_prop2>
ts1:catA:subcatA:id3:prop3 -> <value_prop3>
...
{noformat}
because that doesn't screw up the sorting I'm trying to impose (which corresponds to my queries). And btw, prop1 could be 'category' (though that would be redundant in that case). But there are two different things:
# the first part of the key (ts1:catA:subcatA:id1) is the key to my object. It is what makes the ordering correspond to my queries.
# the last component (prop1, ...) is just the way to express the different properties of my object (and just a way to emulate super columns after all).

So I guess what I'm arguing here is just to not forget the case where you use CompositeType because your column key really is intrinsically composed of multiple parts. Because it *is* very useful.
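The split between the object key and the trailing property component can be sketched the same way in Python. This is only an illustration under assumptions, not Cassandra code: `columns`, `objects` and the `key_len` parameter are hypothetical names, and the values are placeholder strings. Because the property is the last component, columns of one object stay adjacent and objects stay in key order.

```python
from itertools import groupby

# Hypothetical row: (ts, category, subcategory, eventId, property) -> value,
# kept in component-wise sorted order.
columns = sorted(
    (("ts1", "catA", "subcatA", event_id, prop), f"{event_id}-{prop}")
    for event_id in ("id1", "id2", "id3")
    for prop in ("prop1", "prop2", "prop3")
)


def objects(cols, key_len=4):
    """Rebuild (object_key, {property: value}) pairs from sorted columns.

    The first key_len components identify the object; the last component
    names one of its properties.
    """
    result = []
    # groupby only merges adjacent items, which is enough here because the
    # sort order already keeps each object's columns together.
    for key, group in groupby(cols, key=lambda col: col[0][:key_len]):
        result.append((key, {name[-1]: value for name, value in group}))
    return result
```

Calling `objects(columns)` yields one entry per event, in the same order the object keys are stored.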
> CQL support for compound columns
> --------------------------------
>
> Key: CASSANDRA-2474
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
> Project: Cassandra
> Issue Type: Sub-task
> Components: API, Core
> Reporter: Eric Evans
> Assignee: Pavel Yaskevich
> Labels: cql
> Fix For: 1.0
>
> Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of
> compound column names (the CQL syntax is colon-delimited terms), and then
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira