[ 
https://issues.apache.org/jira/browse/CASSANDRA-2474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13097159#comment-13097159
 ] 

Sylvain Lebresne commented on CASSANDRA-2474:
---------------------------------------------


I agree with Eric's earlier remark: I think this issue could stand being 
summarized, as I'm not sure I understand what is being proposed so far. So I 
apologize in advance if it turns out the proposals made above already answer 
everything below.

However, it seems that we're focusing here on a representation based on 
materialized views. Did we focus on that because we consider the basic use 
cases for composite types, those where they are not used for materialized 
views at all, easy to deal with?

Why not consider a composite column name for what it is: *one* column name 
that is composed of multiple sub-elements? What I mean is that I'm not 
convinced that
bq. the original idea from CASSANDRA-2025 of "SELECT columnA:x, columnA:y FROM 
foo WHERE key = 'bar'" is the wrong way to go

I'm even less convinced when I see the number of comments on this ticket.

Again, it seems that the focus was exclusively on materialized views, but I 
strongly think that composite column names are useful for more than 
materialized views (I've used composite column names countless times, never 
for a materialized view).

But let's take an example of what I mean. Suppose that what you store in your 
column family are events. Those events arrive with a timestamp whose resolution 
is, say, the minute (or more precisely, you only care about querying them at 
that precision). Those events have a category (which may have a meaningful sort 
order), and maybe a subcategory. They also have a unique identifier, eventId. 
Moreover, there are a lot of events every minute, and the categories and 
subcategories are not necessarily predefined. The queries you want to run are 
typically:
  * Give me all the events for time t, category c and sub-category sc.
  * Give me all the events for time t and category c.
  * Give me all the events for time t and category c1 to c2 (where c1 < c2 for 
the category sorting)
  * Give me everything for the last 4 hours
Most of those would probably require paging because there are a ton of events, 
but still, I want them to be fast.

I haven't found a better data model for that kind of example than using a 
composite column name where the name is (timestamp, category, sub-category, 
eventId).
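
To make the mapping concrete, here is a minimal sketch (not Cassandra code) of 
why that layout serves all four queries. It assumes one row whose composite 
column names sort component by component, and models it with a sorted Python 
list of tuples. The names columns, slice_range and slice_prefix, and the sample 
events, are hypothetical and only for illustration.

```python
from bisect import bisect_left

# Hypothetical in-memory stand-in for one row. Column names are
# (timestamp_minute, category, subcategory, event_id), kept sorted
# component by component, as a composite comparator would order them.
columns = sorted([
    (100, "auth", "login", "e1"),
    (100, "auth", "logout", "e2"),
    (100, "billing", "invoice", "e3"),
    (100, "cart", "add", "e4"),
    (101, "auth", "login", "e5"),
])


def slice_range(cols, start_prefix, end_prefix):
    """Return columns from the first name >= start_prefix through the
    last name that starts with a prefix <= end_prefix (inclusive)."""
    # A shorter tuple sorts before any longer tuple it is a prefix of,
    # so bisect_left lands on the first column matching start_prefix.
    start = bisect_left(cols, start_prefix)
    n = len(end_prefix)
    end = start
    # Walk forward while the leading components are still within range.
    while end < len(cols) and cols[end][:n] <= end_prefix:
        end += 1
    return cols[start:end]


def slice_prefix(cols, prefix):
    """Return every column whose name starts with `prefix`."""
    return slice_range(cols, prefix, prefix)


# time t, category c, sub-category sc
by_subcategory = slice_prefix(columns, (100, "auth", "login"))
# time t, category c
by_category = slice_prefix(columns, (100, "auth"))
# time t, categories c1 to c2
by_category_range = slice_range(columns, (100, "auth"), (100, "billing"))
# a time window (e.g. the last N minutes)
by_time_window = slice_range(columns, (100,), (101,))
```

Each query is one contiguous slice of the sorted names, which is what makes 
them cheap and straightforward to page through.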

I haven't found anything in the discussion above that would allow me to do 
this better than the initial proposal of CASSANDRA-2025.

Now, I completely agree that having a good notation to work with materialized 
views would be great, but IMO if we try to find a syntax that is too far from 
how composite columns work, I fear we'll end up limiting the usefulness of 
composite types in CQL to one narrow use case.

I'll also note that I haven't seen any proposal for what insertion with 
compound types should look like.

> CQL support for compound columns
> --------------------------------
>
>                 Key: CASSANDRA-2474
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2474
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: API, Core
>            Reporter: Eric Evans
>            Assignee: Pavel Yaskevich
>              Labels: cql
>             Fix For: 1.0
>
>         Attachments: screenshot-1.jpg, screenshot-2.jpg
>
>
> For the most part, this boils down to supporting the specification of 
> compound column names (the CQL syntax is colon-delimited terms), and then 
> teaching the decoders (drivers) to create structures from the results.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
