Column families for "most recent data", (a.k.a. size-safe wide rows)
--------------------------------------------------------------------

                 Key: CASSANDRA-3999
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3999
             Project: Cassandra
          Issue Type: New Feature
          Components: Core
            Reporter: Ahmet AKYOL


"Wide row design" is very handy for time series data, but we also have to 
keep each row at an acceptable size. So we need buckets: monthly, daily or 
even hourly. The problem with the bucket approach is, as always, the uneven 
distribution of data across rows.
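
To illustrate the bucket approach and its distribution problem, here is a 
minimal Python sketch. The key layout ("<source_id>:<bucket>"), the 
bucket_key helper and the sample events are all assumptions made up for this 
example; they are not part of Cassandra.

```python
from collections import Counter
from datetime import datetime, timezone

# Hypothetical bucket formats; none of these names come from Cassandra.
BUCKET_FORMATS = {
    "monthly": "%Y-%m",
    "daily": "%Y-%m-%d",
    "hourly": "%Y-%m-%dT%H",
}


def bucket_key(source_id: str, ts: datetime, granularity: str = "daily") -> str:
    """Build an assumed row key of the form '<source_id>:<bucket>'."""
    return f"{source_id}:{ts.strftime(BUCKET_FORMATS[granularity])}"


# Made-up events: a burst of 50 on one day, a single event on the next day.
events = [datetime(2012, 3, 1, 12, minute, tzinfo=timezone.utc) for minute in range(50)]
events.append(datetime(2012, 3, 2, 9, 0, tzinfo=timezone.utc))

# Count how many columns would land in each daily row.
row_sizes = Counter(bucket_key("sensor-1", ts) for ts in events)
```

With these sample events, one daily row would hold 50 columns and the next 
only one, so a fixed bucket size does not by itself keep rows evenly sized.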

So why not tell Cassandra that we want a column family that behaves like an 
LRU cache, but on disk? If we start the design from the queries, we usually 
end up with "most recent data" queries. This "size-safe wide rows" approach 
could be very useful in many use cases.

Here are some hypothetical column family storage parameters:

max_column_number_hint : 1000 // meaning: try to keep around 1000 columns. 
Since it's a hint, we (users) are OK with tombstones or with anything in the 
800 - 1200 range

or

max_row_size_hint : 1MB
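
As a rough illustration of what such a hint could mean, here is a minimal 
Python sketch of a background trim over an in-memory row. It is not 
Cassandra code and does not describe how Cassandra would implement this: 
trim_row, slack_percent and the (timestamp, name, value) tuple layout are 
assumptions for the example. The 20% slack mirrors the 800 - 1200 range 
mentioned above.

```python
def trim_row(columns, max_column_number_hint, slack_percent=20):
    """Return the columns kept after a hypothetical background trim.

    columns: list of (timestamp, name, value) tuples, in any order.
    Nothing is dropped until the row grows past hint + slack; at that point
    the row is cut back to the newest (hint - slack) columns, so the trim
    does not have to run on every write.
    """
    upper = max_column_number_hint * (100 + slack_percent) // 100
    lower = max_column_number_hint * (100 - slack_percent) // 100
    if len(columns) <= upper:
        return list(columns)
    # Newest first, using the timestamp every column already carries.
    newest_first = sorted(columns, key=lambda col: col[0], reverse=True)
    return newest_first[:lower]


# Made-up row with 1500 columns whose timestamps are 0..1499.
row = [(ts, f"col-{ts}", b"") for ts in range(1500)]
kept = trim_row(row, max_column_number_hint=1000)
```

In this sketch a row of 1100 columns is left alone, while the 1500-column 
row is cut back to its 800 newest columns. A max_row_size_hint variant could 
follow the same pattern by summing value sizes instead of counting columns.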

I don't know Cassandra's internals, but C* already has background jobs (for 
compaction, deletion and TTL), and columns already have timestamps. So it 
makes sense from both the user's and C*'s point of view.

P.S: Sorry for my poor English and it's my very first "issue" :)


