Re: Data tiered compaction and data model question
What's the typical size of the data field? Unless it's very large, I don't think Table 2 is a very wide row (10 x 20 x 60 x 24 = 288,000 events per partition at worst). Plus you only need to store 30 days of data, so the overall data size is 288,000 x 30 = 8,640,000 events. I am not even sure you need C* at all, depending on event size.

On Thu, Feb 19, 2015 at 12:00 AM, cass savy casss...@gmail.com wrote:

10-20 per minute is the average. Worst case can be 10x the average.

On Wed, Feb 18, 2015 at 4:49 PM, Mohammed Guller moham...@glassbeam.com wrote:

What is the maximum number of events that you expect in a day? What is the worst-case scenario?

Mohammed
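Kai's back-of-envelope sizing can be checked with a few lines of Python. The rates are the assumptions quoted in the thread (20 events/min average, 10x worst case), not measured figures:

```python
# Back-of-envelope sizing for Table 2 (one partition per day).
# Assumed rates from the thread: average 20 events/min, worst case 10x.
AVG_EVENTS_PER_MIN = 20
WORST_CASE_FACTOR = 10
MINUTES_PER_DAY = 60 * 24
RETENTION_DAYS = 30

def events_per_partition_worst_case() -> int:
    """Worst-case rows in a single day partition."""
    return WORST_CASE_FACTOR * AVG_EVENTS_PER_MIN * MINUTES_PER_DAY

def total_events_retained() -> int:
    """Total rows kept across the 30-day retention window."""
    return events_per_partition_worst_case() * RETENTION_DAYS
```

This reproduces the 288,000 events/partition and 8,640,000 total figures above, which is small by C* standards.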
RE: Data tiered compaction and data model question
Reading 288,000 rows from a single partition may cause problems. It is generally recommended not to read more than about 100k rows from one partition (although paging may help), so Table 2 may cause issues. I agree with Kai that you may not even need C* for this use case. C* is ideal for data with the three Vs: volume, velocity, and variety. It doesn't look like your data has the volume or velocity that a standard RDBMS cannot handle.

Mohammed

From: Kai Wang [mailto:dep...@gmail.com]
Sent: Thursday, February 19, 2015 6:06 AM
To: user@cassandra.apache.org
Subject: Re: Data tiered compaction and data model question

What's the typical size of the data field? Unless it's very large, I don't think Table 2 is a very wide row (10 x 20 x 60 x 24 = 288,000 events per partition at worst). Plus you only need to store 30 days of data, so the overall data size is 288,000 x 30 = 8,640,000 events. I am not even sure you need C* at all, depending on event size.
Re: Data tiered compaction and data model question
Hi Cass,

just a quick hint from the side. If I got it right, you have:

Table 1: PRIMARY KEY ((event_day, event_hr), event_time)
Table 2: PRIMARY KEY (event_day, event_time)

Assuming your events arrive by wall-clock time, the first table design will have a hotspot: one specific node receives all writes for a single hour, because (event_day, event_hr) is the partitioning key. The second table design moves that hotspot to one specific node per day, since event_day alone is the partitioning key. So please be careful if you have a write-intensive workload.

I have designed my logging tables with a non-datetime component in the partitioning key to distribute writes across all nodes at any given point in time. For example, I have PRIMARY KEY ((sensor_id, measure_date)) with the timestamp-value pairs in the rows. They are quite wide, as I have about 1 measurement per sensor and id, but analytics and cleanup jobs run daily.

Of course, as a not-so-long-time Cassandra user I may be wrong; please feel free to correct me.

Cheers,
Roland
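Roland's suggestion of a non-datetime component in the partition key can be sketched by adding a synthetic bucket, so that one hour's writes spread over several partitions (and therefore several nodes). This is a hypothetical variant of Table 1, not from the thread; the `bucket` column, `NUM_BUCKETS` value, and `event_id` attribute are all assumptions:

```python
import hashlib

# Hypothetical schema this helper pairs with (an assumed variant of Table 1):
#   PRIMARY KEY ((event_day, event_hr, bucket), event_time)
# A stable hash of some event attribute picks the bucket, so writes within
# a single hour land on NUM_BUCKETS different partitions instead of one.
NUM_BUCKETS = 8

def bucket_for(event_id: str) -> int:
    """Stable bucket in [0, NUM_BUCKETS) derived from the event id."""
    digest = hashlib.md5(event_id.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS

def partition_key(event_day: str, event_hr: int, event_id: str):
    """Full partition key for the sketched bucketed table."""
    return (event_day, event_hr, bucket_for(event_id))
```

The trade-off is that reads for one hour must now query all NUM_BUCKETS partitions, which is the usual price of removing a time-based hotspot.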
RE: Data tiered compaction and data model question
What is the maximum number of events that you expect in a day? What is the worst-case scenario?

Mohammed

From: cass savy [mailto:casss...@gmail.com]
Sent: Wednesday, February 18, 2015 4:21 PM
To: user@cassandra.apache.org
Subject: Data tiered compaction and data model question

We want to track events in a log CF/table and should be able to query for events that occurred in a range of minutes or hours on a given day.
Data tiered compaction and data model question
We want to track events in a log CF/table and should be able to query for events that occurred in a range of minutes or hours on a given day. Multiple events can occur in a given minute. I have listed two table designs below and am leaning towards Table 1 to avoid a large wide row. Please advise.

Table 1: not a very wide row; still able to query for a range of minutes on a given day, and/or a given day and a range of hours:

CREATE TABLE log_Event (
    event_day text,
    event_hr int,
    event_time timeuuid,
    data text,
    PRIMARY KEY ((event_day, event_hr), event_time)
);

Table 2: this will be a very wide row:

CREATE TABLE log_Event (
    event_day text,
    event_time timeuuid,
    data text,
    PRIMARY KEY (event_day, event_time)
);

DateTiered compaction: recommended for time series data per the doc below. Our data will be kept for only 30 days, hence the thought of using this compaction strategy.
http://www.datastax.com/dev/blog/datetieredcompactionstrategy

I created Table 1 above with this compaction strategy, added some rows, and did a manual flush. I do not see any sstables created yet. Is that expected?

compaction = {'class': 'DateTieredCompactionStrategy', 'max_sstable_age_days': '1'}
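One consequence of Table 1's (event_day, event_hr) partition key is that a query spanning several hours must touch one partition per hour; the client enumerates the partition keys and issues one SELECT per partition. A minimal sketch of that enumeration (the function and column names are illustrative, matching the schema above):

```python
def hour_partitions(event_day: str, start_hr: int, end_hr: int):
    """Partition keys a client must query for hours [start_hr, end_hr]
    of one day under Table 1's ((event_day, event_hr), event_time) key."""
    if not 0 <= start_hr <= end_hr <= 23:
        raise ValueError("hours must satisfy 0 <= start <= end <= 23")
    return [(event_day, hr) for hr in range(start_hr, end_hr + 1)]

def select_for(event_day: str, event_hr: int) -> str:
    """CQL for one partition; event_time range predicates can be added
    to narrow the query to a range of minutes within the hour."""
    return ("SELECT event_time, data FROM log_Event "
            f"WHERE event_day = '{event_day}' AND event_hr = {event_hr}")
```

For example, a query for 9am-11am on one day fans out to three single-partition SELECTs, which C* serves efficiently since each hits exactly one partition.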