Re: Data tiered compaction and data model question

2015-02-19 Thread Kai Wang
What's the typical size of the data field? Unless it's very large, I don't
think table 2 is a very wide row (10x20x60x24=288000 events/partition at
worst). Plus you only need to store 30 days of data. The overall data size is
288000x30=8,640,000 events. I am not even sure if you need C* depending on
event size.
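Kai's worst-case estimate can be reproduced as below. It assumes, from cass savy's reply, an average of 20 events per minute with a 10x worst-case multiplier. The 500-byte event size used for the volume estimate is a hypothetical placeholder, because the thread never says how large the data field is.

```python
# Worst-case sizing for one event_day partition (Table 2), following Kai's arithmetic.
AVG_EVENTS_PER_MINUTE = 20   # upper end of "10-20 per minute" from the thread
WORST_CASE_MULTIPLIER = 10   # "Worst case can be 10x the average"
MINUTES_PER_DAY = 60 * 24
RETENTION_DAYS = 30          # "Our data will be kept only for 30 days"

# Hypothetical: the thread does not give the size of the data field.
ASSUMED_EVENT_BYTES = 500


def worst_case_events_per_day() -> int:
    """Events written to a single day partition in the worst case."""
    return WORST_CASE_MULTIPLIER * AVG_EVENTS_PER_MINUTE * MINUTES_PER_DAY


def total_events_retained() -> int:
    """Events held across the whole retention window."""
    return worst_case_events_per_day() * RETENTION_DAYS


def approx_raw_bytes(event_bytes: int = ASSUMED_EVENT_BYTES) -> int:
    """Raw payload size only; ignores replication, compression and storage overhead."""
    return total_events_retained() * event_bytes


if __name__ == "__main__":
    print(worst_case_events_per_day())  # 288000
    print(total_events_retained())      # 8640000
    print(approx_raw_bytes() / 1e9, "GB (hypothetical event size)")
```

Substituting the real average size of the data field gives a rough raw-volume figure before replication.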

On Thu, Feb 19, 2015 at 12:00 AM, cass savy casss...@gmail.com wrote:

 10-20 per minute is the average. Worst case can be 10x the average.

 On Wed, Feb 18, 2015 at 4:49 PM, Mohammed Guller moham...@glassbeam.com
 wrote:

  What is the maximum number of events that you expect in a day? What is
 the worst-case scenario?



 Mohammed



 From: cass savy [mailto:casss...@gmail.com]
 Sent: Wednesday, February 18, 2015 4:21 PM
 To: user@cassandra.apache.org
 Subject: Data tiered compaction and data model question



 We want to track events in a log CF/table and be able to query for events
 that occurred in a range of minutes or hours on a given day. Multiple
 events can occur in a given minute. I've listed two table designs and am
 leaning towards Table 1 to avoid a large wide row. Please advise.



 *Table 1*: not a very wide row; we can still query for a range of minutes
 on a given day, and/or for a given day and a range of hours

 CREATE TABLE log_Event (
   event_day text,
   event_hr int,
   event_time timeuuid,
   data text,
   PRIMARY KEY ((event_day, event_hr), event_time)
 );

 *Table 2*: this will be a very wide row

 CREATE TABLE log_Event (
   event_day text,
   event_time timeuuid,
   data text,
   PRIMARY KEY (event_day, event_time)
 );
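For the two designs above, a query over a range of minutes or hours has to read every partition the range overlaps. The sketch below lists those partition keys for each design. The 'YYYY-MM-DD' encoding of event_day is a hypothetical choice, since the thread does not say how event_day is formatted; minTimeuuid() and maxTimeuuid() are standard CQL functions for bounding a timeuuid clustering column.

```python
from datetime import datetime, timedelta

# Hypothetical encoding: the thread does not say how event_day is formatted.
DAY_FORMAT = "%Y-%m-%d"


def table1_partitions(start: datetime, end: datetime) -> list[tuple[str, int]]:
    """(event_day, event_hr) keys a Table 1 range query must read, in order."""
    keys = []
    hour = start.replace(minute=0, second=0, microsecond=0)
    while hour <= end:
        keys.append((hour.strftime(DAY_FORMAT), hour.hour))
        hour += timedelta(hours=1)
    return keys


def table2_partitions(start: datetime, end: datetime) -> list[str]:
    """event_day keys a Table 2 range query must read (one per calendar day)."""
    keys = []
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day <= end:
        keys.append(day.strftime(DAY_FORMAT))
        day += timedelta(days=1)
    return keys


# Per-partition query for Table 1; the clustering range is bounded with
# CQL's minTimeuuid()/maxTimeuuid() functions.
TABLE1_QUERY = (
    "SELECT event_time, data FROM log_Event "
    "WHERE event_day = ? AND event_hr = ? "
    "AND event_time >= minTimeuuid(?) AND event_time <= maxTimeuuid(?)"
)

if __name__ == "__main__":
    start = datetime(2015, 2, 18, 9, 30)
    end = datetime(2015, 2, 18, 11, 15)
    print(table1_partitions(start, end))  # [('2015-02-18', 9), ('2015-02-18', 10), ('2015-02-18', 11)]
    print(table2_partitions(start, end))  # ['2015-02-18']
```

The trade-off is visible here: Table 1 keeps partitions small but turns a multi-hour range into several single-partition queries, while Table 2 answers any same-day range from one partition.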



 *DateTiered compaction: recommended for time-series data per the doc
 below. Our data will be kept for only 30 days, so we thought of using this
 compaction strategy.*

 http://www.datastax.com/dev/blog/datetieredcompactionstrategy

 I created Table 1 above with this compaction strategy, added some rows,
 and did a manual flush. I do not see any SSTables created yet. Is that
 expected?

  WITH compaction = {'max_sstable_age_days': '1',
                     'class': 'DateTieredCompactionStrategy'}
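Because rows only need to live for 30 days, one option (not discussed in the thread, so treat it as an assumption) is to let Cassandra expire them through the table-level default_time_to_live option, which is given in seconds. The sketch below only builds the CQL text; applying it to log_Event together with the compaction settings quoted above is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 30  # "Our data will be kept only for 30 days"


def retention_ttl_seconds(days: int = RETENTION_DAYS) -> int:
    """TTL in seconds, the unit default_time_to_live expects."""
    return days * SECONDS_PER_DAY


def alter_table_cql(table: str = "log_Event", days: int = RETENTION_DAYS) -> str:
    """Build an ALTER TABLE statement with a TTL plus the DTCS settings from the thread."""
    return (
        f"ALTER TABLE {table} "
        f"WITH default_time_to_live = {retention_ttl_seconds(days)} "
        "AND compaction = {'class': 'DateTieredCompactionStrategy', "
        "'max_sstable_age_days': '1'};"
    )


if __name__ == "__main__":
    print(retention_ttl_seconds())  # 2592000
    print(alter_table_cql())
```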







RE: Data tiered compaction and data model question

2015-02-19 Thread Mohammed Guller
Reading 288,000 rows from a partition may cause problems. It is recommended not 
to read more than 100k rows in a partition (although paging may help), so 
Table 2 may cause issues.
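The 100k-row guideline can be checked against both designs using the worst-case rate from the thread (200 events per minute). The threshold is a rule of thumb rather than a hard Cassandra limit, and the 5,000-row page size is an assumption (it matches the default fetch size of the DataStax drivers, but any client-side value works the same way).

```python
import math

WORST_EVENTS_PER_MINUTE = 200   # 10x the 20/minute average from the thread
ROW_GUIDELINE = 100_000         # suggested per-partition read limit
PAGE_SIZE = 5_000               # assumed client fetch size

# Minutes of events covered by one partition in each design.
PARTITION_MINUTES = {
    "Table 1 (event_day, event_hr)": 60,
    "Table 2 (event_day)": 60 * 24,
}


def rows_per_partition(minutes: int) -> int:
    """Worst-case rows in a partition spanning the given number of minutes."""
    return WORST_EVENTS_PER_MINUTE * minutes


def pages_to_read(rows: int, page_size: int = PAGE_SIZE) -> int:
    """Round-trips needed to read a whole partition with paging."""
    return math.ceil(rows / page_size)


if __name__ == "__main__":
    for name, minutes in PARTITION_MINUTES.items():
        rows = rows_per_partition(minutes)
        status = "over" if rows > ROW_GUIDELINE else "within"
        print(f"{name}: {rows} rows, {pages_to_read(rows)} pages, {status} guideline")
```

Under these assumptions a worst-case hour partition holds 12,000 rows (3 pages), well within the guideline, while a worst-case day partition holds 288,000 rows (58 pages).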

I agree with Kai that you may not even need C* for this use case. C* is 
ideal for data with the 3 Vs: volume, velocity and variety. It doesn’t look like 
your data has the volume or velocity that a standard RDBMS cannot handle.

Mohammed

From: Kai Wang [mailto:dep...@gmail.com]
Sent: Thursday, February 19, 2015 6:06 AM
To: user@cassandra.apache.org
Subject: Re: Data tiered compaction and data model question






Re: Data tiered compaction and data model question

2015-02-19 Thread Roland Etzenhammer

Hi Cass,

just a quick hint - if I got it right, you have:

Table 1: PRIMARY KEY ( (event_day,event_hr),event_time)
Table 2: PRIMARY KEY (event_day,event_time)

Assuming the events you write arrive in wall-clock order, the first 
table design will have a hotspot on a specific node getting all writes 
for a single hour, as (event_day,event_hr) is the partitioning key. The 
second table design will put this hotspot on a specific node per day, as 
event_day is the partitioning key. So please be careful if you have a 
write-intensive workload.
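The hotspot can be illustrated by counting how many distinct partition keys receive writes during one hour of wall-clock traffic. Each distinct key maps to one replica set, so a count of 1 means every write in that hour lands on the same nodes. The one-event-per-second stream and the 'YYYY-MM-DD' day format are hypothetical stand-ins.

```python
from datetime import datetime, timedelta


def table1_key(ts: datetime) -> tuple[str, int]:
    """Partition key of Table 1: (event_day, event_hr)."""
    return (ts.strftime("%Y-%m-%d"), ts.hour)


def table2_key(ts: datetime) -> str:
    """Partition key of Table 2: event_day."""
    return ts.strftime("%Y-%m-%d")


def distinct_keys(key_fn, start: datetime, seconds: int) -> int:
    """Partitions written by a stream of one event per second beginning at start."""
    return len({key_fn(start + timedelta(seconds=s)) for s in range(seconds)})


if __name__ == "__main__":
    hour_start = datetime(2015, 2, 18, 9, 0)
    print(distinct_keys(table1_key, hour_start, 3600))  # 1
    print(distinct_keys(table2_key, hour_start, 3600))  # 1
```

Over a full day Table 1 does touch 24 partitions, but only one at a time, so the write load moves from one replica set to the next rather than spreading out.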


I have designed my logging tables with a non-datetime component in my 
partitioning key to distribute writes to all nodes at any specific point in 
time. For example, I have


PRIMARY KEY ((sensor_id,measure_date))

and the timestamp-value pairs in the rows. They are quite wide as I have 
about 1 measurements per sensor and id, but analytics and cleanup 
jobs run daily.
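By contrast, with a key like PRIMARY KEY ((sensor_id, measure_date)), every sensor writing at the same moment targets its own partition. The sketch below counts distinct keys at a single instant and spreads them over nodes with a toy MD5-modulo placement; that placement, the 50 sensor IDs and the 6-node cluster are hypothetical and only approximate Cassandra's real token ring.

```python
import hashlib
from datetime import datetime

NODE_COUNT = 6  # hypothetical cluster size


def sensor_key(sensor_id: str, ts: datetime) -> tuple[str, str]:
    """Partition key (sensor_id, measure_date); day format is an assumption."""
    return (sensor_id, ts.strftime("%Y-%m-%d"))


def toy_node(key: tuple[str, str], nodes: int = NODE_COUNT) -> int:
    """Toy placement: MD5 of the key modulo node count (NOT Cassandra's partitioner)."""
    digest = hashlib.md5("|".join(key).encode()).hexdigest()
    return int(digest, 16) % nodes


def nodes_hit(sensor_ids: list[str], ts: datetime) -> set[int]:
    """Nodes receiving writes when every sensor reports at the same instant."""
    return {toy_node(sensor_key(s, ts)) for s in sensor_ids}


if __name__ == "__main__":
    sensors = [f"sensor-{i}" for i in range(50)]
    now = datetime(2015, 2, 18, 9, 0)
    print(len({sensor_key(s, now) for s in sensors}))  # 50
    print(sorted(nodes_hit(sensors, now)))
```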


Of course, as a fairly new Cassandra user I may be wrong, so please 
feel free to correct me.


Cheers,
Roland






