[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-11 Thread Lucas de Souza Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280577#comment-15280577
 ] 

Lucas de Souza Santos edited comment on CASSANDRA-9666 at 5/11/16 6:31 PM:
---

Thank you guys for the amazing work with Cassandra and TWCS!
Sure, I can share more details. Yesterday I upgraded our cluster to Cassandra 
2.2.6 with TWCS. It went smoothly; no one even noticed I was doing the upgrade.


was (Author: lucasdss):
Thank you guys by the amazing working with Cassandra and TWCS!
Sure, I can share more details. Yesterday I upgraded our cluster to cassandra 
2.2.6 with TWCS. It was smoothly, no one even notice I was doing an upgrade.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-11 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280425#comment-15280425
 ] 

Jeff Jirsa edited comment on CASSANDRA-9666 at 5/11/16 5:00 PM:


I love that you wrote the simulator, because it's REALLY hard to test this in 
real life.

I also fully agree that TWCS needs a plan before 10496. TWCS as-is with 10496 
would be painful.

I had been thinking about what I'd want to do with old TWCS windows in the 
context of CASSANDRA-10496. The technique I was planning to adopt was using 
STCS in old windows, but morphing the STCS parameters (notably 
{{min_sstable_size}} rather than {{bucket_low}} or {{bucket_high}}) based on 
the age of the sstables / age of the window, to EVENTUALLY get back to a 
single sstable per window.  I like your idea of doing the single major 
compaction first, flagging it with a boolean in a system table, and then 
coming back and doing STCS on any new data after the fact. That also lets 
(sufficiently advanced) users clear that boolean and force a new major 
compaction if they really want that behavior.

If you and [~krummas] are generally good with that approach, I'll work on 
getting it implemented. 
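
To make the idea concrete, here is a minimal Python sketch of that policy, 
purely for discussion: STCS inside the current window, a single major 
compaction the first time a window goes inactive, and STCS again for anything 
repair or hints drop in later. The system-table flag is modeled as a plain 
dict, and all names below are made up rather than taken from the patch.

def current_window(now, window_seconds=6 * 3600):
    # Epoch-aligned start of the window containing 'now' (6-hour windows here).
    return now - (now % window_seconds)

def choose_compaction(window, now, major_done, sstables):
    """Return (task, sstables_to_compact) for one time window.

    major_done stands in for the per-window boolean kept in a system table.
    """
    if window == current_window(now):
        return 'stcs', sstables            # active window: plain STCS
    if not major_done.get(window):
        major_done[window] = True          # remember the one-time major
        return 'major', sstables           # compact the whole window to one sstable
    if len(sstables) > 1:
        return 'stcs', sstables            # late arrivals (repair, hints): STCS again
    return None, []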


was (Author: jjirsa):
I love that you wrote the simulator, because it's REALLY hard to test this in 
real life.

I also fully agree that TWCS needs a plan before 10496. TWCS as-is with 10496 
would be painful.

I had been thinking about what I'd want to do with TWCS old windows in the 
context of CASSANDRA-10496. The technique I was planning on adopting was using 
STCS in old windows, but morphing the STCS parameters (notably 
{{min_sstable_size}} rather than {{bucket_low}} or {{bucket_high}} ) based on 
either the age of the sstables / age of the window to EVENTUALLY get back to 
single sstable per window.  I like your idea of doing the single major first, 
flagging it with a boolean in a system table, and then coming back and doing 
STCS on any new data after the fact. That also lets (sufficiently advanced) 
users go clear that boolean and force a new major if they really want that 
behavior.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280387#comment-15280387
 ] 

Jonathan Ellis edited comment on CASSANDRA-9666 at 5/11/16 4:38 PM:


I ended up writing a compaction simulator (specific to TS data) to compare TWCS 
and DTCS (attached).  Doing dozens of experiments with real data takes much too 
long.  (I did verify the simulation against actual data so I'm reasonably 
confident that it's producing valid results.)

Here are my findings:

# On writes, if compaction can keep up with writes, and if DTCS is "properly" 
configured (i.e. with the max window an appropriate multiple of the base, to 
avoid extra "partial" tiers), DTCS does about 10% fewer writes than TWCS.
# BUT, if compaction gets behind (which is common), and if write volume is low 
enough that TWCS doesn't have to do multiple passes in old windows due to 
max_threshold, DTCS can do much, much worse.  This is because TWCS always 
does the equivalent of a single major compaction in inactive windows (i.e. 
100% write amplification, the best possible), while DTCS does STCS in all 
windows (~300% amplification).
# The flip side of this behavior is that if repair scatters tiny sstables into 
inactive windows, TWCS will incur large write amplification merging those as 
well.  DTCS will not due to the size tiering.
# TWCS substantially outperforms DTCS on read amplification (number of sstables 
touched) because there is always a single sstable in inactive TWCS windows, 
while DTCS has multiple tiered files.  How much depends on the number of tiers 
generated, but typically DTCS will do 2x to 3x as many reads.

*My recommendation*

DTCS's explicit tiers are not worth the extra complexity.  The TWCS approach of 
doing STCS within the active window, and "major" compaction on inactive, 
provides excellent performance without manual tuning.

However, TWCS's write amplification on repair (point 3 above) is a potential 
problem.  Is there a way to get the best of both worlds?  Should we just brute 
force it and check inactive windows against a system table, where we record if 
we've done our initial major compaction?  If so, then further compactions due 
to repair et al should be done with STCS.

If we solve this problem (and finish CASSANDRA-10496) then I think to a large 
degree we won't need to worry nearly as much about users disabling read repair, 
only repairing during active windows, etc.
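
To put rough numbers on points 2 and 4 above, here is a back-of-the-envelope 
sketch; the figures are invented and only the ratios matter.

# Toy arithmetic for one inactive window holding W bytes of flushed data.
W = 6 * 2**30                           # pretend the window holds 6 GiB

# TWCS: a single major compaction rewrites the window once.
twcs_bytes_rewritten = W                # ~100% write amplification

# STCS in an old window: each byte is rewritten roughly once per size tier
# it climbs through; with ~3 tiers that is ~3x.
stcs_tiers = 3
dtcs_bytes_rewritten = stcs_tiers * W   # ~300% write amplification

print(twcs_bytes_rewritten / W, dtcs_bytes_rewritten / W)   # 1.0 3.0

# Reads: an inactive TWCS window is one sstable, while a size-tiered window
# keeps roughly one file per tier, so reads over old data touch ~stcs_tiers
# files instead of one (the 2x-3x read amplification mentioned in point 4).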


was (Author: jbellis):
I ended up writing a compaction simulator (specific to TS data) to compare TWCS 
and DTCS (attached).  Doing dozens of experiments with real data takes much too 
long.  (I did verify the simulation against actual data so I'm reasonably 
confident that it's producing valid results.)

Here are my findings:

# On writes, if compaction can keep up with writes, and if DTCS is "properly" 
configured (i.e. with max window an appropriate multiple of the base to avoid 
extra "partial" tiers), DTCS does about 10% less writes than TWCS.
# BUT, if compaction gets behind (which is common), and if write volume is low 
enough that TWCS doesn't have to do multiple passes in old windows due to 
max_threshold, DTCS can do much much worse.  This is because TWCS will always 
do the equivalent of a major compaction in inactive windows, while DTCS does 
STCS in all windows.
# The flip side of this behavior is that if repair scatters tiny sstables into 
inactive windows, TWCS will incur large write amplification merging those as 
well.  DTCS will not due to the size tiering.
# TWCS substantially outperforms DTCS on read amplification (number of sstables 
touched) because there is always a single sstable in inactive TWCS windows, 
while DTCS has multiple tiered files.  How much depends on the number of tiers 
generated, but typically DTCS will do 2x to 3x as many reads.

*My recommendation*

DTCS's explicit tiers are not worth the extra complexity.  The TWCS approach of 
doing STCS within the active window, and "major" compaction on inactive, 
provides excellent performance without manual tuning.

However, TWCS's write amplification on repair (point 3 above) is a potential 
problem.  Is there a way to get the best of both worlds?  Should we just brute 
force it and check inactive windows against a system table, where we record if 
we've done our initial major compaction?  If so, then further compactions due 
to repair et al should be done with STCS.

If we solve this problem (and finish CASSANDRA-10496) then I think to a large 
degree we won't need to worry nearly as much about users disabling read repair, 
only repairing during active windows, etc.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
> 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-04-13 Thread Lucas de Souza Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240056#comment-15240056
 ] 

Lucas de Souza Santos edited comment on CASSANDRA-9666 at 4/13/16 10:02 PM:


I work for the biggest online news company in Brazil, where we are building 
and running a time-series system with Cassandra as the persistence layer.
At the end of last year we decided to use DTCS. This time-series 
implementation is rolling out to production to replace a legacy Cacti setup.

Right now the cluster is composed of 10 Dell PE R6xx machines (some 610s, 
some 620s), all with SAS disks, 8 CPUs, and 32 GB of RAM, running CentOS 6 
with kernel 2.6.32.

Since Jan 20, 2016 we have been running cassandra21-2.1.12 on JRE 7. At that 
point we were just doing some tests, receiving ~140k points/minute. The 
cluster was fine, using STCS (the default) as the compaction strategy.

At the end of February I switched to DTCS and we doubled the load, to around 
200k points/minute. A week later we saw CPU load climbing, together with disk 
usage and memory. At first we thought we were simply using the cluster more, 
so we built some dashboards to visualize the data.

About 3 weeks ago the cluster started hitting timeouts and we lost a node at 
least twice; a reboot was needed to bring the node back.

Things I have done trying to fix/improve the cluster:

Upgraded JRE 7 to JDK 8, switched to the G1 GC, and lowered 
memtable_cleanup_threshold to 0.10 (we had been using 0.20; setting this 
value higher made the problem worse).

Changed all applications using Cassandra to consistency ONE, because GC 
pauses were pushing nodes out of the cluster and we were getting a lot of 
timeouts.
After those changes the cluster was more usable, but we were not confident 
enough to grow the number of requests. Last week I noticed a problem when 
restarting any node: it took at least 15 minutes, sometimes 30, just to 
load/open the sstables. I checked the data on disk and saw that Cassandra had 
created more than 10 million sstables. I couldn't even run a simple "ls" in 
any data directory (I have 14 keyspaces).

Searching for Cassandra issues related to DTCS, we found TWCS as an 
alternative, and saw that several of the problems we had hit were already 
reported against DTCS. Afraid of a crash in production, I couldn't even wait 
for a complete test in QA, so I decided to apply TWCS to our biggest 
keyspace. The result was impressive: from more than 2.5 million sstables down 
to around 30 per node (after full compaction). No data loss, and no change in 
load or memory. Given these results, yesterday (03/12/2016) I decided to 
apply TWCS to all 14 keyspaces, and today the result, at least for me, is 
mind-blowing.

Now I have around 500 sstables per node across all keyspaces - from 10 
million down to 500! The load5 dropped from ~6 to ~0.5, and Cassandra 
released around 3 GB of RAM per node. Disk usage dropped from ~150 GB to 
~120 GB. Right after that, the number of requests went up from 120k to 190k 
per minute and we are seeing no change in load.
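
(For reference, moving an existing table over is a single schema change. 
Below is a minimal sketch using the DataStax Python driver; the contact point 
and keyspace name are placeholders, and the compaction options are the same 
ones shown in the TWCS schema that follows.)

# Hypothetical example of switching one table to the TWCS build used here.
# Requires the 'cassandra-driver' package; host and keyspace are placeholders.
from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()
session.execute("""
    ALTER TABLE my_keyspace.ts_number
    WITH compaction = {
      'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
      'compaction_window_unit': 'DAYS',
      'compaction_window_size': '7'
    }
""")
cluster.shutdown()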



Our DTCS create table:
CREATE TABLE IF NOT EXISTS %s.ts_number (id text, date timeuuid, value double, 
PRIMARY KEY (id, date))
WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
AND comment = ''
AND compaction = {'class': 'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 
'tombstone_compaction_interval': '7', 'min_threshold': '8', 'max_threshold': 
'64', 'timestamp_resolution': 'MILLISECONDS', 'base_time_seconds': '3600', 
'max_sstable_age_days': '365'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = %d
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'


TWCS:
CREATE TABLE .ts_number (
id text,
date timeuuid,
value double,
PRIMARY KEY (id, date)
) WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"100"}'
AND comment = ''
AND compaction = {'compaction_window_unit': 'DAYS', 
'compaction_window_size': '7', 'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-04-13 Thread Lucas de Souza Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240056#comment-15240056
 ] 

Lucas de Souza Santos edited comment on CASSANDRA-9666 at 4/13/16 10:00 PM:


I work for the biggest online news company in Brazil, where we are building 
and running a time-series system with Cassandra as the persistence layer.
At the end of last year we decided to use DTCS. This time-series 
implementation is rolling out to production to replace a legacy Cacti setup.

Right now the cluster is composed of 10 Dell PE R6xx machines (some 610s, 
some 620s), all with SAS disks, 8 CPUs, and 32 GB of RAM, running CentOS 6 
with kernel 2.6.32.

Since Jan 20, 2016 we have been running cassandra21-2.1.12 on JRE 7. At that 
point we were just doing some tests, receiving ~140k points/minute. The 
cluster was fine, using STCS (the default) as the compaction strategy.

At the end of February I switched to DTCS and we doubled the load, to around 
200k points/minute. A week later we saw CPU load climbing, together with disk 
usage and memory. At first we thought we were simply using the cluster more, 
so we built some dashboards to visualize the data.

About 3 weeks ago the cluster started hitting timeouts and we lost a node at 
least twice; a reboot was needed to bring the node back.

Things I have done trying to fix/improve the cluster:

Upgraded JRE 7 to JDK 8, switched to the G1 GC, and lowered 
memtable_cleanup_threshold to 0.10 (we had been using 0.20; setting this 
value higher made the problem worse).

Changed all applications using Cassandra to consistency ONE, because GC 
pauses were pushing nodes out of the cluster and we were getting a lot of 
timeouts.
After those changes the cluster was more usable, but we were not confident 
enough to grow the number of requests. Last week I noticed a problem when 
restarting any node: it took at least 15 minutes, sometimes 30, just to 
load/open the sstables. I checked the data on disk and saw that Cassandra had 
created more than 10 million sstables. I couldn't even run a simple "ls" in 
any data directory (I have 14 keyspaces).

Searching for Cassandra issues related to DTCS, we found TWCS as an 
alternative, and saw that several of the problems we had hit were already 
reported against DTCS. Afraid of a crash in production, I couldn't even wait 
for a complete test in QA, so I decided to apply TWCS to our biggest 
keyspace. The result was impressive: from more than 2.5 million sstables down 
to around 30 per node (after full compaction). No data loss, and no change in 
load or memory. Given these results, yesterday (03/12/2016) I decided to 
apply TWCS to all 14 keyspaces, and today the result, at least for me, is 
mind-blowing.

Now I have around 500 sstables per node across all keyspaces - from 10 
million down to 500! The load5 dropped from ~6 to ~0.5, and Cassandra 
released around 3 GB of RAM per node. Disk usage dropped from ~150 GB to 
~120 GB. Right after that, the number of requests went up from 120k to 190k 
per minute and we are seeing no change in load.



Our DTCS create table:
CREATE TABLE IF NOT EXISTS %s.ts_number (id text, date timeuuid, value double, 
PRIMARY KEY (id, date))
WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
AND comment = ''
AND compaction={ 'min_threshold': '8', 'max_threshold': '64', 
'compaction_window_unit': 'DAYS', 'compaction_window_size': '7', 'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = %d
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'
 AND 
compaction={'class':'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy',
 'tombstone_compaction_interval': '7', 'min_threshold': '8', 'max_threshold': 
'64', 'timestamp_resolution':'MILLISECONDS', 'base_time_seconds':'3600', 
'max_sstable_age_days':'365'}


TWCS:
CREATE TABLE .ts_number (
id text,
date timeuuid,
value double,
PRIMARY KEY (id, date)
) WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"100"}'
AND comment = ''
AND compaction = {'compaction_window_unit': 'DAYS', 
'compaction_window_size': '7', 'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-04-13 Thread Lucas de Souza Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240056#comment-15240056
 ] 

Lucas de Souza Santos edited comment on CASSANDRA-9666 at 4/13/16 9:54 PM:
---

I work for the biggest online news company in Brazil, where we are building 
and running a time-series system with Cassandra as the persistence layer.
At the end of last year we decided to use DTCS. This time-series 
implementation is rolling out to production to replace a legacy Cacti setup.

Right now the cluster is composed of 10 Dell PE R6xx machines (some 610s, 
some 620s), all with SAS disks, 8 CPUs, and 32 GB of RAM, running CentOS 6 
with kernel 2.6.32.

Since Jan 20, 2016 we have been running cassandra21-2.1.12 on JRE 7. At that 
point we were just doing some tests, receiving ~140k points/minute. The 
cluster was fine, using STCS (the default) as the compaction strategy.

At the end of February I switched to DTCS and we doubled the load, to around 
200k points/minute. A week later we saw CPU load climbing, together with disk 
usage and memory. At first we thought we were simply using the cluster more, 
so we built some dashboards to visualize the data.

About 3 weeks ago the cluster started hitting timeouts and we lost a node at 
least twice; a reboot was needed to bring the node back.

Things I have done trying to fix/improve the cluster:

Upgraded JRE 7 to JDK 8, switched to the G1 GC, and lowered 
memtable_cleanup_threshold to 0.10 (we had been using 0.20; setting this 
value higher made the problem worse).

Changed all applications using Cassandra to consistency ONE, because GC 
pauses were pushing nodes out of the cluster and we were getting a lot of 
timeouts.
After those changes the cluster was more usable, but we were not confident 
enough to grow the number of requests. Last week I noticed a problem when 
restarting any node: it took at least 15 minutes, sometimes 30, just to 
load/open the sstables. I checked the data on disk and saw that Cassandra had 
created more than 10 million sstables. I couldn't even run a simple "ls" in 
any data directory (I have 14 keyspaces).

Searching for Cassandra issues related to DTCS, we found TWCS as an 
alternative, and saw that several of the problems we had hit were already 
reported against DTCS. Afraid of a crash in production, I couldn't even wait 
for a complete test in QA, so I decided to apply TWCS to our biggest 
keyspace. The result was impressive: from more than 2.5 million sstables down 
to around 30 per node (after full compaction). No data loss, and no change in 
load or memory. Given these results, yesterday (03/12/2016) I decided to 
apply TWCS to all 14 keyspaces, and today the result, at least for me, is 
mind-blowing.

Now I have around 500 sstables per node across all keyspaces - from 10 
million down to 500! The load5 dropped from ~6 to ~0.5, and Cassandra 
released around 3 GB of RAM per node. Disk usage dropped from ~150 GB to 
~120 GB. Right after that, the number of requests went up from 120k to 190k 
per minute and we are seeing no change in load.



Our DTCS create table:
CREATE TABLE IF NOT EXISTS %s.ts_number (id text, date timeuuid, value double, 
PRIMARY KEY (id, date))
WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
AND comment = ''
AND compaction={ 'min_threshold': '8', 'max_threshold': '64', 
'compaction_window_unit': 'DAYS', 'compaction_window_size': '7', 'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = %d
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE'


TWCS:
CREATE TABLE .ts_number (
id text,
date timeuuid,
value double,
PRIMARY KEY (id, date)
) WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = '{"keys":"ALL", "rows_per_partition":"100"}'
AND comment = ''
AND compaction = {'compaction_window_unit': 'DAYS', 
'compaction_window_size': '7', 'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = 0
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';
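
To make the window math in the schema above concrete: with 
compaction_window_unit 'DAYS' and compaction_window_size '7', every sstable 
whose maxTimestamp falls inside the same 7-day block ends up in the same 
bucket, and each inactive bucket eventually compacts down to a single 
sstable. A rough Python illustration (not the actual TWCS code):

from collections import defaultdict

WINDOW_SECONDS = 7 * 24 * 3600          # compaction_window_unit DAYS, size 7

def window_start(max_timestamp_s):
    # Epoch-aligned start of the 7-day window containing this timestamp.
    return max_timestamp_s - (max_timestamp_s % WINDOW_SECONDS)

def bucket_sstables(sstables):
    """Group (name, max_timestamp_s) pairs by their time window."""
    windows = defaultdict(list)
    for name, ts in sstables:
        windows[window_start(ts)].append(name)
    return windows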



was (Author: lucasdss):
I have been working for the biggest online news in Brazil and we are 
using/building a timeseries system with cassandra as persistence.
Since the end of last year 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-04-13 Thread Lucas de Souza Santos (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240056#comment-15240056
 ] 

Lucas de Souza Santos edited comment on CASSANDRA-9666 at 4/13/16 9:28 PM:
---

I work for the biggest online news company in Brazil, where we are building 
and running a time-series system with Cassandra as the persistence layer.
At the end of last year we decided to use DTCS. This time-series 
implementation is rolling out to production to replace a legacy Cacti setup.

Right now the cluster is composed of 10 Dell PE R6xx machines (some 610s, 
some 620s), all with SAS disks, 8 CPUs, and 32 GB of RAM, running CentOS 6 
with kernel 2.6.32.

Since Jan 20, 2016 we have been running cassandra21-2.1.12 on JRE 7. At that 
point we were just doing some tests, receiving ~140k points/minute. The 
cluster was fine, using STCS (the default) as the compaction strategy.

At the end of February I switched to DTCS and we doubled the load, to around 
200k points/minute. A week later we saw CPU load climbing, together with disk 
usage and memory. At first we thought we were simply using the cluster more, 
so we built some dashboards to visualize the data.

About 3 weeks ago the cluster started hitting timeouts and we lost a node at 
least twice; a reboot was needed to bring the node back.

Things I have done trying to fix/improve the cluster:

Upgraded JRE 7 to JDK 8, switched to the G1 GC, and lowered 
memtable_cleanup_threshold to 0.10 (we had been using 0.20; setting this 
value higher made the problem worse).

Changed all applications using Cassandra to consistency ONE, because GC 
pauses were pushing nodes out of the cluster and we were getting a lot of 
timeouts.
After those changes the cluster was more usable, but we were not confident 
enough to grow the number of requests. Last week I noticed a problem when 
restarting any node: it took at least 15 minutes, sometimes 30, just to 
load/open the sstables. I checked the data on disk and saw that Cassandra had 
created more than 10 million sstables. I couldn't even run a simple "ls" in 
any data directory (I have 14 keyspaces).

Searching for Cassandra issues related to DTCS, we found TWCS as an 
alternative, and saw that several of the problems we had hit were already 
reported against DTCS. Afraid of a crash in production, I couldn't even wait 
for a complete test in QA, so I decided to apply TWCS to our biggest 
keyspace. The result was impressive: from more than 2.5 million sstables down 
to around 30 per node (after full compaction). No data loss, and no change in 
load or memory. Given these results, yesterday (03/12/2016) I decided to 
apply TWCS to all 14 keyspaces, and today the result, at least for me, is 
mind-blowing.

Now I have around 500 sstables per node across all keyspaces - from 10 
million down to 500! The load5 dropped from ~6 to ~0.5, and Cassandra 
released around 3 GB of RAM per node. Disk usage dropped from ~150 GB to 
~120 GB. Right after that, the number of requests went up from 120k to 190k 
per minute and we are seeing no change in load.


was (Author: lucasdss):
I have been working for the biggest online news in Brazil and we are 
using/building a timeseries system with cassandra as persistence.
Since the end of last year we decided to use DTCS. This timeseries 
implementation is rolling to production to substitute legacy cacti.

Right now the cluster is composed by 10 Dell PE R6XX, some 610 and some 620. 
All with SAS discs, 8 cpus and 32GB of RAM, Linux Centos 6, kernel 2.
6.32.

Since Jan 20 2016 we are running cassandra21-2.1.12 over JRE 7. At that point 
we were just doing some tests, receiving ~140k points/minute. The cluster was 
fine and using STCS (the default) as compaction strategy.

At the end of February I changed to DTCS and we doubled the load, passing to 
around 200k points/minute. A week after, we saw the cpu load growing up, 
together with disc space and memory. First we thought it was us using more, so 
we built some dashboards to visualize the data.

About 3 weeks ago the cluster started to get some timeouts and we lost a node 
at least two times, a reboot was needed to get the node back.

Things I have done trying to fix/improve the cluster:

Upgraded jre7 to jdk8, configured GC G1, altered memtable_cleanup_threshold to 
0.10 (was using 0.20, getting this value high made the problem worst).

Changed all applications using cassandra to use consistency ONE because GC 
pause was putting nodes out of the cluster and we were receiving a lot of 
timeouts.
After those changes the cluster was better to use but we were not confident in 
growing the number of requests. Last week I noticed a problem when restarting 
any node, it took at lest 15 minutes, sometimes 30 minutes just to load/open 
sstables. I checked the data on disc and saw that cassandra created more than 
10 million sstables. I couldn't do a simple "ls" in any datadir (I have 14 
keyspaces).

> Provide an alternative to DTCS
> 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-29 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15216688#comment-15216688
 ] 

Jonathan Shook edited comment on CASSANDRA-9666 at 3/29/16 7:21 PM:


There are two areas of concern that we should discuss more directly..

1. With DTCS, the pacing of memtable flushing on a given system can be matched 
to the base window size, avoiding logical write amplification that can occur 
before the scheduling discipline kicks in. This is not so easy when you water 
down the configuration and remove the ability to manage the fresh sstables. 
The benefits of time-series-friendly compaction can be had for both the newest 
and the oldest tables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point for whether a 
cell goes into a bucket or not is different between the two approaches. To me 
this is fairly arbitrary in terms of processing overhead comparisons, all else 
assumed close enough. However, when trying to reconcile, shifting all of your 
data to a different bucket will not be a welcome event for most users. This 
makes "graceful" reconciliation difficult at best.

Can we simply try to make DTCS as easy to use, perceptually, for the default 
case as TWCS? To me, this is more about the user entry point and 
understanding the behavior as designed than it is about the machinery that 
makes it happen.

The two basic designs have so much in common that reconciling them completely 
would be mostly a shell game of parameter names, plus lopping off some 
functionality that can be completely bypassed, given the right settings.

Can we identify the functionally equivalent settings for TWCS that DTCS needs 
to emulate, given proper settings (possibly including anchoring point), and 
then simply provide the same simple configuration to users, without having to 
maintain two separate sibling compaction strategies?

One sticking point I've run into when discussing this suggestion is that the 
bucketing logic is considered too difficult to think about. If we were able to 
provide the self-same behavior for a TWCS-like configuration, the bucketing 
logic could be used only when the parameters require non-uniform windows. 
Would that make everyone happy?







was (Author: jshook):
There are two areas of concern that we should discuss more directly..

1. The pacing of memtable flushing on a given system can be matched up with the 
base window size with DTCS, avoiding logical write amplification that can occur 
before the scheduling discipline kicks in. This is not so easy when  you water 
down the configuration and remove the ability to manage the fresh sstables. The 
benefits from time-series friendly compaction can be had for both the newest 
and the oldest tables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point for whether a 
cell goes into a bucket or not is different between the two approaches. To me 
this is fairly arbitrary in terms of processing overhead comparisons, all else 
assumed close enough. However, when trying to reconcile, shifting all of your 
data to a different bucket will not be a welcome event for most users. This 
makes "graceful" reconciliation difficult at best.

Can we simply try to make DTCS as (perceptually) easy to use for the default 
case as TWCS (perceptually) ? To me, this is more about the user entry point 
and understanding behavior as designed than it is about the machinery that 
makes it happen.

The basic design between them has so much in common that reconciling them 
completely would be mostly a shell game of parameter names as well as lobbing 
off some functionality that can be complete bypassed, given the right settings.

Can we identify the functionally equivalent settings for TWCS that DTCS needs 
to emulate, given proper settings (possibly including anchoring point), and 
then simply provide the same simple configuration to users, without having to 
maintain two separate sibling compaction strategies?

One sticking point that I've had on this suggesting in conversation is the 
bucketing logic being too difficult to think about. If we were able to provide 
the self-same behavior for TWCS-like configuration, the bucketing logic could 
be used only when the parameters require non-uniform windows. Would that make 
everyone happy?






> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-25 Thread Robbie Strickland (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212527#comment-15212527
 ] 

Robbie Strickland edited comment on CASSANDRA-9666 at 3/25/16 10:46 PM:


We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls through 
the cluster every few days. It does a great job keeping up after 6ish months of 
heavy pounding.





was (Author: rstrickland):
We run TWCS at a sustained 2M writes/sec on just shy of 30TB that rolls
through the cluster every few days. It does a great job keeping up after
6ish months of heavy pounding.




> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2016-03-22 Thread Jon Haddad (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15207104#comment-15207104
 ] 

Jon Haddad edited comment on CASSANDRA-9666 at 3/22/16 7:32 PM:


I talk to a lot of people using TWCS who have switched to it from DTCS due to 
its operational simplicity.  +1 to including it.


was (Author: rustyrazorblade):
I talk to a lot of people using TWCS who have switched it to from DTCS due to 
it's operational simplicity.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold tables 
> existed in current window but STCS did not 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2015-12-01 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034070#comment-15034070
 ] 

Jeff Jirsa edited comment on CASSANDRA-9666 at 12/1/15 5:07 PM:


 If/when it's wont-fixed, I'll continue updating at 
https://github.com/jeffjirsa/twcs/

We're almost certainly going to continue using TWCS, and given that others are 
using it in production, I'll continue maintaining it until that's no longer true

For what it's worth, we'll continue using TWCS because the explicit confit 
options are easier to reason about, and it's significantly less likely to be 
confused by old data via foreground read repair



was (Author: jjirsa):
 If/when it's wont-fixed, I'll continue updating at 
https://github.com/jeffjirsa/twcs/

We're almost certainly going to continue using TWCS, and given that others are 
using it in production, I'll continue maintaining it until that's no longer true


> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: 

[jira] [Comment Edited] (CASSANDRA-9666) Provide an alternative to DTCS

2015-12-01 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15034070#comment-15034070
 ] 

Jeff Jirsa edited comment on CASSANDRA-9666 at 12/1/15 5:19 PM:


 If/when it's wont-fixed, I'll continue updating at 
https://github.com/jeffjirsa/twcs/

We're almost certainly going to continue using TWCS, and given that others are 
using it in production, I'll continue maintaining it until that's no longer true

For what it's worth, we'll continue using TWCS because the explicit config 
options are easier to reason about, and it's significantly less likely to be 
confused by old data via foreground read repair



was (Author: jjirsa):
 If/when it's wont-fixed, I'll continue updating at 
https://github.com/jeffjirsa/twcs/

We're almost certainly going to continue using TWCS, and given that others are 
using it in production, I'll continue maintaining it until that's no longer true

For what it's worth, we'll continue using TWCS because the explicit confit 
options are easier to reason about, and it's significantly less likely to be 
confused by old data via foreground read repair


> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data is written into the memtable 
> at the same time, the