[ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15240056#comment-15240056 ]

Lucas de Souza Santos edited comment on CASSANDRA-9666 at 4/13/16 10:02 PM:
----------------------------------------------------------------------------

I have been working for the biggest online news site in Brazil, where we are 
building a time series system with Cassandra as the persistence layer.
At the end of last year we decided to use DTCS. This time series 
implementation is rolling out to production to replace a legacy Cacti setup.

Right now the cluster is composed of 10 Dell PE R6xx servers, some 610s and 
some 620s, all with SAS disks, 8 CPUs, and 32 GB of RAM, running CentOS 6 
with Linux kernel 2.6.32.

Since Jan 20 2016 we have been running cassandra21-2.1.12 on JRE 7. At that 
point we were just doing some tests, receiving ~140k points/minute. The 
cluster was fine, using STCS (the default) as the compaction strategy.

At the end of February I changed to DTCS and we doubled the load, going to 
around 200k points/minute. A week later we saw CPU load growing, together 
with disk space and memory usage. At first we thought our own usage had 
grown, so we built some dashboards to visualize the data.

About 3 weeks ago the cluster started to get some timeouts, and we lost a 
node at least twice; a reboot was needed to bring the node back.

Things I did trying to fix/improve the cluster:

Upgraded JRE 7 to JDK 8, configured the G1 GC, and lowered 
memtable_cleanup_threshold to 0.10 (we had been using 0.20; raising this 
value made the problem worse).

Changed all applications using Cassandra to consistency ONE, because GC 
pauses were knocking nodes out of the cluster and we were receiving a lot of 
timeouts.
After those changes the cluster was more usable, but we were not confident 
about growing the number of requests. Last week I noticed a problem when 
restarting any node: it took at least 15 minutes, sometimes 30, just to 
load/open the sstables. I checked the data on disk and saw that Cassandra 
had created more than 10 million sstables. I couldn't even run a simple "ls" 
in any data directory (I have 14 keyspaces).

Searching for Cassandra issues related to DTCS, we found TWCS as an 
alternative, and saw several of the problems we had hit already reported 
against DTCS. Afraid of a crash in production, I couldn't even wait for a 
complete test in QA, so I decided to apply TWCS to our biggest keyspace. The 
result was impressive: from more than 2.5 million sstables down to around 30 
per node (after full compaction). No data loss, though no change in load or 
memory yet. Given these results, yesterday (03/12/2016) I decided to apply 
TWCS to all 14 keyspaces, and today the result, at least for me, is 
mind-blowing.

Now I have around 500 sstables per node, summed across all keyspaces: from 
10 million down to 500! The load5 dropped from ~6 to ~0.5, and Cassandra 
released around 3 GB of RAM per node. Disk usage dropped from ~150 GB to 
~120 GB. Right after that, the number of requests went up from 120k to 190k 
per minute, and we are seeing no change in load.
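
For anyone wanting to make the same switch, it is a single ALTER TABLE per 
table. A minimal sketch (the keyspace name is a placeholder, and it assumes 
the TWCS jar is already on every node's classpath):

-- hypothetical keyspace; options mirror the TWCS table shown below
ALTER TABLE metrics.ts_number WITH compaction = {
    'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
    'compaction_window_unit': 'DAYS',
    'compaction_window_size': '7'};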



Our DTCS create table:
CREATE TABLE IF NOT EXISTS %s.ts_number (id text, date timeuuid, value double, 
PRIMARY KEY (id, date))
WITH CLUSTERING ORDER BY (date ASC)
AND bloom_filter_fp_chance = 0.01
AND caching = {'keys':'ALL', 'rows_per_partition':'NONE'}
AND comment = ''
AND compaction = 
{'class':'org.apache.cassandra.db.compaction.DateTieredCompactionStrategy', 
'tombstone_compaction_interval': '7', 'min_threshold': '8', 'max_threshold': 
'64', 'timestamp_resolution':'MILLISECONDS', 'base_time_seconds':'3600', 
'max_sstable_age_days':'365'}
AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
AND dclocal_read_repair_chance = 0.0
AND default_time_to_live = %d
AND gc_grace_seconds = 0
AND max_index_interval = 2048
AND memtable_flush_period_in_ms = 0
AND min_index_interval = 128
AND read_repair_chance = 0.0
AND speculative_retry = '99.0PERCENTILE';


TWCS:
CREATE TABLE XXXX.ts_number (
    id text,
    date timeuuid,
    value double,
    PRIMARY KEY (id, date)
) WITH CLUSTERING ORDER BY (date ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"100"}'
    AND comment = ''
    AND compaction = {'compaction_window_unit': 'DAYS', 
'compaction_window_size': '7', 'class': 
'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy'}
    AND compression = {'crc_check_chance': '0.5', 'sstable_compression': 
'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 0
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';






> Provide an alternative to DTCS
> ------------------------------
>
>                 Key: CASSANDRA-9666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 2.1.x, 2.2.x
>
>         Attachments: dashboard-DTCS_to_TWCS.png, dtcs-twcs-io.png, 
> dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever-increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc.), such that an 
> operator can expect all data within a block of that size to be compacted 
> together (that is, if your unit is hours and your size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
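> As a concrete sketch of the 6-hour example just described (the table name is 
> hypothetical; the class name matches the out-of-tree patch linked below):
> -- hypothetical table; yields ~4 sstables/day, each covering ~6 hours
> ALTER TABLE ks.ts_number WITH compaction = {
>     'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
>     'compaction_window_unit': 'HOURS',
>     'compaction_window_size': '6'};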
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criterion, which prefers files with earlier timestamps but ignores 
> sizes. In TimeWindowCompactionStrategy, data in the first window will be 
> compacted with the well-tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior (see the sketch after this list).
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data are written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables; however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
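> As mentioned in the first bullet above, STCS suboptions ride along in the same 
> compaction map; a sketch with purely illustrative values:
> -- hypothetical table; min/max_threshold tune the STCS pass used
> -- inside the current window
> ALTER TABLE ks.ts_number WITH compaction = {
>     'class': 'com.jeffjirsa.cassandra.db.compaction.TimeWindowCompactionStrategy',
>     'compaction_window_unit': 'DAYS',
>     'compaction_window_size': '1',
>     'min_threshold': '8',
>     'max_threshold': '64'};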
> Patch provided for:
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold tables 
> existed in the current window but STCS did not consider them viable candidates.
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
