[ https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15216688#comment-15216688 ]

Jonathan Shook edited comment on CASSANDRA-9666 at 3/29/16 7:21 PM:
--------------------------------------------------------------------

There are two areas of concern that we should discuss more directly.

1. The pacing of memtable flushing on a given system can be matched to the base 
window size in DTCS, avoiding the logical write amplification that can occur 
before the scheduling discipline kicks in. This is not so easy when you water 
down the configuration and remove the ability to manage the fresh sstables. The 
benefits of time-series-friendly compaction can be had for both the newest and 
the oldest sstables, and both are relevant here.

2. The window placement. From what I've seen, the anchoring point that 
determines which bucket a cell lands in differs between the two approaches. In 
terms of processing overhead the choice is fairly arbitrary, all else assumed 
close enough. However, when trying to reconcile the two, shifting all of a 
user's data into different buckets will not be a welcome event, which makes 
"graceful" reconciliation difficult at best.
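
To make the anchoring point concrete, here is a rough sketch. It is 
illustrative only, not the actual code from either strategy, and the 6-hour 
window and the "now"-relative convention are just assumptions for the example. 
The point is only that the two conventions disagree on bucket boundaries, so 
changing the anchor underneath existing data would shift it into different 
buckets.

{code:java}
import java.util.concurrent.TimeUnit;

// Illustrative only -- not the actual DTCS or TWCS code. It shows how two
// different anchoring conventions can put the same cell into different
// buckets, which is why changing the anchor in place re-buckets existing data.
public class WindowAnchoring
{
    static final long WINDOW_MILLIS = TimeUnit.HOURS.toMillis(6);

    // Fixed windows aligned to the epoch: bucket start = floor(ts / window) * window.
    static long epochAlignedBucketStart(long timestampMillis)
    {
        return (timestampMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
    }

    // Windows anchored to "now" and counted backwards from it.
    static long nowAnchoredBucketStart(long timestampMillis, long nowMillis)
    {
        long age = nowMillis - timestampMillis;
        return nowMillis - ((age / WINDOW_MILLIS) + 1) * WINDOW_MILLIS;
    }

    public static void main(String[] args)
    {
        long now = System.currentTimeMillis();
        long ts = now - TimeUnit.HOURS.toMillis(7); // a cell written 7 hours ago
        System.out.println("epoch-aligned bucket start: " + epochAlignedBucketStart(ts));
        System.out.println("now-anchored bucket start:  " + nowAnchoredBucketStart(ts, now));
    }
}
{code}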

Can we simply try to make DTCS perceptually as easy to use for the default 
case as TWCS is? To me, this is more about the user entry point and 
understanding the behavior as designed than it is about the machinery that 
makes it happen.

The basic designs have so much in common that reconciling them completely 
would be mostly a shell game of parameter names, plus lopping off some 
functionality that can already be completely bypassed, given the right 
settings.

Can we identify the settings DTCS needs in order to emulate TWCS (possibly 
including the anchoring point), and then simply provide the same simple 
configuration to users, without having to maintain two separate sibling 
compaction strategies?
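
As a starting point for that discussion, the kind of mapping I have in mind 
looks something like the sketch below. The DTCS option names and values here 
are my assumption of what an equivalence might look like, not a verified 
mapping; working out the settings that actually reproduce TWCS behavior is 
exactly the open question.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;

// Rough sketch only: derive candidate DTCS options from a TWCS-style window
// spec (a unit plus a size). The chosen DTCS option names and values are
// assumptions; verifying that they reproduce TWCS behavior is the open question.
public class WindowOptionMapping
{
    static Map<String, String> dtcsOptionsFor(TimeUnit windowUnit, long windowSize)
    {
        long windowSeconds = windowUnit.toSeconds(windowSize);
        Map<String, String> options = new HashMap<>();
        // Make the base window the whole TWCS-style window...
        options.put("base_time_seconds", Long.toString(windowSeconds));
        // ...and cap window growth at the same value so windows never tier up past it.
        options.put("max_window_size_seconds", Long.toString(windowSeconds));
        return options;
    }

    public static void main(String[] args)
    {
        // e.g. a window spec of unit=HOURS, size=6
        System.out.println(dtcsOptionsFor(TimeUnit.HOURS, 6));
    }
}
{code}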

One sticking point I've run into when discussing this suggestion is that the 
bucketing logic is too difficult to think about. If we were able to provide 
the selfsame behavior for a TWCS-like configuration, the bucketing logic could 
be used only when the parameters require non-uniform windows. Would that make 
everyone happy?
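
In other words, something like the following sketch (not the existing 
bucketing code): when the configured windows are uniform, bucket assignment 
collapses to a single floor division, and the tiered logic never has to run.

{code:java}
// Sketch only, not the existing bucketing code: with uniform windows the
// bucket assignment is a single floor division; the tiered path is only
// needed when window sizes are allowed to grow.
public class UniformWindowFastPath
{
    static long bucketStartFor(long timestamp, long baseWindow, long maxWindow, long now)
    {
        if (baseWindow == maxWindow)                      // uniform windows, TWCS-like
            return (timestamp / baseWindow) * baseWindow;
        return tieredBucketStartFor(timestamp, baseWindow, maxWindow, now);
    }

    // Placeholder for the existing tiered bucketing, deliberately not shown here.
    static long tieredBucketStartFor(long timestamp, long baseWindow, long maxWindow, long now)
    {
        throw new UnsupportedOperationException("tiered path omitted from this sketch");
    }
}
{code}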

> Provide an alternative to DTCS
> ------------------------------
>
>                 Key: CASSANDRA-9666
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jeff Jirsa
>            Assignee: Jeff Jirsa
>             Fix For: 2.1.x, 2.2.x
>
>         Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data within a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop being compacted once new data is no longer written in their window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data are written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables; however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patch provided for : 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold tables 
> existed in the current window but STCS did not consider them viable candidates.
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882
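
(Illustration only, not the patch's code: a minimal sketch of the fixed-window 
grouping described in the quoted description above, assuming a 6-hour window, 
which yields four windows per day.)

{code:java}
import java.util.concurrent.TimeUnit;

// Illustration only (not the patch's code): with a 6-hour window, every
// timestamp in a day maps to one of four window start times.
public class SixHourWindows
{
    public static void main(String[] args)
    {
        long window = TimeUnit.HOURS.toMillis(6);
        for (long ts = 0; ts < TimeUnit.DAYS.toMillis(1); ts += TimeUnit.HOURS.toMillis(1))
            System.out.printf("hour %2d -> window starting at hour %d%n",
                              TimeUnit.MILLISECONDS.toHours(ts),
                              TimeUnit.MILLISECONDS.toHours((ts / window) * window));
    }
}
{code}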



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
