[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-06-08 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9666:
---
Fix Version/s: (was: 3.0.x)
   3.0.8

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 3.8, 3.0.8
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criterion, which prefers files with earlier timestamps but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data are written into the memtable 
> at the same time, the resulting sstables will be treated as if they were new 
> sstables; however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patches provided for: 
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold sstables 
> existed in current window but STCS did not consider them viable candidates
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882
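
The window sizing described in the quoted text (unit plus size, with sstables grouped by maxTimestamp) can be illustrated with a minimal Python sketch. This is not the code from the patch: the function names, the `UNIT_SECONDS` table, and the hourly flush schedule are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical unit table; the description only names minutes, hours and days.
UNIT_SECONDS = {"MINUTES": 60, "HOURS": 3600, "DAYS": 86400}


def window_start(ts_seconds, unit, size):
    """Return the lower bound (epoch seconds) of the window containing ts_seconds."""
    width = UNIT_SECONDS[unit] * size
    return ts_seconds - (ts_seconds % width)


def group_by_window(sstables, unit, size):
    """Group (name, max_timestamp) pairs by the window of their max timestamp."""
    buckets = defaultdict(list)
    for name, max_ts in sstables:
        buckets[window_start(max_ts, unit, size)].append(name)
    return dict(buckets)


# Assumed workload: one flushed sstable per hour over a single UTC day.
day = int(datetime(2016, 6, 1, tzinfo=timezone.utc).timestamp())
flushes = [(f"sstable-{hour:02d}", day + hour * 3600) for hour in range(24)]

# unit = hours, size = 6, as in the example from the description.
windows = group_by_window(flushes, "HOURS", 6)
for start in sorted(windows):
    print(datetime.fromtimestamp(start, tz=timezone.utc), len(windows[start]))
```

With a 6 hour window the 24 flushes fall into 4 buckets, which matches the "roughly 4 sstables per day" figure once each inactive bucket has been compacted down to a single sstable.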



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-06-06 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-9666:
---
   Resolution: Fixed
Fix Version/s: (was: 3.x)
   3.8
   3.0.x
   Status: Resolved  (was: Patch Available)

committed, thanks! \o/

I'll update the 3.0 fixver to 3.0.8 once it is available in jira

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 3.0.x, 3.8
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>

[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-25 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-9666:
--
Status: Patch Available  (was: Open)

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 3.x
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>


[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-24 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-9666:
--
Fix Version/s: (was: 2.2.x)
   (was: 2.1.x)
   3.x

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 3.x
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>


[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-24 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-9666:
--
Component/s: Compaction

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 3.x
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>


[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-05-11 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-9666:
--
Attachment: compactomatic.py

I ended up writing a compaction simulator (specific to TS data) to compare TWCS 
and DTCS (attached).  Doing dozens of experiments with real data takes much too 
long.  (I did verify the simulation against actual data so I'm reasonably 
confident that it's producing valid results.)

Here are my findings:

# On writes, if compaction can keep up with writes, and if DTCS is "properly" 
configured (i.e. with max window an appropriate multiple of the base to avoid 
extra "partial" tiers), DTCS does about 10% fewer writes than TWCS.
# BUT, if compaction gets behind (which is common), and if write volume is low 
enough that TWCS doesn't have to do multiple passes in old windows due to 
max_threshold, DTCS can do much worse than TWCS.  This is because TWCS will always 
do the equivalent of a major compaction in inactive windows, while DTCS does 
STCS in all windows.
# The flip side of this behavior is that if repair scatters tiny sstables into 
inactive windows, TWCS will incur large write amplification merging those as 
well.  DTCS will not, because of its size tiering.
# TWCS substantially outperforms DTCS on read amplification (number of sstables 
touched) because there is always a single sstable in inactive TWCS windows, 
while DTCS has multiple tiered files.  How much depends on the number of tiers 
generated, but typically DTCS will do 2x to 3x as many reads.
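
As a rough illustration of point 4, here is a toy Python model of how many sstables remain in one inactive window under size tiering versus a single major compaction. It is not compactomatic.py and makes no attempt to reproduce the numbers above. It assumes equal-sized flushes and a simplified tiering rule (every min_threshold sstables of the same tier merge into one sstable of the next tier); both function names are made up for the example.

```python
def stcs_sstables_remaining(flushes, min_threshold=4):
    """Toy size-tiering model: count sstables left after `flushes` equal-sized flushes.

    Assumption: whenever min_threshold sstables share a tier, they are merged
    into a single sstable one tier up. Real STCS buckets by size ratio instead.
    """
    tiers = {}  # tier number -> sstables currently in that tier
    for _ in range(flushes):
        tier = 0
        tiers[tier] = tiers.get(tier, 0) + 1
        # Cascade merges upward while a tier is full.
        while tiers[tier] >= min_threshold:
            tiers[tier] -= min_threshold
            tier += 1
            tiers[tier] = tiers.get(tier, 0) + 1
    return sum(tiers.values())


def major_compaction_sstables_remaining(flushes):
    """A major compaction leaves one sstable as long as anything was written."""
    return 1 if flushes > 0 else 0


for n in (4, 7, 10, 16):
    print(n, stcs_sstables_remaining(n), major_compaction_sstables_remaining(n))
```

In this model a read that covers the window touches every remaining sstable, so the gap between the two columns is a crude proxy for the read amplification difference described above.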

*My recommendation*

DTCS's explicit tiers are not worth the extra complexity.  The TWCS approach of 
doing STCS within the active window, and "major" compaction on inactive, 
provides excellent performance without manual tuning.

However, TWCS's write amplification on repair (point 3 above) is a potential 
problem.  Is there a way to get the best of both worlds?  Should we just brute 
force it and check inactive windows against a system table, where we record if 
we've done our initial major compaction?  If so, then further compactions due 
to repair et al should be done with STCS.
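
The brute-force idea above could look roughly like the following Python sketch. It is only a sketch under stated assumptions: `majored_windows` is a plain in-memory set standing in for the system table, `choose_compaction` and its return values are hypothetical names, and the window is marked as done at selection time rather than after the compaction actually finishes.

```python
def choose_compaction(window, current_window, majored_windows, sstable_count):
    """Pick a compaction type for one time window.

    window / current_window: window identifiers (e.g. window start times).
    majored_windows: hypothetical stand-in for the system table that records
        which inactive windows already had their initial major compaction.
    sstable_count: number of sstables currently in `window`.
    """
    if sstable_count < 2:
        return "none"  # nothing to merge
    if window == current_window:
        return "stcs"  # active window: size-tiered compaction
    if window not in majored_windows:
        # First visit to an inactive window: compact everything to one sstable.
        majored_windows.add(window)
        return "major"
    # Later arrivals (repair, streaming, hints) are handled with size tiering
    # so that tiny sstables do not force a rewrite of the whole window.
    return "stcs"


majored = set()
print(choose_compaction(0, 6, majored, 5))  # inactive window, first pass
print(choose_compaction(0, 6, majored, 3))  # same window after repair adds sstables
print(choose_compaction(6, 6, majored, 5))  # active window
```

A real implementation would need the record to survive restarts and would have to decide what happens if the initial major compaction is interrupted; neither is modelled here.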

If we solve this problem (and finish CASSANDRA-10496) then I think to a large 
degree we won't need to worry nearly as much about users disabling read repair, 
only repairing during active windows, etc.

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: compactomatic.py, dashboard-DTCS_to_TWCS.png, 
> dtcs-twcs-io.png, dtcs-twcs-load.png
>
>

[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-04-13 Thread Lucas de Souza Santos (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lucas de Souza Santos updated CASSANDRA-9666:
-
Attachment: dashboard-DTCS_to_TWCS.png

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jeff Jirsa
> Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dashboard-DTCS_to_TWCS.png, dtcs-twcs-io.png, 
> dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressive compact down 
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criteria, which prefers files with earlier timestamps, but ignores 
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data are written into the memtable
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> strategy’s selection criteria for older windows. 
> Patches provided for:
> - 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
> - 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
> - trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 
> Rebased, force-pushed July 18, with bug fixes for estimated pending 
> compactions and potential starvation if more than min_threshold tables 
> existed in current window but STCS did not consider them viable candidates
> Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2016-01-20 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-9666:
--
Attachment: dtcs-twcs-io.png
dtcs-twcs-load.png

One last data point from a real world cluster. The two screenshots attached 
(dtcs-twcs-load and dtcs-twcs-io) show difference in IO and CPU on a real world 
cluster as it transitioned from DTCS -> TWCS with no other changes/tuning. This 
cluster is running a stable version of DSE/Cassandra, so it does not have 
changes to DTCS that were not backported. 

It seems that there's little desire to integrate this work upstream, given that 
DTCS already exists and compaction is pluggable. Rather than try to keep 
rebasing for no reason, I'm fine with this going to won't-fix, and users who 
prefer twcs (because it's easier to reason about, or because it doesn't 
constantly recompact, or for whatever reason) can just build it on their own 
from my github repo. 

> Provide an alternative to DTCS
> --
>
> Key: CASSANDRA-9666
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.1.x, 2.2.x
>
> Attachments: dtcs-twcs-io.png, dtcs-twcs-load.png
>
>
> DTCS is great for time series data, but it comes with caveats that make it 
> difficult to use in production (typical operator behaviors such as bootstrap, 
> removenode, and repair have MAJOR caveats as they relate to 
> max_sstable_age_days, and hints/read repair break the selection algorithm).
> I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
> the tiered nature of DTCS in order to address some of DTCS' operational 
> shortcomings. I believe it is necessary to propose an alternative rather than 
> simply adjusting DTCS, because it fundamentally removes the tiered nature in 
> order to remove the parameter max_sstable_age_days - the result is very very 
> different, even if it is heavily inspired by DTCS. 
> Specifically, rather than creating a number of windows of ever increasing 
> sizes, this strategy allows an operator to choose the window size, compact 
> with STCS within the first window of that size, and aggressively compact down
> to a single sstable once that window is no longer current. The window size is 
> a combination of unit (minutes, hours, days) and size (1, etc), such that an 
> operator can expect all data using a block of that size to be compacted 
> together (that is, if your unit is hours, and size is 6, you will create 
> roughly 4 sstables per day, each one containing roughly 6 hours of data). 
> The result addresses a number of the problems with 
> DateTieredCompactionStrategy:
> - At the present time, DTCS’s first window is compacted using an unusual 
> selection criterion, which prefers files with earlier timestamps, but ignores
> sizes. In TimeWindowCompactionStrategy, the first window data will be 
> compacted with the well tested, fast, reliable STCS. All STCS options can be 
> passed to TimeWindowCompactionStrategy to configure the first window’s 
> compaction behavior.
> - HintedHandoff may put old data in new sstables, but it will have little 
> impact other than slightly reduced efficiency (sstables will cover a wider 
> range, but the old timestamps will not impact sstable selection criteria 
> during compaction)
> - ReadRepair may put old data in new sstables, but it will have little impact 
> other than slightly reduced efficiency (sstables will cover a wider range, 
> but the old timestamps will not impact sstable selection criteria during 
> compaction)
> - Small, old sstables resulting from streams of any kind will be swiftly and 
> aggressively compacted with the other sstables matching their similar 
> maxTimestamp, without causing sstables in neighboring windows to grow in size.
> - The configuration options are explicit and straightforward - the tuning 
> parameters leave little room for error. The window is set in common, easily 
> understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
> minute/hour/day options are granular enough for users keeping data for hours, 
> and users keeping data for years. 
> - There is no explicitly configurable max sstable age, though sstables will 
> naturally stop compacting once new data is written in that window. 
> - Streaming operations can create sstables with old timestamps, and they'll 
> naturally be joined together with sstables in the same time bucket. This is 
> true for bootstrap/repair/sstableloader/removenode. 
> - It remains true that if old data and new data are written into the memtable
> at the same time, the resulting sstables will be treated as if they were new 
> sstables, however, that no longer negatively impacts the compaction 
> 

[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2015-08-20 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-9666:
--
Description: 
DTCS is great for time series data, but it comes with caveats that make it 
difficult to use in production (typical operator behaviors such as bootstrap, 
removenode, and repair have MAJOR caveats as they relate to 
max_sstable_age_days, and hints/read repair break the selection algorithm).

I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the 
tiered nature of DTCS in order to address some of DTCS' operational 
shortcomings. I believe it is necessary to propose an alternative rather than 
simply adjusting DTCS, because it fundamentally removes the tiered nature in 
order to remove the parameter max_sstable_age_days - the result is very very 
different, even if it is heavily inspired by DTCS. 

Specifically, rather than creating a number of windows of ever increasing 
sizes, this strategy allows an operator to choose the window size, compact with 
STCS within the first window of that size, and aggressively compact down to a
single sstable once that window is no longer current. The window size is a 
combination of unit (minutes, hours, days) and size (1, etc), such that an 
operator can expect all data using a block of that size to be compacted 
together (that is, if your unit is hours, and size is 6, you will create 
roughly 4 sstables per day, each one containing roughly 6 hours of data). 
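
The window arithmetic described above can be illustrated with a minimal sketch. It is not taken from the patch: the function name `window_start`, the `UNIT_SECONDS` table, and the choice to align windows to the epoch are all assumptions made for illustration. It only shows why a unit of hours with a size of 6 yields roughly 4 windows per day.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical unit table; the ticket names minutes, hours and days as units.
UNIT_SECONDS = {"minutes": 60, "hours": 3600, "days": 86400}


def window_start(ts_seconds: int, unit: str, size: int) -> int:
    """Return the lower bound (epoch seconds) of the window containing ts_seconds.

    Assumes windows are aligned to the epoch, which is an illustrative choice.
    """
    width = UNIT_SECONDS[unit] * size
    return ts_seconds - (ts_seconds % width)


# One write per minute over a full UTC day, bucketed into 6-hour windows.
day = datetime(2015, 8, 20, tzinfo=timezone.utc)
starts = {
    window_start(int((day + timedelta(minutes=m)).timestamp()), "hours", 6)
    for m in range(24 * 60)
}
print(len(starts))  # 4 distinct windows, so roughly 4 sstables per day
```

Every write whose timestamp maps to the same window start would be expected to end up in the same sstable once that window is no longer current.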

The result addresses a number of the problems with DateTieredCompactionStrategy:

- At the present time, DTCS’s first window is compacted using an unusual 
selection criterion, which prefers files with earlier timestamps, but ignores
sizes. In TimeWindowCompactionStrategy, the first window data will be compacted 
with the well tested, fast, reliable STCS. All STCS options can be passed to 
TimeWindowCompactionStrategy to configure the first window’s compaction 
behavior.

- HintedHandoff may put old data in new sstables, but it will have little 
impact other than slightly reduced efficiency (sstables will cover a wider 
range, but the old timestamps will not impact sstable selection criteria during 
compaction)

- ReadRepair may put old data in new sstables, but it will have little impact 
other than slightly reduced efficiency (sstables will cover a wider range, but 
the old timestamps will not impact sstable selection criteria during compaction)

- Small, old sstables resulting from streams of any kind will be swiftly and 
aggressively compacted with the other sstables matching their similar 
maxTimestamp, without causing sstables in neighboring windows to grow in size.

- The configuration options are explicit and straightforward - the tuning 
parameters leave little room for error. The window is set in common, easily 
understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
minute/hour/day options are granular enough for users keeping data for hours, 
and users keeping data for years. 

- There is no explicitly configurable max sstable age, though sstables will 
naturally stop compacting once new data is written in that window. 

- Streaming operations can create sstables with old timestamps, and they'll 
naturally be joined together with sstables in the same time bucket. This is 
true for bootstrap/repair/sstableloader/removenode. 

- It remains true that if old data and new data are written into the memtable at
the same time, the resulting sstables will be treated as if they were new 
sstables, however, that no longer negatively impacts the compaction strategy’s 
selection criteria for older windows. 
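
As a rough illustration of the selection behavior listed above, the following sketch groups sstables by the window of their max timestamp, leaves the current window to a size-tiered step, and compacts any older window that still holds more than one sstable. It is a simplification under stated assumptions, not the patch's code: the `SSTable` tuple, `bucket_by_window`, and `next_compaction` are hypothetical names, the current-window branch is only a placeholder for real STCS (it ignores sizes and just checks `min_threshold`), and visiting older windows newest-first is an arbitrary choice.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class SSTable(NamedTuple):
    name: str
    max_timestamp: int  # epoch seconds of the newest cell (assumed metadata)
    size_bytes: int


def bucket_by_window(sstables: List[SSTable], window_seconds: int) -> Dict[int, List[SSTable]]:
    """Group sstables by the start of the window holding their max timestamp."""
    buckets: Dict[int, List[SSTable]] = defaultdict(list)
    for sstable in sstables:
        start = sstable.max_timestamp - sstable.max_timestamp % window_seconds
        buckets[start].append(sstable)
    return buckets


def next_compaction(sstables: List[SSTable], window_seconds: int, now: int,
                    min_threshold: int = 4) -> List[SSTable]:
    """Pick the next group of sstables to compact (simplified)."""
    buckets = bucket_by_window(sstables, window_seconds)
    current = now - now % window_seconds

    # Current window: placeholder for STCS. Real STCS also groups by size;
    # here we only require min_threshold sstables (an assumption).
    newest = buckets.get(current, [])
    if len(newest) >= min_threshold:
        return list(newest)

    # Older windows: compact everything in the window down to one sstable.
    for start in sorted((w for w in buckets if w < current), reverse=True):
        if len(buckets[start]) > 1:
            return list(buckets[start])
    return []


HOUR = 3600
tables = [
    SSTable("a", 100, 10),           # window starting at 0
    SSTable("b", 200, 12),           # window starting at 0
    SSTable("c", HOUR + 100, 50),    # alone in its window, nothing to do
    SSTable("d", 2 * HOUR + 50, 5),  # current window, below min_threshold
]
picked = next_compaction(tables, HOUR, now=2 * HOUR + 100)
print([s.name for s in picked])  # ['a', 'b']
```

Because a window that already holds a single sstable yields no candidates, old windows naturally stop compacting without any max sstable age setting.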

Patches provided for:

- 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
- 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
- trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 

Rebased, force-pushed July 18, with bug fixes for estimated pending compactions 
and potential starvation if more than min_threshold tables existed in current 
window but STCS did not consider them viable candidates
Rebased, force-pushed Aug 20 to bring in relevant logic from CASSANDRA-9882

  was:
DTCS is great for time series data, but it comes with caveats that make it 
difficult to use in production (typical operator behaviors such as bootstrap, 
removenode, and repair have MAJOR caveats as they relate to 
max_sstable_age_days, and hints/read repair break the selection algorithm).

I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the 
tiered nature of DTCS in order to address some of DTCS' operational 
shortcomings. I believe it is necessary to propose an alternative rather than 
simply adjusting DTCS, because it fundamentally removes the tiered nature in 
order to remove the parameter max_sstable_age_days - the result is very very 
different, even if it is heavily inspired by DTCS. 

Specifically, rather than 

[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2015-07-19 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-9666:
--
Description: 
DTCS is great for time series data, but it comes with caveats that make it 
difficult to use in production (typical operator behaviors such as bootstrap, 
removenode, and repair have MAJOR caveats as they relate to 
max_sstable_age_days, and hints/read repair break the selection algorithm).

I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the 
tiered nature of DTCS in order to address some of DTCS' operational 
shortcomings. I believe it is necessary to propose an alternative rather than 
simply adjusting DTCS, because it fundamentally removes the tiered nature in 
order to remove the parameter max_sstable_age_days - the result is very very 
different, even if it is heavily inspired by DTCS. 

Specifically, rather than creating a number of windows of ever increasing 
sizes, this strategy allows an operator to choose the window size, compact with 
STCS within the first window of that size, and aggressively compact down to a
single sstable once that window is no longer current. The window size is a 
combination of unit (minutes, hours, days) and size (1, etc), such that an 
operator can expect all data using a block of that size to be compacted 
together (that is, if your unit is hours, and size is 6, you will create 
roughly 4 sstables per day, each one containing roughly 6 hours of data). 

The result addresses a number of the problems with DateTieredCompactionStrategy:

- At the present time, DTCS’s first window is compacted using an unusual 
selection criterion, which prefers files with earlier timestamps, but ignores
sizes. In TimeWindowCompactionStrategy, the first window data will be compacted 
with the well tested, fast, reliable STCS. All STCS options can be passed to 
TimeWindowCompactionStrategy to configure the first window’s compaction 
behavior.

- HintedHandoff may put old data in new sstables, but it will have little 
impact other than slightly reduced efficiency (sstables will cover a wider 
range, but the old timestamps will not impact sstable selection criteria during 
compaction)

- ReadRepair may put old data in new sstables, but it will have little impact 
other than slightly reduced efficiency (sstables will cover a wider range, but 
the old timestamps will not impact sstable selection criteria during compaction)

- Small, old sstables resulting from streams of any kind will be swiftly and 
aggressively compacted with the other sstables matching their similar 
maxTimestamp, without causing sstables in neighboring windows to grow in size.

- The configuration options are explicit and straightforward - the tuning 
parameters leave little room for error. The window is set in common, easily 
understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
minute/hour/day options are granular enough for users keeping data for hours, 
and users keeping data for years. 

- There is no explicitly configurable max sstable age, though sstables will 
naturally stop compacting once new data is written in that window. 

- Streaming operations can create sstables with old timestamps, and they'll 
naturally be joined together with sstables in the same time bucket. This is 
true for bootstrap/repair/sstableloader/removenode. 

- It remains true that if old data and new data are written into the memtable at
the same time, the resulting sstables will be treated as if they were new 
sstables, however, that no longer negatively impacts the compaction strategy’s 
selection criteria for older windows. 

Patch provided for (Rebased July 18, with bug fixes for estimated pending 
compactions and potential starvation if more than min_threshold tables existed 
in current window but STCS did not consider them viable candidates): 

- 2.1: https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 
- 2.2: https://github.com/jeffjirsa/cassandra/commits/twcs-2.2
- trunk (post-8099):  https://github.com/jeffjirsa/cassandra/commits/twcs 



  was:
DTCS is great for time series data, but it comes with caveats that make it 
difficult to use in production (typical operator behaviors such as bootstrap, 
removenode, and repair have MAJOR caveats as they relate to 
max_sstable_age_days, and hints/read repair break the selection algorithm).

I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the 
tiered nature of DTCS in order to address some of DTCS' operational 
shortcomings. I believe it is necessary to propose an alternative rather than 
simply adjusting DTCS, because it fundamentally removes the tiered nature in 
order to remove the parameter max_sstable_age_days - the result is very very 
different, even if it is heavily inspired by DTCS. 

Specifically, rather than creating a number of windows of ever increasing 
sizes, this strategy allows an operator 

[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2015-06-28 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-9666:
---
Reviewer: Marcus Eriksson

 Provide an alternative to DTCS
 --

 Key: CASSANDRA-9666
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeff Jirsa
Assignee: Jeff Jirsa
 Fix For: 2.1.x, 2.2.x


 DTCS is great for time series data, but it comes with caveats that make it 
 difficult to use in production (typical operator behaviors such as bootstrap, 
 removenode, and repair have MAJOR caveats as they relate to 
 max_sstable_age_days, and hints/read repair break the selection algorithm).
 I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
 the tiered nature of DTCS in order to address some of DTCS' operational 
 shortcomings. I believe it is necessary to propose an alternative rather than 
 simply adjusting DTCS, because it fundamentally removes the tiered nature in 
 order to remove the parameter max_sstable_age_days - the result is very very 
 different, even if it is heavily inspired by DTCS. 
 Specifically, rather than creating a number of windows of ever increasing 
 sizes, this strategy allows an operator to choose the window size, compact 
 with STCS within the first window of that size, and aggressively compact down
 to a single sstable once that window is no longer current. The window size is 
 a combination of unit (minutes, hours, days) and size (1, etc), such that an 
 operator can expect all data using a block of that size to be compacted 
 together (that is, if your unit is hours, and size is 6, you will create 
 roughly 4 sstables per day, each one containing roughly 6 hours of data). 
 The result addresses a number of the problems with 
 DateTieredCompactionStrategy:
 - At the present time, DTCS’s first window is compacted using an unusual 
 selection criterion, which prefers files with earlier timestamps, but ignores
 sizes. In TimeWindowCompactionStrategy, the first window data will be 
 compacted with the well tested, fast, reliable STCS. All STCS options can be 
 passed to TimeWindowCompactionStrategy to configure the first window’s 
 compaction behavior.
 - HintedHandoff may put old data in new sstables, but it will have little 
 impact other than slightly reduced efficiency (sstables will cover a wider 
 range, but the old timestamps will not impact sstable selection criteria 
 during compaction)
 - ReadRepair may put old data in new sstables, but it will have little impact 
 other than slightly reduced efficiency (sstables will cover a wider range, 
 but the old timestamps will not impact sstable selection criteria during 
 compaction)
 - Small, old sstables resulting from streams of any kind will be swiftly and 
 aggressively compacted with the other sstables matching their similar 
 maxTimestamp, without causing sstables in neighboring windows to grow in size.
 - The configuration options are explicit and straightforward - the tuning 
 parameters leave little room for error. The window is set in common, easily 
 understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
 minute/hour/day options are granular enough for users keeping data for hours, 
 and users keeping data for years. 
 - There is no explicitly configurable max sstable age, though sstables will 
 naturally stop compacting once new data is written in that window. 
 - Streaming operations can create sstables with old timestamps, and they'll 
 naturally be joined together with sstables in the same time bucket. This is 
 true for bootstrap/repair/sstableloader/removenode. 
 - It remains true that if old data and new data are written into the memtable
 at the same time, the resulting sstables will be treated as if they were new 
 sstables, however, that no longer negatively impacts the compaction 
 strategy’s selection criteria for older windows. 
 Patch provided for both 2.1 ( 
 https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 ( 
 https://github.com/jeffjirsa/cassandra/commits/twcs )





[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2015-06-28 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-9666:
---
Assignee: Jeff Jirsa

 Provide an alternative to DTCS
 --

 Key: CASSANDRA-9666
 URL: https://issues.apache.org/jira/browse/CASSANDRA-9666
 Project: Cassandra
  Issue Type: Improvement
Reporter: Jeff Jirsa
Assignee: Jeff Jirsa
 Fix For: 2.1.x, 2.2.x


 DTCS is great for time series data, but it comes with caveats that make it 
 difficult to use in production (typical operator behaviors such as bootstrap, 
 removenode, and repair have MAJOR caveats as they relate to 
 max_sstable_age_days, and hints/read repair break the selection algorithm).
 I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices 
 the tiered nature of DTCS in order to address some of DTCS' operational 
 shortcomings. I believe it is necessary to propose an alternative rather than 
 simply adjusting DTCS, because it fundamentally removes the tiered nature in 
 order to remove the parameter max_sstable_age_days - the result is very very 
 different, even if it is heavily inspired by DTCS. 
 Specifically, rather than creating a number of windows of ever increasing 
 sizes, this strategy allows an operator to choose the window size, compact 
 with STCS within the first window of that size, and aggressively compact down
 to a single sstable once that window is no longer current. The window size is 
 a combination of unit (minutes, hours, days) and size (1, etc), such that an 
 operator can expect all data using a block of that size to be compacted 
 together (that is, if your unit is hours, and size is 6, you will create 
 roughly 4 sstables per day, each one containing roughly 6 hours of data). 
 The result addresses a number of the problems with 
 DateTieredCompactionStrategy:
 - At the present time, DTCS’s first window is compacted using an unusual 
 selection criterion, which prefers files with earlier timestamps, but ignores
 sizes. In TimeWindowCompactionStrategy, the first window data will be 
 compacted with the well tested, fast, reliable STCS. All STCS options can be 
 passed to TimeWindowCompactionStrategy to configure the first window’s 
 compaction behavior.
 - HintedHandoff may put old data in new sstables, but it will have little 
 impact other than slightly reduced efficiency (sstables will cover a wider 
 range, but the old timestamps will not impact sstable selection criteria 
 during compaction)
 - ReadRepair may put old data in new sstables, but it will have little impact 
 other than slightly reduced efficiency (sstables will cover a wider range, 
 but the old timestamps will not impact sstable selection criteria during 
 compaction)
 - Small, old sstables resulting from streams of any kind will be swiftly and 
 aggressively compacted with the other sstables matching their similar 
 maxTimestamp, without causing sstables in neighboring windows to grow in size.
 - The configuration options are explicit and straightforward - the tuning 
 parameters leave little room for error. The window is set in common, easily 
 understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
 minute/hour/day options are granular enough for users keeping data for hours, 
 and users keeping data for years. 
 - There is no explicitly configurable max sstable age, though sstables will 
 naturally stop compacting once new data is written in that window. 
 - Streaming operations can create sstables with old timestamps, and they'll 
 naturally be joined together with sstables in the same time bucket. This is 
 true for bootstrap/repair/sstableloader/removenode. 
 - It remains true that if old data and new data are written into the memtable
 at the same time, the resulting sstables will be treated as if they were new 
 sstables, however, that no longer negatively impacts the compaction 
 strategy’s selection criteria for older windows. 
 Patch provided for both 2.1 ( 
 https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 ( 
 https://github.com/jeffjirsa/cassandra/commits/twcs )





[jira] [Updated] (CASSANDRA-9666) Provide an alternative to DTCS

2015-06-26 Thread Jeff Jirsa (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeff Jirsa updated CASSANDRA-9666:
--
Description: 
DTCS is great for time series data, but it comes with caveats that make it 
difficult to use in production (typical operator behaviors such as bootstrap, 
removenode, and repair have MAJOR caveats as they relate to 
max_sstable_age_days, and hints/read repair break the selection algorithm).

I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the 
tiered nature of DTCS in order to address some of DTCS' operational 
shortcomings. I believe it is necessary to propose an alternative rather than 
simply adjusting DTCS, because it fundamentally removes the tiered nature in 
order to remove the parameter max_sstable_age_days - the result is very very 
different, even if it is heavily inspired by DTCS. 

Specifically, rather than creating a number of windows of ever increasing 
sizes, this strategy allows an operator to choose the window size, compact with 
STCS within the first window of that size, and aggressively compact down to a
single sstable once that window is no longer current. The window size is a 
combination of unit (minutes, hours, days) and size (1, etc), such that an 
operator can expect all data using a block of that size to be compacted 
together (that is, if your unit is hours, and size is 6, you will create 
roughly 4 sstables per day, each one containing roughly 6 hours of data). 

The result addresses a number of the problems with DateTieredCompactionStrategy:

- At the present time, DTCS’s first window is compacted using an unusual 
selection criterion, which prefers files with earlier timestamps, but ignores
sizes. In TimeWindowCompactionStrategy, the first window data will be compacted 
with the well tested, fast, reliable STCS. All STCS options can be passed to 
TimeWindowCompactionStrategy to configure the first window’s compaction 
behavior.

- HintedHandoff may put old data in new sstables, but it will have little 
impact other than slightly reduced efficiency (sstables will cover a wider 
range, but the old timestamps will not impact sstable selection criteria during 
compaction)

- ReadRepair may put old data in new sstables, but it will have little impact 
other than slightly reduced efficiency (sstables will cover a wider range, but 
the old timestamps will not impact sstable selection criteria during compaction)

- Small, old sstables resulting from streams of any kind will be swiftly and 
aggressively compacted with the other sstables matching their similar 
maxTimestamp, without causing sstables in neighboring windows to grow in size.

- The configuration options are explicit and straightforward - the tuning 
parameters leave little room for error. The window is set in common, easily 
understandable terms such as “12 hours”, “1 Day”, “30 days”. The 
minute/hour/day options are granular enough for users keeping data for hours, 
and users keeping data for years. 

- There is no explicitly configurable max sstable age, though sstables will 
naturally stop compacting once new data is written in that window. 

- Streaming operations can create sstables with old timestamps, and they'll 
naturally be joined together with sstables in the same time bucket. This is 
true for bootstrap/repair/sstableloader/removenode. 

- It remains true that if old data and new data are written into the memtable at
the same time, the resulting sstables will be treated as if they were new 
sstables, however, that no longer negatively impacts the compaction strategy’s 
selection criteria for older windows. 
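
The point about hints, read repair, and mixed memtable flushes can be made concrete with a small sketch. It assumes, based on the mention of maxTimestamp above, that an sstable is assigned to a window by its newest cell; the helper `window_for_sstable` and the sample timestamps are hypothetical.

```python
def window_for_sstable(cell_timestamps, window_seconds):
    """Assign an sstable to a window using its newest cell timestamp (assumed rule)."""
    newest = max(cell_timestamps)
    return newest - newest % window_seconds


HOUR = 3600
now = 100 * HOUR + 120

fresh = [now - 30, now - 5, now]   # ordinary recent writes
hinted = [now - 50 * HOUR]         # an old write delivered late, e.g. by a hint
flushed = fresh + hinted           # both flushed from the same memtable

# The late cell widens the range the sstable covers...
span_hours = (max(flushed) - min(flushed)) // HOUR
# ...but does not change which window the sstable is grouped into.
same_window = window_for_sstable(flushed, HOUR) == window_for_sstable(fresh, HOUR)
print(span_hours, same_window)  # 50 True
```

Under this rule the cost of late data is a wider sstable, which is the "slightly reduced efficiency" mentioned above, while selection in older windows is unaffected.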

Patch provided for both 2.1 ( 
https://github.com/jeffjirsa/cassandra/commits/twcs-2.1 ) and 2.2 ( 
https://github.com/jeffjirsa/cassandra/commits/twcs )



  was:
DTCS is great for time series data, but it comes with caveats that make it 
difficult to use in production (typical operator behaviors such as bootstrap, 
removenode, and repair have MAJOR caveats as they relate to 
max_sstable_age_days, and hints/read repair break the selection algorithm).

I'm proposing an alternative, TimeWindowCompactionStrategy, that sacrifices the 
tiered nature of DTCS in order to address some of DTCS' operational 
shortcomings. I believe it is necessary to propose an alternative rather than 
simply adjusting DTCS, because it fundamentally removes the tiered nature in 
order to remove the parameter max_sstable_age_days - the result is very very 
different, even if it is heavily inspired by DTCS. 

Specifically, rather than creating a number of windows of ever increasing 
sizes, this strategy allows an operator to choose the window size, compact with 
STCS within the first window of that size, and aggressively compact down to a
single sstable once that window is no longer current. The window size is a 
combination of unit (minutes, hours, days) and size (1, etc), such that an