[ 
https://issues.apache.org/jira/browse/CASSANDRA-8371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14237065#comment-14237065
 ] 

Jonathan Shook commented on CASSANDRA-8371:
-------------------------------------------

I tend to agree with Tupshin on the first point, which is to say that an 
occasional side effect of a needed repair should be small compared to the 
overarching benefit of having a (tunably) lower steady-state compaction load. 
My rationale, in detail, is below. If I have made a mistake somewhere in this 
explanation, please correct me. 

It is true that the boundary increases geometrically, but it is not necessarily 
true that this means compaction load will be lower as the windows get larger. 
The most recent intervals are a special case simply because that is where 
memtable flushing and DTCS meet up, with the expected variance in sstable 
sizes. I'll assume the implications of this are obvious and focus for now only 
on later compactions.

If we had ideal scheduling of later compactions, each sstable would be 
coalesced exactly once per interval size. That isn't what we should expect to 
see as a rule, but we have to start somewhere for a reasonable estimate of the 
bounds. It means that the average compaction load would tend toward a constant 
over time for each window, and that the average compaction load for all active 
interval sizes would stack linearly with the number of windows accounted for. 
This means that the cumulative compaction load is super-linear over time in the 
case of no max age. Even though the stacking effect does slow down over time, 
it's merely a slowing of the increase in load, not of the base load itself.

In contrast, given a max age and an average ingestion rate, the average 
steady-state compaction load increases as each larger interval becomes active, 
but levels out at a maximum. If the max age is low enough, the effect can be 
significant. Because the load-stacking effect occurs quickly in recent time but 
more slowly as time progresses, adjusting the max age closer to now() has the 
most visible effect. In other words, a max adjustment that deactivates 
compaction at the 4th smallest interval size will have a less obvious effect 
than one that deactivates it at the 3rd or 2nd.
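To make the bookkeeping above concrete, here is a toy model of the stacking 
effect. This is not DTCS code: `base`, `min_window`, and `max_age` are 
illustrative stand-ins for the real knobs (base_time_seconds, min_threshold, 
max_sstable_age_days), and "work" simply counts active tiers, on the idealized 
assumption that each tier rewrites the incoming data once per its window.

```python
def active_tiers(t, base=4, min_window=1, max_age=None):
    """Number of DTCS window sizes (tiers) active by time t in this toy model.

    Tier k spans min_window * base**k time units; a tier counts as active
    once enough time has passed to fill it. max_age caps which tiers still
    receive compaction work. All names here are illustrative, not real
    DTCS options.
    """
    horizon = t if max_age is None else min(t, max_age)
    k = 0
    while min_window * base ** (k + 1) <= horizon:
        k += 1
    return k + 1  # tier 0 (the freshest window) is always active

def cumulative_work(t_end, base=4, min_window=1, max_age=None):
    """Approximate total compaction work up to t_end: at each time step the
    work done is proportional to the number of active tiers, since under
    ideal scheduling each tier recompacts the data stream once per window."""
    return sum(active_tiers(t, base, min_window, max_age)
               for t in range(1, t_end + 1))
```

Without a max age, the per-step load grows with the number of tiers 
(roughly log of elapsed time), so cumulative work is super-linear; with a max 
age, the tier count, and hence the steady-state load, levels out at a constant.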

Reducing the steady-state compaction load has significant advantages across the 
board in a well-balanced system. Testing can easily show the correlation 
between higher average compaction load and lower op rates and worsening latency 
spikes.

Requiring that the max be higher than the time it takes for a scheduled repair 
cycle would rule out these types of adjustments. As well, the boundary between 
those two settings is pretty fuzzy, considering that most automated repair 
schedules take a week or more.

There are also remedies if you see that repairs are significantly affecting 
your larger intervals. If you want to have them be perfectly compacted 
(probably not that important, in all honesty), simply adjust the max age, let 
DTCS recompact the higher intervals, and then adjust it back, or not. If I were 
having a significant amount of data repaired on a routine basis, I'd probably 
be scaling or tuning the system at that point anyway. Repairs that have to 
stream enough data to really become a problem for larger intervals should be 
considered a bad thing-- a sign that there are other pressures in the system 
that need to be addressed. However, a limited amount of repaired data, as in a 
healthy cluster, should be handled well enough by IntervalTree, BloomFilter and 
friends.

I'm not advocating specifically for a low default max, but I did want to 
explain the rationale for not ruling it out as a valid choice in certain cases.








> DateTieredCompactionStrategy is always compacting 
> --------------------------------------------------
>
>                 Key: CASSANDRA-8371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: mck
>            Assignee: Björn Hegerfors
>              Labels: compaction, performance
>         Attachments: java_gc_counts_rate-month.png, 
> read-latency-recommenders-adview.png, read-latency.png, 
> sstables-recommenders-adviews.png, sstables.png, vg2_iad-month.png
>
>
> Running 2.0.11 and having switched a table to 
> [DTCS|https://issues.apache.org/jira/browse/CASSANDRA-6602] we've seen that 
> disk IO and gc count increase, along with the number of reads happening in 
> the "compaction" hump of cfhistograms.
> Data, and generally performance, looks good, but compactions are always 
> happening, and pending compactions are building up.
> The schema for this is 
> {code}CREATE TABLE search (
>   loginid text,
>   searchid timeuuid,
>   description text,
>   searchkey text,
>   searchurl text,
>   PRIMARY KEY ((loginid), searchid)
> );{code}
> We're sitting on about 82G (per replica) across 6 nodes in 4 DCs.
> CQL executed against this keyspace, and traffic patterns, can be seen in 
> slides 7+8 of https://prezi.com/b9-aj6p2esft/
> Attached are sstables-per-read and read-latency graphs from cfhistograms, and 
> screenshots of our munin graphs as we have gone from STCS, to LCS (week ~44), 
> to DTCS (week ~46).
> These screenshots are also found in the prezi on slides 9-11.
> [~pmcfadin], [~Bj0rn], 
> Can this be a consequence of occasional deleted rows, as is described under 
> (3) in the description of CASSANDRA-6602 ?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)