[ https://issues.apache.org/jira/browse/CASSANDRA-9597?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14588051#comment-14588051 ]
Marcus Eriksson edited comment on CASSANDRA-9597 at 6/16/15 1:55 PM:
---------------------------------------------------------------------

Yeah, repair is a problem with vnodes (CASSANDRA-5220), and yes, we should probably do something smarter when we have more than max_threshold sstables to compact.

[~Bj0rn] do you have time to take a stab at this?

Note that it is still going to be quite painful; if you are on 2.1, incremental repair should reduce the pain a lot (i.e., you never stream any old data in).

edit: oops, repair was not mentioned, but for anyone hitting this, incremental repair should help.

> DTCS should consider file SIZE in addition to time windowing
> ------------------------------------------------------------
>
> Key: CASSANDRA-9597
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9597
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jeff Jirsa
> Priority: Minor
> Labels: dtcs
>
> DTCS seems to work well for the typical use case - writing data in perfect time order, compacting recent files, and ignoring older files.
> However, there are "normal" operational actions where DTCS will fall behind and is unlikely to recover.
> An example of this is streaming operations (for example, bootstrap or loading data into a cluster using sstableloader), where lots (tens of thousands) of very small sstables can be created spanning multiple time buckets.
> In these cases, even if max_sstable_age_days is extended to allow the older incoming files to be compacted, the selection logic is likely to re-compact large files with fewer small files over and over, rather than prioritizing selection of the max_threshold smallest files to decrease the number of candidate sstables as quickly as possible.
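
For anyone following along, the size-aware selection suggested in the description could look roughly like the sketch below. This is an illustration only, not DTCS code: the `SSTable` class, the `pick_smallest` function, and the default thresholds of 4 and 32 are assumptions made for the example. It assumes the candidates have already been grouped into a single time window.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SSTable:
    """Hypothetical stand-in for an sstable: only a name and an on-disk size."""
    name: str
    size_bytes: int


def pick_smallest(candidates: List[SSTable],
                  min_threshold: int = 4,
                  max_threshold: int = 32) -> List[SSTable]:
    """Pick the sstables to compact from one time window.

    Returns an empty list when there are fewer than min_threshold
    candidates. Otherwise returns at most max_threshold sstables,
    smallest first, so each compaction removes as many candidate
    files as possible instead of rewriting a large file repeatedly.
    """
    if len(candidates) < min_threshold:
        return []
    by_size = sorted(candidates, key=lambda s: s.size_bytes)
    return by_size[:max_threshold]
```

With one large sstable and a hundred small streamed ones in the same window, this picks 32 of the small files and leaves the large file out of the compaction, which is the behaviour the description asks for.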