[ https://issues.apache.org/jira/browse/CASSANDRA-10342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989067#comment-14989067 ]
Marcus Eriksson commented on CASSANDRA-10342:
---------------------------------------------

ping [~mambocab]

> Read defragmentation can cause unnecessary repairs
> --------------------------------------------------
>
>                 Key: CASSANDRA-10342
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10342
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Marcus Olsson
>            Assignee: Marcus Eriksson
>            Priority: Minor
>
> After applying the fix from CASSANDRA-10299 to the cluster, we started seeing ~20k small sstables appear for the table with static data when running incremental repair.
>
> In the logs there were several messages about flushes for that table, one for each repaired range. The flushed sstables were 0.000kb in size with < 100 ops each. Checking cfstats showed several writes to that table, even though we were only reading from it and read repair did not repair anything.
>
> After digging around in the codebase I noticed that defragmentation of data can occur while reading, depending on the query and some other conditions. This causes the read data to be inserted again so that it ends up in a more recent sstable, which can be a problem if that data was repaired using incremental repair. The defragmentation is done in [CollationController.java|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/CollationController.java#L151] (a sketch of that code path follows below).
>
> I guess this wasn't a problem with full repairs, since I assume the digest is the same even if you have two copies of the same data. But with incremental repair this will most probably cause a mismatch between nodes if that data was already repaired, since the other nodes probably won't have it in their unrepaired set.
>
> ------
>
> I can add that the problems on our cluster were probably due to CASSANDRA-10299 causing the same data to be streamed multiple times and end up in several sstables. One of the conditions for defragmentation is that the number of sstables read during a read request has to exceed the minimum number of sstables needed for a compaction (> 4 in our case). So normally I don't think this would cause ~20k sstables to appear; we probably hit an extreme case.
>
> One workaround is to use a compaction strategy other than STCS (it seems to be the only affected strategy, at least in 2.1), but the solution might be to either make defragmentation configurable per table or avoid reinserting the data if any of the sstables involved in the read are repaired (also sketched below).
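For reference, the defragmentation ("hoist up") path in question looks roughly like the following. This is a paraphrased sketch of the check inside CollationController.collectTimeOrderedData on the 2.1 branch, not a verbatim copy; cfs, filter, returnCF and sstablesIterated are names from the surrounding method:

{code:java}
// Sketch of the 2.1 defragmentation path, paraphrased from
// CollationController.collectTimeOrderedData. If the read touched more
// sstables than the table's minimum compaction threshold and the
// compaction strategy opts in, the collated result is re-applied as a
// regular mutation so it lands in a newer sstable.
if (sstablesIterated > cfs.getMinimumCompactionThreshold()
    && !cfs.isAutoCompactionDisabled()
    && cfs.getCompactionStrategy().shouldDefragment())
{
    Tracing.trace("Defragmenting requested data");
    final Mutation mutation = new Mutation(cfs.keyspace.getName(), filter.key.getKey(), returnCF.cloneMe());
    StageManager.getStage(Stage.MUTATION).execute(new Runnable()
    {
        public void run()
        {
            // Skipping the commitlog and index updates is fine since this
            // is existing data. Note that nothing here checks whether the
            // source sstables were already marked repaired.
            Keyspace.open(mutation.getKeyspaceName()).apply(mutation, false, false);
        }
    });
}
{code}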
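And a minimal sketch of the second option from the description: skip the hoist when any sstable involved in the read is already marked repaired. SSTableReader.isRepaired() exists on the 2.1 branch; anyRepaired and sstablesReadFrom are hypothetical names used for illustration only:

{code:java}
// Hypothetical helper, not in the codebase: report whether any sstable
// touched by the read carries a repairedAt timestamp in its metadata.
private static boolean anyRepaired(Iterable<SSTableReader> sstables)
{
    for (SSTableReader sstable : sstables)
        if (sstable.isRepaired())
            return true;
    return false;
}

// The defragmentation condition above would then gain one clause, e.g.:
//
//     if (sstablesIterated > cfs.getMinimumCompactionThreshold()
//         && !cfs.isAutoCompactionDisabled()
//         && cfs.getCompactionStrategy().shouldDefragment()
//         && !anyRepaired(sstablesReadFrom))
{code}

The per-table alternative would presumably just be an additional table option consulted in that same condition.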