[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207827#comment-13207827 ]
Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------

I suggest testing with a single range scan at debug level. Too much hay to see the needle when you're doing 100s or 1000s of scans.

> Unnecessary ReadRepair request during RangeScan
> ------------------------------------------------
>
>                 Key: CASSANDRA-3843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Philip Andronov
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.8
>
>         Attachments: 3843-v2.txt, 3843.txt
>
>
> During reading with Quorum level and replication factor greater than 2,
> Cassandra sends at least one ReadRepair, even if there is no need to do that.
> Because read requests wait until the ReadRepair finishes, this slows
> requests down a lot, up to the timeout :(
> It seems that the problem was introduced by CASSANDRA-2494. Unfortunately
> I do not have enough knowledge of Cassandra internals to fix the problem
> without breaking CASSANDRA-2494 functionality, so my report comes without a
> patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ....
>     private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
>     {
>         // ....
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>
>                         // [PA] Here we are adding null ColumnFamily.
>                         // later it will be compared with the "desired"
>                         // version and will give us "fake" difference which
>                         // forces Cassandra to send ReadRepair to a given source
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ....
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ....
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ....
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we have to compare null and resolved which are obviously
>             // not equals, so it will fire a ReadRequest, however it is not needed here
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ....
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with the version 1
> NodeB has X.2
> NodeC has X.? // Unknown version, but because write was with Quorum it is 1 or 2
> During the Quorum read from nodes A and B, Cassandra creates version 12 and
> sends ReadRepair, so now the nodes have the following content:
> NodeA has X.12
> NodeB has X.12
> which is correct; however, Cassandra will also fire a ReadRepair to NodeC.
> There is no need to do that: the next consistent read has a chance to be
> served by nodes {A, B} (no ReadRepair) or by a pair {?, C}, and in that case
> ReadRepair will be fired and bring NodeC to a consistent state.
> Right now we are reading from the index a lot, and starting from some point in
> time we get TimeOutException because the cluster is overloaded by
> ReadRepairRequests *even* if all nodes have the same data :(

--
This message is automatically generated by JIRA.
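The mechanism described in the quoted report can be illustrated with a minimal, self-contained sketch. It is not Cassandra code: plain strings stand in for `ColumnFamily`, and the names `NullVersionRepairSketch`, `diff`, and `countRepairs` are hypothetical. The only assumption carried over from the report is that a diff against a `null` version is never empty, so the padded entry always counts as needing a repair.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class NullVersionRepairSketch {
    // Hypothetical stand-in for ColumnFamily.diff: returns null when the
    // replica's version already matches the resolved one, otherwise returns
    // the data that would have to be sent as a repair.
    static String diff(String version, String resolved) {
        return Objects.equals(version, resolved) ? null : resolved;
    }

    // Counts how many repairs a scheduleRepairs-style loop would issue.
    static int countRepairs(List<String> versions, String resolved) {
        int repairs = 0;
        for (String version : versions) {
            if (diff(version, resolved) == null)
                continue; // nothing to repair for this replica
            repairs++;
        }
        return repairs;
    }

    public static void main(String[] args) {
        String resolved = "X.12";

        // Two replicas answered with identical data: no repair is scheduled.
        List<String> answered = Arrays.asList("X.12", "X.12");
        System.out.println("without padding: " + countRepairs(answered, resolved));

        // The same answers plus a null padded in for a third source:
        // the null never equals the resolved version, so one repair is scheduled.
        List<String> padded = Arrays.asList("X.12", "X.12", null);
        System.out.println("with null padding: " + countRepairs(padded, resolved));
    }
}
```

Running `main` prints the repair count for each case, which makes it easy to see that the padded `null` entry alone is what triggers the extra repair in this simplified model.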