[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204600#comment-13204600 ]

Philip Andronov commented on CASSANDRA-3843:
--------------------------------------------

> The null version was added for CASSANDRA-2680.
Oh, good point. Sorry, I should have paid more attention to the git history, 
not only to the annotations :)

Anyway, thanks for the patch; now we can apply the correct patch on our servers.
                
> Unnecessary  ReadRepair request during RangeScan
> ------------------------------------------------
>
>                 Key: CASSANDRA-3843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Philip Andronov
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.8
>
>         Attachments: 3843.txt
>
>
> During reads at Quorum consistency with a replication factor greater than 2, 
> Cassandra sends at least one ReadRepair, even when none is needed. 
> Because read requests wait until the ReadRepair finishes, this slows 
> requests down a lot, up to the Timeout :(
> The problem seems to have been introduced by CASSANDRA-2494. 
> Unfortunately, I do not know enough about Cassandra internals to fix the 
> problem without breaking CASSANDRA-2494 functionality, so this report comes 
> without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ....
>     private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
>     {
>     // ....
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>                           
>                         // [PA] Here we add a null ColumnFamily. Later it
>                         // is compared with the "desired" version and
>                         // yields a "fake" difference, which forces
>                         // Cassandra to send a ReadRepair to the given source.
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ....
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ....
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ....
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey<?> key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we compare null with resolved; they are
>             // obviously not equal, so a ReadRepair is fired even though
>             // it is not needed here.
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>         // .... 
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2 
> NodeC has X.? // unknown version, but because the write was at Quorum it is 1 or 2
> During a Quorum read from nodes A and B, Cassandra creates version 12 and 
> sends a ReadRepair, so the nodes now have the following content:
> NodeA has X.12
> NodeB has X.12
> This is correct. However, Cassandra also fires a ReadRepair to NodeC. There 
> is no need to do that: the next consistent read may be served by 
> nodes {A, B} (no ReadRepair) or by a pair {?, C}, in which case a ReadRepair 
> is fired and brings NodeC to a consistent state.
> We currently read from the index a lot, and from some point in 
> time we get TimeOutException because the cluster is overloaded by 
> ReadRepair requests *even* if all nodes have the same data :(
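
The effect described in the quoted report can be reproduced in a small, self-contained model. This is only a sketch under stated assumptions: ReadRepairSketch, diff and countRepairs are hypothetical names, a row is reduced to a map from column name to timestamp, and diff only approximates what ColumnFamily.diff does (it treats a null version as missing every column). It is not the actual Cassandra code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReadRepairSketch {
    /**
     * Simplified stand-in for ColumnFamily.diff (an assumption, not the real
     * implementation): returns the columns of `resolved` that `version` lacks
     * or holds with an older timestamp, or null when nothing is missing.
     * A null version is treated as missing every column.
     */
    public static Map<String, Long> diff(Map<String, Long> version, Map<String, Long> resolved) {
        Map<String, Long> missing = new HashMap<>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            Long have = (version == null) ? null : version.get(e.getKey());
            if (have == null || have < e.getValue())
                missing.put(e.getKey(), e.getValue());
        }
        return missing.isEmpty() ? null : missing;
    }

    /** Counts how many versions would trigger a repair against `resolved`. */
    public static int countRepairs(Map<String, Long> resolved, List<Map<String, Long>> versions) {
        int repairs = 0;
        for (Map<String, Long> version : versions)
            if (diff(version, resolved) != null)
                repairs++;
        return repairs;
    }

    public static void main(String[] args) {
        Map<String, Long> row = new HashMap<>();
        row.put("col", 2L);

        // Two replicas answered with identical data: no repair is needed.
        List<Map<String, Long>> versions = new ArrayList<>(Arrays.asList(row, row));
        System.out.println(countRepairs(row, versions)); // prints 0

        // A null placeholder is added for a third source, as in getReduced().
        versions.add(null);
        System.out.println(countRepairs(row, versions)); // prints 1
    }
}
```

In this model the two real versions agree with the resolved row, so the only repair counted comes from the null placeholder, which matches the "fake" difference the report describes.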

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira