[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-21 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212725#comment-13212725
 ] 

Jeremy Hanna commented on CASSANDRA-3843:
-

Good to know - we'll upgrade to 1.0.8 as soon as we can then.

> Unnecessary ReadRepair request during RangeScan
> -----------------------------------------------
>
>                 Key: CASSANDRA-3843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Philip Andronov
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.8
>
>         Attachments: 3843-v2.txt, 3843.txt
>
>
> During a read at Quorum level with a replication factor greater than 2, 
> Cassandra sends at least one ReadRepair even when none is needed. Because the 
> read request waits for the ReadRepair to finish, requests slow down 
> considerably, sometimes to the point of timing out.
> The problem appears to have been introduced by CASSANDRA-2494. Unfortunately 
> I do not know enough about Cassandra internals to fix it without breaking 
> the CASSANDRA-2494 functionality, so this report comes without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ...
>     private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
>     {
>         // ...
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>                         // [PA] Here we are adding a null ColumnFamily.
>                         // Later it will be compared with the "desired"
>                         // version and will give us a "fake" difference, which
>                         // forces Cassandra to send a ReadRepair to the given source.
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ...
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ...
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ...
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey key,
>                                                      List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we compare null with resolved, which are
>             // obviously not equal, so a ReadRepair is fired even though it
>             // is not needed here.
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ...
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2 
> NodeC has X.? // unknown version, but because the write was at Quorum it is 
> either 1 or 2
> During a Quorum read from nodes A and B, Cassandra creates version 12 and 
> sends a ReadRepair, so the nodes now have the following content:
> NodeA has X.12
> NodeB has X.12
> This is correct, but Cassandra also fires a ReadRepair at NodeC. There is no 
> need for that: the next consistent read will be served either by nodes 
> {A, B} (no ReadRepair) or by a pair {?, C}, in which case a ReadRepair will 
> be fired and will bring NodeC to a consistent state.
> Right now we are reading from the Index a lot, and from some point on we get 
> TimeOutException because the cluster is overloaded by ReadRepairRequests 
> *even* if all nodes have the same data.
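
The mechanism described in the report can be illustrated with a small, self-contained sketch. This is not Cassandra's code: each replica's version is modelled as a String, null stands for a source that was padded in because it returned no version, and the names ReadRepairSketch, diff and endpointsToRepair are hypothetical. The diff helper only mimics the behaviour the report attributes to ColumnFamily.diff.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-in for the resolver logic quoted above (an assumption,
// not the real Cassandra classes). A "version" is just a String here.
class ReadRepairSketch
{
    // Returns null when the replica already matches the resolved row and a
    // non-null diff otherwise. A null replica version always yields a diff.
    static String diff(String replicaVersion, String resolved)
    {
        if (resolved.equals(replicaVersion))
            return null;
        return resolved;
    }

    // Returns the endpoints that would be sent a repair.
    static List<String> endpointsToRepair(String resolved, List<String> versions, List<String> endpoints)
    {
        List<String> targets = new ArrayList<String>();
        for (int i = 0; i < versions.size(); i++)
        {
            if (diff(versions.get(i), resolved) != null)
                targets.add(endpoints.get(i));
        }
        return targets;
    }

    public static void main(String[] args)
    {
        // A and B answered with identical data; C has only the null
        // placeholder that the padding loop recorded for it.
        List<String> versions = Arrays.asList("X.12", "X.12", null);
        List<String> endpoints = Arrays.asList("NodeA", "NodeB", "NodeC");
        System.out.println(endpointsToRepair("X.12", versions, endpoints));
    }
}
```

Run as written, this prints [NodeC]: the two replicas that answered agree, yet the null placeholder alone is enough to schedule a repair.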

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-21 Thread Brandon Williams (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212679#comment-13212679
 ] 

Brandon Williams commented on CASSANDRA-3843:
-

I'm unable to repro against 1.0 HEAD.





[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-21 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212617#comment-13212617
 ] 

Jeremy Hanna commented on CASSANDRA-3843:
-

I ran repair on all the nodes and then compacted all the nodes. Then I ran a 
Pig job that simply counts the rows in the column family. Again, I think the 
overall writes were reduced, but there are still writes going on. I need to 
turn debug logging on and run the same test again. The compactions ran at 6:42 
and the range scans at 14:16:

-rw-r--r-- 1 root root 40106228511 Feb 21 06:42 account_snapshot-g-792-Data.db
-rw-r--r-- 1 root root   206884816 Feb 21 06:42 account_snapshot-g-792-Filter.db
-rw-r--r-- 1 root root  2913796038 Feb 21 06:42 account_snapshot-g-792-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 06:42 account_snapshot-g-792-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-793-Compacted
-rw-r--r-- 1 root root      287286 Feb 21 14:16 account_snapshot-g-793-Data.db
-rw-r--r-- 1 root root         976 Feb 21 14:16 account_snapshot-g-793-Filter.db
-rw-r--r-- 1 root root       20857 Feb 21 14:16 account_snapshot-g-793-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:16 account_snapshot-g-793-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-794-Compacted
-rw-r--r-- 1 root root    87770771 Feb 21 14:17 account_snapshot-g-794-Data.db
-rw-r--r-- 1 root root      293944 Feb 21 14:17 account_snapshot-g-794-Filter.db
-rw-r--r-- 1 root root     6377968 Feb 21 14:17 account_snapshot-g-794-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-794-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-795-Compacted
-rw-r--r-- 1 root root    78459166 Feb 21 14:17 account_snapshot-g-795-Data.db
-rw-r--r-- 1 root root      262600 Feb 21 14:17 account_snapshot-g-795-Filter.db
-rw-r--r-- 1 root root     5698156 Feb 21 14:17 account_snapshot-g-795-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-795-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-796-Compacted
-rw-r--r-- 1 root root    69838937 Feb 21 14:17 account_snapshot-g-796-Data.db
-rw-r--r-- 1 root root      234000 Feb 21 14:17 account_snapshot-g-796-Filter.db
-rw-r--r-- 1 root root     5077447 Feb 21 14:17 account_snapshot-g-796-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-796-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-797-Compacted
-rw-r--r-- 1 root root    68094433 Feb 21 14:17 account_snapshot-g-797-Data.db
-rw-r--r-- 1 root root      227808 Feb 21 14:17 account_snapshot-g-797-Filter.db
-rw-r--r-- 1 root root     4943098 Feb 21 14:17 account_snapshot-g-797-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-797-Statistics.db
-rw-r--r-- 1 root root   304163307 Feb 21 14:20 account_snapshot-g-798-Data.db
-rw-r--r-- 1 root root     1019776 Feb 21 14:20 account_snapshot-g-798-Filter.db
-rw-r--r-- 1 root root    22096669 Feb 21 14:20 account_snapshot-g-798-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:20 account_snapshot-g-798-Statistics.db
-rw-r--r-- 1 root root    65874829 Feb 21 14:18 account_snapshot-g-799-Data.db
-rw-r--r-- 1 root root      220192 Feb 21 14:18 account_snapshot-g-799-Filter.db
-rw-r--r-- 1 root root     4777809 Feb 21 14:18 account_snapshot-g-799-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:18 account_snapshot-g-799-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-800-Compacted
-rw-r--r-- 1 root root    50067413 Feb 21 14:18 account_snapshot-g-800-Data.db
-rw-r--r-- 1 root root      167416 Feb 21 14:18 account_snapshot-g-800-Filter.db
-rw-r--r-- 1 root root     3632313 Feb 21 14:18 account_snapshot-g-800-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:18 account_snapshot-g-800-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-801-Compacted
-rw-r--r-- 1 root root    50575719 Feb 21 14:18 account_snapshot-g-801-Data.db
-rw-r--r-- 1 root root      169160 Feb 21 14:18 account_snapshot-g-801-Filter.db
-rw-r--r-- 1 root root     3669880 Feb 21 14:18 account_snapshot-g-801-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:18 account_snapshot-g-801-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-802-Compacted
-rw-r--r-- 1 root root    41788766 Feb 21 14:19 account_snapshot-g-802-Data.db
-rw-r--r-- 1 root root      139776 Feb 21 14:19 account_snapshot-g-802-Filter.db
-rw-r--r-- 1 root root     3033069 Feb 21 14:19 account_snapshot-g-802-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:19 account_snapshot-g-802-Statistics.db
-rw-r--r-- 1 root root    46547146 Feb 21 14:19 account_snapshot-g-803-Data.db
-rw-r--r-- 1 root root      155720 Feb 21 14:19 account_snapshot-g-803-Filter.db
-r
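
One way to put a number on "there are writes going on" is to total the sizes of the Data.db files written after the range scans started. The following is a minimal sketch, assuming `ls -l` output with the nine whitespace-separated columns shown in the listing and timestamps that all fall on the same day. The names SSTableListingSketch and dataBytesWrittenSince are hypothetical, and the result is only an upper bound, because SSTables that were later merged by a compaction are counted together with the merged output.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for reading an "ls -l" listing of SSTable files.
class SSTableListingSketch
{
    // Sums the size column of lines whose file name ends in "-Data.db" and
    // whose time column (HH:mm, zero-padded) is at or after sinceHhMm.
    static long dataBytesWrittenSince(List<String> lsLines, String sinceHhMm)
    {
        long total = 0;
        for (String line : lsLines)
        {
            String[] f = line.trim().split("\\s+");
            // Expected columns: mode, links, owner, group, size, month, day, time, name.
            // Lines that do not have exactly nine columns are skipped.
            if (f.length != 9 || !f[8].endsWith("-Data.db"))
                continue;
            // Zero-padded HH:mm strings compare correctly as plain text.
            if (f[7].compareTo(sinceHhMm) >= 0)
                total += Long.parseLong(f[4]);
        }
        return total;
    }

    public static void main(String[] args)
    {
        List<String> lines = Arrays.asList(
            "-rw-r--r-- 1 root root 40106228511 Feb 21 06:42 account_snapshot-g-792-Data.db",
            "-rw-r--r-- 1 root root      287286 Feb 21 14:16 account_snapshot-g-793-Data.db");
        System.out.println(dataBytesWrittenSince(lines, "14:16"));
    }
}
```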

[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-14 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207832#comment-13207832
 ] 

Jeremy Hanna commented on CASSANDRA-3843:
-

I did patch with v2.  Doing more testing today and it appears that there are 
writes occurring but it looks like a definite reduction.  It could be a valid 
repair thing.  I'll do some more testing and hopefully repair every node and 
compact every node and then do a scan across a large column family and see what 
happens.





[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-14 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207828#comment-13207828
 ] 

Jonathan Ellis commented on CASSANDRA-3843:
---

... You did patch with v2, right?





[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-14 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207827#comment-13207827
 ] 

Jonathan Ellis commented on CASSANDRA-3843:
---

I suggest testing with a single range scan at debug level.  Too much hay to see 
the needle when you're doing 100s or 1000s of scans.





[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-14 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207577#comment-13207577
 ] 

Jeremy Hanna commented on CASSANDRA-3843:
-

I patched the version of 0.8.4 that we use with the change and applied it to 
all of our staging nodes.  However, the problem persists: a column family that 
is only being range scanned still receives writes.  I major compacted the 
column family on all of the nodes, then ran a simple Pig job to read the 
contents of that CF, and afterwards I got a lot of minor compactions for that 
column family.


[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-13 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207050#comment-13207050
 ] 

Jonathan Ellis commented on CASSANDRA-3843:
---

Looks to me like the 1.0 code changes from v2 apply cleanly to 0.8.  (CHANGES 
diff does not apply but can be ignored.)

> Unnecessary  ReadRepair request during RangeScan
> 
>
> Key: CASSANDRA-3843
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Philip Andronov
>Assignee: Jonathan Ellis
> Fix For: 1.0.8
>
> Attachments: 3843-v2.txt, 3843.txt
>
>
> During reading with Quorum level and replication factor greater then 2, 
> Cassandra sends at least one ReadRepair, even if there is no need to do that. 
> With the fact that read requests await until ReadRepair will finish it slows 
> down requsts a lot, up to the Timeout :(
> It seems that the problem has been introduced by the CASSANDRA-2494, 
> unfortunately I have no enought knowledge of Cassandra internals to fix the 
> problem and do not broke CASSANDRA-2494 functionality, so my report without a 
> patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
> // 
> private class Reducer extends 
> MergeIterator.Reducer, Row>
> {
> // 
> protected Row getReduced()
> {
> ColumnFamily resolved = versions.size() > 1
>   ? 
> RowRepairResolver.resolveSuperset(versions)
>   : versions.get(0);
> if (versions.size() < sources.size())
> {
> for (InetAddress source : sources)
> {
> if (!versionSources.contains(source))
> {
>   
> // [PA] Here we are adding null ColumnFamily.
> // later it will be compared with the "desired"
> // version and will give us "fake" difference which
> // forces Cassandra to send ReadRepair to a given 
> source
> versions.add(null);
> versionSources.add(source);
> }
> }
> }
> // 
> if (resolved != null)
> 
> repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, 
> versions, versionSources));
> // 
> }
> }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ...
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we have to compare null and resolved, which are obviously
>             // not equal, so it will fire a ReadRepair; however, it is not needed here.
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ...
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with version 1
> NodeB has X.2 
> NodeC has X.? // unknown version, but because the write was done with Quorum it is 1 or 2
> During a Quorum read from nodes A and B, Cassandra creates version 12 and 
> sends a ReadRepair, so the nodes now have the following content:
> NodeA has X.12
> NodeB has X.12
> which is correct. However, Cassandra will also fire a ReadRepair to NodeC. There 
> is no need to do that: the next consistent read may be served either by 
> nodes {A, B} (no ReadRepair) or by a pair {?, C}, in which case a ReadRepair 
> will be fired and bring NodeC to a consistent state.
> Right now we are reading from the index a lot, and from some point in 
> time we start getting TimeOutException because the cluster is overloaded by 
> ReadRepair requests *even* if all nodes have the same data :(
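
The effect described in the report can be reproduced in isolation. The sketch below is only an illustration under stated assumptions, not Cassandra code: the class SpuriousRepairSketch and its methods resolveSuperset, diff and repairTargets are hypothetical stand-ins, and a row version is modeled as a plain map from column name to timestamp. A null entry plays the role of the padding added for a replica that did not return the row.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the scenario in the report; none of these names are Cassandra's API.
final class SpuriousRepairSketch {

    /** Merges all non-null versions, keeping the newest timestamp per column. */
    static Map<String, Long> resolveSuperset(List<Map<String, Long>> versions) {
        Map<String, Long> resolved = new TreeMap<>();
        for (Map<String, Long> version : versions) {
            if (version == null)
                continue; // padding entry, nothing to merge
            for (Map.Entry<String, Long> e : version.entrySet())
                resolved.merge(e.getKey(), e.getValue(), Long::max);
        }
        return resolved;
    }

    /** Returns the columns of resolved that version lacks or holds an older copy of, or null if none. */
    static Map<String, Long> diff(Map<String, Long> version, Map<String, Long> resolved) {
        Map<String, Long> missing = new TreeMap<>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            Long have = (version == null) ? null : version.get(e.getKey());
            if (have == null || have < e.getValue())
                missing.put(e.getKey(), e.getValue());
        }
        return missing.isEmpty() ? null : missing;
    }

    /** Returns the indexes of the replicas that would be sent a repair. */
    static List<Integer> repairTargets(List<Map<String, Long>> versions) {
        Map<String, Long> resolved = resolveSuperset(versions);
        List<Integer> targets = new ArrayList<>();
        for (int i = 0; i < versions.size(); i++) {
            // A null version always differs from a non-empty resolved row.
            if (diff(versions.get(i), resolved) != null)
                targets.add(i);
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Long> row = Collections.singletonMap("c1", 10L);
        List<Map<String, Long>> versions = new ArrayList<>();
        versions.add(row);   // NodeA replied
        versions.add(row);   // NodeB replied with identical data
        versions.add(null);  // NodeC was padded in with a null version
        System.out.println(repairTargets(versions)); // prints [2]
    }
}
```

Even though the two replicas that answered agree, the padded null entry still yields a non-null diff, so the third replica is scheduled for a repair. This only models the symptom; it makes no claim about how the attached patches resolve it.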

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-13 Thread Jonathan Ellis (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207046#comment-13207046
 ] 

Jonathan Ellis commented on CASSANDRA-3843:
---

It's a relatively small patch, but StorageProxy and its callbacks can be 
fragile...  I almost didn't commit it to 1.0 either.  Tell you what though, 
I'll post a backported patch here and if you want you can run with it. :)






[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-13 Thread Jeremy Hanna (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206881#comment-13206881
 ] 

Jeremy Hanna commented on CASSANDRA-3843:
-

We'll be upgrading to 1.0.8 as soon as we can, but this seems like a 
significant issue for anyone doing range scans - does it make sense to backport 
to 0.8.x?






[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-09 Thread Philip Andronov (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204600#comment-13204600
 ] 

Philip Andronov commented on CASSANDRA-3843:


> The null version was added for CASSANDRA-2680.
Oh, good point. Sorry, I should have paid more attention to the git history, not only 
to the annotations :)

Anyway, thanks for the patch; now we can apply the correct patch to our servers.






[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan

2012-02-08 Thread Vijay (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204269#comment-13204269
 ] 

Vijay commented on CASSANDRA-3843:
--

+1

