[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212725#comment-13212725 ]

Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------

Good to know - we'll upgrade to 1.0.8 as soon as we can then.

> Unnecessary ReadRepair request during RangeScan
> -----------------------------------------------
>
>                 Key: CASSANDRA-3843
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3843
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Philip Andronov
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.8
>
>         Attachments: 3843-v2.txt, 3843.txt
>
>
> During reading with Quorum level and replication factor greater than 2,
> Cassandra sends at least one ReadRepair, even if there is no need to do that.
> Because read requests wait until the ReadRepair finishes, it slows
> down requests a lot, up to the Timeout :(
> It seems that the problem was introduced by CASSANDRA-2494;
> unfortunately I do not have enough knowledge of Cassandra internals to fix the
> problem without breaking CASSANDRA-2494 functionality, so my report comes
> without a patch.
> Code explanations:
> {code:title=RangeSliceResponseResolver.java|borderStyle=solid}
> class RangeSliceResponseResolver {
>     // ...
>     private class Reducer extends MergeIterator.Reducer<Pair<Row,InetAddress>, Row>
>     {
>         // ...
>         protected Row getReduced()
>         {
>             ColumnFamily resolved = versions.size() > 1
>                                   ? RowRepairResolver.resolveSuperset(versions)
>                                   : versions.get(0);
>             if (versions.size() < sources.size())
>             {
>                 for (InetAddress source : sources)
>                 {
>                     if (!versionSources.contains(source))
>                     {
>                         // [PA] Here we are adding a null ColumnFamily.
>                         // Later it will be compared with the "desired"
>                         // version and will give us a "fake" difference which
>                         // forces Cassandra to send ReadRepair to the given source
>                         versions.add(null);
>                         versionSources.add(source);
>                     }
>                 }
>             }
>             // ...
>             if (resolved != null)
>                 repairResults.addAll(RowRepairResolver.scheduleRepairs(resolved, table, key, versions, versionSources));
>             // ...
>         }
>     }
> }
> {code}
> {code:title=RowRepairResolver.java|borderStyle=solid}
> public class RowRepairResolver extends AbstractRowResolver {
>     // ...
>     public static List<IAsyncResult> scheduleRepairs(ColumnFamily resolved, String table, DecoratedKey key, List<ColumnFamily> versions, List<InetAddress> endpoints)
>     {
>         List<IAsyncResult> results = new ArrayList<IAsyncResult>(versions.size());
>         for (int i = 0; i < versions.size(); i++)
>         {
>             // On some iteration we have to compare null and resolved, which are obviously
>             // not equal, so it will fire a ReadRequest, however it is not needed here
>             ColumnFamily diffCf = ColumnFamily.diff(versions.get(i), resolved);
>             if (diffCf == null)
>                 continue;
>             // ...
> {code}
> Imagine the following situation:
> NodeA has X.1 // row X with the version 1
> NodeB has X.2
> NodeC has X.? // Unknown version, but because write was with Quorum it is 1 or 2
> During the Quorum read from nodes A and B, Cassandra creates version 12 and
> sends ReadRepair, so now the nodes have the following content:
> NodeA has X.12
> NodeB has X.12
> which is correct; however, Cassandra will also fire a ReadRepair to NodeC. There
> is no need to do that: the next consistent read has a chance to be served by
> nodes {A, B} (no ReadRepair) or by the pair {?, C}, and in that case a ReadRepair
> will be fired and bring NodeC to a consistent state.
> Right now we are reading from the Index a lot and starting from some point in
> time we are getting TimeOutException because the cluster is overloaded by
> ReadRepairRequests *even* if all nodes have the same data :(

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
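
The effect described in the quoted report can be reproduced outside Cassandra with a small stand-alone sketch. It is not the attached patch and does not use Cassandra classes: a row is modelled as a `Map<String, Long>` of column name to timestamp, and `NullVersionRepairSketch`, `diff`, and `repairTargets` are hypothetical names chosen for illustration. The assumption taken from the report is only that a `null` version always differs from the resolved row, so every source padded in with `null` becomes a repair target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NullVersionRepairSketch {

    /**
     * Simplified stand-in for the diff step in the report: returns the columns
     * that `resolved` has and `version` lacks (or holds with an older timestamp),
     * or null when there is nothing to send. A null version differs from any
     * resolved row, which is the "fake" difference the reporter points at.
     */
    static Map<String, Long> diff(Map<String, Long> version, Map<String, Long> resolved) {
        if (version == null) {
            return resolved;
        }
        Map<String, Long> missing = new HashMap<>();
        for (Map.Entry<String, Long> e : resolved.entrySet()) {
            Long have = version.get(e.getKey());
            if (have == null || have < e.getValue()) {
                missing.put(e.getKey(), e.getValue());
            }
        }
        return missing.isEmpty() ? null : missing;
    }

    /**
     * Pads every source in `sources` that returned no version with null, then
     * lists the sources whose version differs from `resolved`.
     */
    static List<String> repairTargets(Map<String, Long> resolved,
                                      Map<String, Map<String, Long>> versionsBySource,
                                      List<String> sources) {
        List<String> targets = new ArrayList<>();
        for (String source : sources) {
            Map<String, Long> version = versionsBySource.get(source); // null if absent
            if (diff(version, resolved) != null) {
                targets.add(source);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        Map<String, Long> row = new HashMap<>();
        row.put("col", 1L);

        // NodeA and NodeB answered the quorum read with identical data.
        Map<String, Map<String, Long>> bothAnswered = new LinkedHashMap<>();
        bothAnswered.put("NodeA", row);
        bothAnswered.put("NodeB", row);

        // Padding with all replicas: NodeC was never asked, yet gets a repair.
        System.out.println(repairTargets(row, bothAnswered, Arrays.asList("NodeA", "NodeB", "NodeC"))); // [NodeC]

        // Padding only with the nodes that were actually queried: no repair.
        System.out.println(repairTargets(row, bothAnswered, Arrays.asList("NodeA", "NodeB"))); // []

        // A queried node that really lacks the row is still repaired.
        Map<String, Map<String, Long>> onlyA = new LinkedHashMap<>();
        onlyA.put("NodeA", row);
        System.out.println(repairTargets(row, onlyA, Arrays.asList("NodeA", "NodeB"))); // [NodeB]
    }
}
```

The third case is there to show why simply skipping every `null` version would be too blunt: a node that was queried and genuinely lacks the row still needs the repair.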
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212679#comment-13212679 ]

Brandon Williams commented on CASSANDRA-3843:
---------------------------------------------

I'm unable to repro against 1.0 HEAD.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13212617#comment-13212617 ]

Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------

I did repairs on all the nodes and then compacts on all the nodes. Then I did a pig job to simply count the number of rows in the column family. Again I think the overall writes were reduced but there are writes going on. I need to turn debug on and do the same test again.

I did the compactions at 6:42 and the range scans at 14:16:

-rw-r--r-- 1 root root 40106228511 Feb 21 06:42 account_snapshot-g-792-Data.db
-rw-r--r-- 1 root root   206884816 Feb 21 06:42 account_snapshot-g-792-Filter.db
-rw-r--r-- 1 root root  2913796038 Feb 21 06:42 account_snapshot-g-792-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 06:42 account_snapshot-g-792-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-793-Compacted
-rw-r--r-- 1 root root      287286 Feb 21 14:16 account_snapshot-g-793-Data.db
-rw-r--r-- 1 root root         976 Feb 21 14:16 account_snapshot-g-793-Filter.db
-rw-r--r-- 1 root root       20857 Feb 21 14:16 account_snapshot-g-793-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:16 account_snapshot-g-793-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-794-Compacted
-rw-r--r-- 1 root root    87770771 Feb 21 14:17 account_snapshot-g-794-Data.db
-rw-r--r-- 1 root root      293944 Feb 21 14:17 account_snapshot-g-794-Filter.db
-rw-r--r-- 1 root root     6377968 Feb 21 14:17 account_snapshot-g-794-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-794-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-795-Compacted
-rw-r--r-- 1 root root    78459166 Feb 21 14:17 account_snapshot-g-795-Data.db
-rw-r--r-- 1 root root      262600 Feb 21 14:17 account_snapshot-g-795-Filter.db
-rw-r--r-- 1 root root     5698156 Feb 21 14:17 account_snapshot-g-795-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-795-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-796-Compacted
-rw-r--r-- 1 root root    69838937 Feb 21 14:17 account_snapshot-g-796-Data.db
-rw-r--r-- 1 root root      234000 Feb 21 14:17 account_snapshot-g-796-Filter.db
-rw-r--r-- 1 root root     5077447 Feb 21 14:17 account_snapshot-g-796-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-796-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-797-Compacted
-rw-r--r-- 1 root root    68094433 Feb 21 14:17 account_snapshot-g-797-Data.db
-rw-r--r-- 1 root root      227808 Feb 21 14:17 account_snapshot-g-797-Filter.db
-rw-r--r-- 1 root root     4943098 Feb 21 14:17 account_snapshot-g-797-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:17 account_snapshot-g-797-Statistics.db
-rw-r--r-- 1 root root   304163307 Feb 21 14:20 account_snapshot-g-798-Data.db
-rw-r--r-- 1 root root     1019776 Feb 21 14:20 account_snapshot-g-798-Filter.db
-rw-r--r-- 1 root root    22096669 Feb 21 14:20 account_snapshot-g-798-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:20 account_snapshot-g-798-Statistics.db
-rw-r--r-- 1 root root    65874829 Feb 21 14:18 account_snapshot-g-799-Data.db
-rw-r--r-- 1 root root      220192 Feb 21 14:18 account_snapshot-g-799-Filter.db
-rw-r--r-- 1 root root     4777809 Feb 21 14:18 account_snapshot-g-799-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:18 account_snapshot-g-799-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-800-Compacted
-rw-r--r-- 1 root root    50067413 Feb 21 14:18 account_snapshot-g-800-Data.db
-rw-r--r-- 1 root root      167416 Feb 21 14:18 account_snapshot-g-800-Filter.db
-rw-r--r-- 1 root root     3632313 Feb 21 14:18 account_snapshot-g-800-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:18 account_snapshot-g-800-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-801-Compacted
-rw-r--r-- 1 root root    50575719 Feb 21 14:18 account_snapshot-g-801-Data.db
-rw-r--r-- 1 root root      169160 Feb 21 14:18 account_snapshot-g-801-Filter.db
-rw-r--r-- 1 root root     3669880 Feb 21 14:18 account_snapshot-g-801-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:18 account_snapshot-g-801-Statistics.db
-rw-r--r-- 1 root root           0 Feb 21 14:20 account_snapshot-g-802-Compacted
-rw-r--r-- 1 root root    41788766 Feb 21 14:19 account_snapshot-g-802-Data.db
-rw-r--r-- 1 root root      139776 Feb 21 14:19 account_snapshot-g-802-Filter.db
-rw-r--r-- 1 root root     3033069 Feb 21 14:19 account_snapshot-g-802-Index.db
-rw-r--r-- 1 root root        4276 Feb 21 14:19 account_snapshot-g-802-Statistics.db
-rw-r--r-- 1 root root    46547146 Feb 21 14:19 account_snapshot-g-803-Data.db
-rw-r--r-- 1 root root      155720 Feb 21 14:19 account_snapshot-g-803-Filter.db
-r
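
The check made by eye in the listing above (which data files appeared after the 06:42 compaction) can also be scripted. The following is a minimal sketch, assuming only that data files end in `-Data.db` as in the listing; the class name `NewSSTablesSince` and the method `dataFilesModifiedAfter` are hypothetical, and the directory and cutoff are whatever the operator passes in.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NewSSTablesSince {

    /**
     * Returns the names of all "*-Data.db" files in `dir` whose last-modified
     * time is strictly after `cutoff`, sorted by name.
     */
    static List<String> dataFilesModifiedAfter(Path dir, FileTime cutoff) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*-Data.db")) {
            for (Path file : stream) {
                if (Files.getLastModifiedTime(file).compareTo(cutoff) > 0) {
                    names.add(file.getFileName().toString());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java NewSSTablesSince <column-family-directory> <ISO-8601 instant>
        Path dir = Paths.get(args[0]);
        FileTime cutoff = FileTime.from(Instant.parse(args[1]));
        for (String name : dataFilesModifiedAfter(dir, cutoff)) {
            System.out.println(name);
        }
    }
}
```

Any file printed for a cutoff placed between the compaction and the scan was written during or after the scan, which is the symptom being tracked in this thread.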
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207832#comment-13207832 ]

Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------

I did patch with v2. Doing more testing today and it appears that there are writes occurring but it looks like a definite reduction. It could be a valid repair thing. I'll do some more testing and hopefully repair every node and compact every node and then do a scan across a large column family and see what happens.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207828#comment-13207828 ]

Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------

... You did patch with v2, right?
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207827#comment-13207827 ]

Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------

I suggest testing with a single range scan at debug level. Too much hay to see the needle when you're doing 100s or 1000s of scans.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207577#comment-13207577 ]

Jeremy Hanna commented on CASSANDRA-3843:
-----------------------------------------

I patched the version of 0.8.4 that we use with the change. I applied it to all of our staging nodes. However, the problem of writes to a column family that is only being range-scanned still persists. I had major compacted a column family on all of the nodes, then did a simple pig job to read the contents of that CF, then I got a lot of minor compactions for that column family.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207050#comment-13207050 ]

Jonathan Ellis commented on CASSANDRA-3843:
-------------------------------------------

Looks to me like the 1.0 code changes from v2 apply cleanly to 0.8. (CHANGES diff does not apply but can be ignored.)
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13207046#comment-13207046 ]
Jonathan Ellis commented on CASSANDRA-3843:
It's a relatively small patch, but StorageProxy and its callbacks can be fragile... I almost didn't commit it to 1.0 either. Tell you what though, I'll post a backported patch here and if you want you can run with it. :)
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13206881#comment-13206881 ]
Jeremy Hanna commented on CASSANDRA-3843:
We'll be upgrading to 1.0.8 as soon as we can, but this seems like a significant issue for anyone doing range scans - does it make sense to backport to 0.8.x?
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204600#comment-13204600 ]
Philip Andronov commented on CASSANDRA-3843:
> The null version was added for CASSANDRA-2680.
Oh, good point. Sorry, I should have paid more attention to the git history, not only to the annotations :) Anyway, thanks for the patch; now we can apply the correct patch on our servers.
[jira] [Commented] (CASSANDRA-3843) Unnecessary ReadRepair request during RangeScan
[ https://issues.apache.org/jira/browse/CASSANDRA-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13204269#comment-13204269 ]
Vijay commented on CASSANDRA-3843:
+1