[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15872879#comment-15872879 ]
ASF GitHub Bot commented on CASSANDRA-10726:
--------------------------------------------

GitHub user xiaolong302 opened a pull request:

    https://github.com/apache/cassandra/pull/94

    CASSANDRA-10726: Read repair inserts should use speculative retry

    1. do an extra read repair retry to only guarantee “monotonic quorum read”. Here “quorum” means majority of nodes among replicas
    2. only block what is needed for resolving the digest mismatch no matter whether it’s speculative retry or read repair chance.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xiaolong302/cassandra CASSANDRA-10726

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/cassandra/pull/94.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #94

----
commit a587c20e82ffc4aa7c4a3cb1468551b255fc7f71
Author: Xiaolong Jiang <xiaolong_ji...@apple.com>
Date:   2017-01-17T05:31:06Z

    Cass: CASSANDRA-10726: Read repair inserts should use speculative retry

    1. do an extra read repair retry to only guarantee “monotonic quorum read”. Here “quorum” means majority of nodes among replicas
    2. only block what is needed for resolving the digest mismatch no matter whether it’s speculative retry or read repair chance.

----

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert
> to update out of date replicas is blocking. This means, if it fails, the read
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or
> the mutation stage is backed up for some other reason), all reads to a
> replica set could fail. Further, replicas dropping writes get more out of
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not
> be blocking or we should return success for the read even if the write times
> out.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
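
For illustration only, here is a minimal Java sketch of the two behaviours the issue description contrasts: failing the read when the repair write is not acknowledged in time, versus returning the already-merged result anyway. It is not taken from the Cassandra code base or from the pull request. The class `ReadRepairSketch`, the method `finishRead`, the exception `SketchReadTimeout`, and the flag `failReadOnRepairTimeout` are all hypothetical names, and the repair acknowledgements are modelled as a plain `CompletableFuture`.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReadRepairSketch {

    /** Hypothetical stand-in for a read timeout; not Cassandra's own exception class. */
    public static class SketchReadTimeout extends RuntimeException {
        public SketchReadTimeout(String message) {
            super(message);
        }
    }

    /**
     * Waits up to timeoutMillis for the repair writes to be acknowledged.
     *
     * @param mergedRow               result already resolved from the replica responses
     * @param repairAcks              completes when the out-of-date replicas ack the repair write
     * @param failReadOnRepairTimeout true models the blocking behaviour described in the issue;
     *                                false models "return success even if the write times out"
     */
    public static String finishRead(String mergedRow,
                                    CompletableFuture<Void> repairAcks,
                                    long timeoutMillis,
                                    boolean failReadOnRepairTimeout) {
        try {
            repairAcks.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            if (failReadOnRepairTimeout) {
                // Blocking behaviour: a slow or failed repair write fails the whole read.
                throw new SketchReadTimeout("repair write not acknowledged in time");
            }
            // Best-effort behaviour: ignore the missing ack and fall through.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new SketchReadTimeout("interrupted while waiting for repair acks");
        }
        return mergedRow;
    }

    public static void main(String[] args) {
        // A future that never completes models a replica that is dropping writes.
        CompletableFuture<Void> neverAcked = new CompletableFuture<>();

        System.out.println(finishRead("row-v2", neverAcked, 10, false));

        try {
            finishRead("row-v2", neverAcked, 10, true);
        } catch (SketchReadTimeout e) {
            System.out.println("read failed: " + e.getMessage());
        }
    }
}
```

With `failReadOnRepairTimeout` set to `false`, the caller still gets the merged row when a replica never acknowledges the repair write. With `true`, the same situation surfaces as a failed read, which is the side effect the reporter objects to.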