[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16574643#comment-16574643 ]
Alex Petrov edited comment on CASSANDRA-10726 at 8/10/18 7:53 AM:
------------------------------------------------------------------

It seems that there may still be a problem with {{Accumulator}} during {{BlockingReadRepair}}. {{DataResolver}} is created in {{BlockingReadRepair#startRepair}}, which gets {{allReplicas}} from {{ReadCallback#endpoints}}, which in turn obtains them through {{AbstractReadExecutor#getReadExecutor}}, where they represent {{consistencyLevel.filterForQuery(keyspace, allReplicas)}}. This means that when we send additional data requests, we call {{getLiveSortedEndpoints}} and can end up with more nodes; but since the {{Accumulator}} was initialised with just the "target" nodes, if we keep receiving responses (e.g. because a node was slow rather than dead, which is the more common case), the {{Accumulator}} will overflow. Unfortunately, testing with RF3 won't reveal this.

Some comments on the patch itself:
* We might want to simplify the code a little by not caching versions [here|https://github.com/apache/cassandra/compare/trunk...bdeggleston:10726-v4#diff-b677a5a6a3f1a90a889bcf906c1f8001R211].
* {{BlockingDigestRepair}} has methods named [awaitRepair|https://github.com/apache/cassandra/compare/trunk...bdeggleston:10726-v4#diff-0246c72855070863c2fdbee6d97f494dR123] and [awaitRepairs|https://github.com/apache/cassandra/compare/trunk...bdeggleston:10726-v4#diff-0246c72855070863c2fdbee6d97f494dR174], which might be a bit counter-intuitive.
* The partition-range code path is also affected (since {{StorageProxy#fetchRows}} is changed). It would be great to have dtests for partition ranges as well.

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.x
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert
> to update out of date replicas is blocking. This means, if it fails, the read
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or
> the mutation stage is backed up for some other reason), all reads to a
> replica set could fail. Further, replicas dropping writes get more out of
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any
> // replica that's behind on writes in case the out-of-sync row is read
> // multiple times in quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not
> be blocking or we should return success for the read even if the write times
> out.
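The overflow scenario described above can be sketched with a minimal, hypothetical fixed-capacity accumulator. This is only an illustration of the failure mode, not Cassandra's actual {{Accumulator}} implementation; the class and method names here are invented:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: an accumulator sized for the initial "target"
// replica count, which cannot absorb extra responses produced by
// speculative data requests to additional live endpoints.
class FixedAccumulator<T> {
    private final Object[] values;
    private final AtomicInteger nextIndex = new AtomicInteger();

    FixedAccumulator(int expectedResponses) {
        this.values = new Object[expectedResponses];
    }

    void add(T value) {
        int idx = nextIndex.getAndIncrement();
        // If more endpoints respond than the accumulator was sized for,
        // idx walks past the array bound: ArrayIndexOutOfBoundsException.
        values[idx] = value;
    }
}

public class AccumulatorOverflowDemo {
    public static void main(String[] args) {
        // Sized for 2 target replicas (e.g. a QUORUM read at RF=3)...
        FixedAccumulator<String> acc = new FixedAccumulator<>(2);
        acc.add("response-from-replica-1");
        acc.add("response-from-replica-2");
        try {
            // ...but a slow (not dead) third replica answers a
            // speculative data request as well.
            acc.add("late-response-from-replica-3");
            System.out.println("no overflow");
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("overflow");
        }
    }
}
```

The point of the sketch: because the capacity is fixed at construction time from the filtered endpoint list, a third response from a merely slow node overflows the two-slot accumulator, which is why RF3 testing (where targets and live endpoints usually coincide) does not expose the bug.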
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org