[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15925260#comment-15925260
 ] 

Jason Brown commented on CASSANDRA-10726:
-----------------------------------------

[~xiaolong...@gmail.com] Thanks for the comments they add more context to the 
patch. Here's my initial round of comments on your patch:

- general nit: code style is incorrect wrt braces
- rename {{ConsistencyLevel#isMajority}} to {{ConsistencyLevel#isQuorum}} as 
it's more in line with our general naming conventions
- {{AbstractReadExecutor#getReadRepairExtra}} 
-- function needs a javadoc comment
-- rename {{#readRepairExtra}} to ... ??? "spareReadRepairNode". I would like 
to make this Optional<InetAddress> as it makes it clearer that it's not 
required (rather than piggybacking on "null" to indicate that sematic meaning).
-- {{readRepairExtra = allSortedReplicas.get(blockForReplicas.size());}} - this 
seems unsafe as it assumes the {{List}}s are ordered the same.

- {{DataResolver}}
-- {{#reppairExtra}} is misspelled. Further, make it consistent with whatever 
we name it in {{AbstractReadExecutor}}
-- {{#close}} - don't add the exception to the {{Tracing.trace()}} call, and 
it's debatable if you need it on the {{logger.debug}} line, as well.
-- {{repairResponseRequestMap}} and {{#responseCntSnapshot}} - I don't think 
you actually need these. You just care about the number of replicas that have 
responded, and you can just call {{responses.size()}} where you actually care 
about (in {{#waitRepairToFinishWithPossibleRetry()}}.
-- {{#distinctHostNum()}} I'm not sure why you need this as we should only send 
the message to distinct hosts. I think you can just use {{results.size()}} 
instead.
-- {{#waitRepairToFinishWithPossibleRetry}}
--- wrt the block starting at line 224. You iterate over all the timed out 
entries, but you need to cover the full merge set and send that "repairExtra" 
node. You don't know what data that replica has or needs, thus you need to send 
the full merged set. I'm not sure off the top of my head where/how to grab the 
merged rows, but I'm sure you can figure it out ;)
--- Furthermore, once you get the merged set, you do not need to send multiple 
messages to the target node, and can instead send one. Thus you can simplify 
the block starting at line 224 as such:
{code}
                Tracing.trace("retry read-repair-mutation to {}", reppairExtra);
                PartitionUpdate update = // get merged result from some 
location ...
                MessageOut messageOut = new 
Mutation(update).createMessage(MessagingService.Verb.READ_REPAIR); 
                AsyncOneResponse response = 
MessagingService.instance().sendRR(messageOut,reppairExtra);
                response.get(waitTimeNanos, TimeUnit.NANOSECONDS); // 
TimeoutException will be thrown 
                ....
{code}

The biggest thing this patch needs is testing. You might be able to unit test 
this one (in fact [~xiaolong...@gmail.com] spoke offline about some idea about 
how to do it), but it will take some time (worthwhile, in my opinion). A dtest 
will probably be required, as well, even though that will get tricky - byteman 
will probably necessary to help you out.

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>             Fix For: 3.0.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to