[ https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084154#comment-13084154 ]
Jonathan Ellis commented on CASSANDRA-2034: ------------------------------------------- - there's an unused overload of MS.addCallback - shouldHint should probably be a CallbackInfo method - the .warn in scheduleMH should be .error - any reason scheduleMH is protected instead of private? - technically MS.shutdown should probably collect the Futures of the hints being written and wait on them in {code} if (consistencyLevel != null && consistencyLevel == ConsistencyLevel.ANY) {code} did you mean this? {code} if (responseHandler != null && consistencyLevel == ConsistencyLevel.ANY) {code} - not a big fan of the "subclass just exposes a different constructor than its parent" idiom. I think the normal expectation is that a subclass encapsulates some kind of different behavior than its parent. I'd get rid of CIWM and just expose a with-Message constructor in CallbackInfo. - Would also make CI message and isMutation final - avoid declaring @Override on abstract methods (looking at writeLocalHint Runnable, also CTAF, may be others) - avoid comments that repeat what the code says, like "// One more task in the hints queue." - would name hintCounter -> totalHints (I had to look at usages to figure out what it does) - there's no actual hint queue anymore so would name currentHintsQueueSize -> hintsInProgress (similarly, maxHintsQueueSize) - prefer camelCase variable names (CTAF overall_timeout) - unit.toMillis(timeout) would be more idiomatic than TimeUnit.MILLISECONDS.convert(timeout, unit) - otherwise CTAF is a good clean encapsulation, nice job - generics is upset about "hintsFutures.add(new CreationTimeAwareFuture(hintfuture))" -- can you fix the unchecked warning there? - unnecessary whitespace added to the method below "// wait for writes. throws TimeoutException if necessary" - would prefer to avoid allocating the hintFutures list unless we actually need to write hints, since this is on the write inner loop - still think we can simplify sendToHinted by getting rid of getHintedEndpoints and operating directly on the raw writeEndpoints (and consulting FD to decide whether to write a hint) - currentHintsQueueSize increment needs to be done OUTSIDE the runnable or it will never get above the number of task executors - it would be nice if we could tell the CallbackInfo whether the original write reached ConsistencyLevel. Maybe we could do this by changing isMutation to volatile isHintRequired, something like that. (And just set it to false if CL is not reached.) If not, we don't have to write hints for timed out replicas -- this will help avoiding OOM if nodes in the cluster start getting overloaded and dropping writes. Otherwise, the coordinator could run out of memory queuing up hints for writes that timed out and the client will retry. > Make Read Repair unnecessary when Hinted Handoff is enabled > ----------------------------------------------------------- > > Key: CASSANDRA-2034 > URL: https://issues.apache.org/jira/browse/CASSANDRA-2034 > Project: Cassandra > Issue Type: Improvement > Components: Core > Reporter: Jonathan Ellis > Assignee: Patricio Echague > Fix For: 1.0 > > Attachments: 2034-formatting.txt, CASSANDRA-2034-trunk-v10.patch, > CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v11.patch, > CASSANDRA-2034-trunk-v2.patch, CASSANDRA-2034-trunk-v3.patch, > CASSANDRA-2034-trunk-v4.patch, CASSANDRA-2034-trunk-v5.patch, > CASSANDRA-2034-trunk-v6.patch, CASSANDRA-2034-trunk-v7.patch, > CASSANDRA-2034-trunk-v8.patch, CASSANDRA-2034-trunk-v9.patch, > CASSANDRA-2034-trunk.patch > > Original Estimate: 8h > Remaining Estimate: 8h > > Currently, HH is purely an optimization -- if a machine goes down, enabling > HH means RR/AES will have less work to do, but you can't disable RR entirely > in most situations since HH doesn't kick in until the FailureDetector does. > Let's add a scheduled task to the mutate path, such that we return to the > client normally after ConsistencyLevel is achieved, but after RpcTimeout we > check the responseHandler write acks and write local hints for any missing > targets. > This would making disabling RR when HH is enabled a much more reasonable > option, which has a huge impact on read throughput. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira