[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068248#comment-15068248 ]
Jonathan Ellis commented on CASSANDRA-10726:
--------------------------------------------

Seeing reads "go backwards in time" is one of the most confusing aspects of eventual consistency for people, so I do think it's important that quorum reads avoid that, even more so because users tend to oversimplify quorum reads as "strong consistency that means I don't have to think about EC." So to the degree we can make that assumption true, we should, especially if that's been our behavior already for 4+ years.

It seems like there are two primary problem scenarios:

* When a node is overloaded for writes, this stops reads as well. First, delaying reads when we're behind on writes is arguably a good thing that will help you recover faster. Second, the right way to tackle this is with better handling of the write overload as in CASSANDRA-9318.
* When data is read-only because disks are failing. I agree with Sylvain that half-broken is often worse than completely broken, and in this specific case if a disk puts itself in read-only mode then it won't be long until it isn't readable either. This is another case where "mark a disk bad and broadcast to other nodes not to send me requests for tokens pinned to it" as envisioned in CASSANDRA-6696 would be useful, along with an option for "promote write errors to blacklist on reads as well."

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert to update out-of-date replicas is blocking. This means, if it fails, the read fails with a timeout. If a node is dropping writes (maybe it is overloaded or the mutation stage is backed up for some other reason), all reads to a replica set could fail. Further, replicas dropping writes get more out of sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not be blocking or we should return success for the read even if the write times out.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
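
For readers following the thread, here is a minimal sketch of the two behaviours under discussion. It is not Cassandra's code: `ReadRepairSketch`, `ReadResult`, `blockingRead` and `lenientRead` are hypothetical names, and the repair acknowledgement is modelled as a plain `CompletableFuture`. `blockingRead` mirrors the behaviour the ticket describes (the read fails if the repair write is not acknowledged in time), while `lenientRead` mirrors the second proposal in the description (return the data even if the repair write times out).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Cassandra.
final class ReadRepairSketch {

    /** The merged read result, plus whether the repair write was acknowledged in time. */
    static final class ReadResult {
        public final String value;
        public final boolean repairAcked;

        ReadResult(String value, boolean repairAcked) {
            this.value = value;
            this.repairAcked = repairAcked;
        }
    }

    /** Behaviour described in the ticket: an unacknowledged repair write fails the read. */
    static String blockingRead(String merged, CompletableFuture<Void> repairAck, long timeoutMillis)
            throws TimeoutException {
        await(repairAck, timeoutMillis); // propagates TimeoutException to the caller
        return merged;
    }

    /** Proposed alternative: wait up to the timeout, but return the data either way. */
    static ReadResult lenientRead(String merged, CompletableFuture<Void> repairAck, long timeoutMillis) {
        try {
            await(repairAck, timeoutMillis);
            return new ReadResult(merged, true);
        } catch (TimeoutException e) {
            // The stale replica was not repaired in time; the caller still gets the merged value.
            return new ReadResult(merged, false);
        }
    }

    /** Waits for the acknowledgement, translating non-timeout failures to unchecked exceptions. */
    private static void await(CompletableFuture<Void> ack, long timeoutMillis) throws TimeoutException {
        try {
            ack.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }
}
```

The trade-off raised in the comment applies to `lenientRead`: when `repairAcked` is false, a later quorum read that happens to miss the up-to-date replica may return the older value, which is the "go backwards in time" behaviour the comment wants quorum reads to avoid.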