[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stefan Podkowinski updated CASSANDRA-13910: ------------------------------------------- Fix Version/s: 3.11.3 3.0.17 > Remove read_repair_chance/dclocal_read_repair_chance > ---------------------------------------------------- > > Key: CASSANDRA-13910 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13910 > Project: Cassandra > Issue Type: Improvement > Reporter: Sylvain Lebresne > Assignee: Aleksey Yeschenko > Priority: Minor > Fix For: 3.0.17, 3.11.3, 4.0 > > > First, let me clarify so this is not misunderstood that I'm not *at all* > suggesting to remove the read-repair mechanism of detecting and repairing > inconsistencies between read responses: that mechanism is imo fine and > useful. But the {{read_repair_chance}} and {{dclocal_read_repair_chance}} > have never been about _enabling_ that mechanism, they are about querying all > replicas (even when this is not required by the consistency level) for the > sole purpose of maybe read-repairing some of the replica that wouldn't have > been queried otherwise. Which btw, bring me to reason 1 for considering their > removal: their naming/behavior is super confusing. Over the years, I've seen > countless users (and not only newbies) misunderstanding what those options > do, and as a consequence misunderstand when read-repair itself was happening. > But my 2nd reason for suggesting this is that I suspect > {{read_repair_chance}}/{{dclocal_read_repair_chance}} are, especially > nowadays, more harmful than anything else when enabled. When those option > kick in, what you trade-off is additional resources consumption (all nodes > have to execute the read) for a _fairly remote chance_ of having some > inconsistencies repaired on _some_ replica _a bit faster_ than they would > otherwise be. To justify that last part, let's recall that: > # most inconsistencies are actually fixed by hints in practice; and in the > case where a node stay dead for a long time so that hints ends up timing-out, > you really should repair the node when it comes back (if not simply > re-bootstrapping it). Read-repair probably don't fix _that_ much stuff in > the first place. > # again, read-repair do happen without those options kicking in. If you do > reads at {{QUORUM}}, inconsistencies will eventually get read-repaired all > the same. Just a tiny bit less quickly. > # I suspect almost everyone use a low "chance" for those options at best > (because the extra resources consumption is real), so at the end of the day, > it's up to chance how much faster this fixes inconsistencies. > Overall, I'm having a hard time imagining real cases where that trade-off > really make sense. Don't get me wrong, those options had their places a long > time ago when hints weren't working all that well, but I think they bring > more confusion than benefits now. > And I think it's sane to reconsider stuffs every once in a while, and to > clean up anything that may not make all that much sense anymore, which I > think is the case here. > Tl;dr, I feel the benefits brought by those options are very slim at best and > well overshadowed by the confusion they bring, and not worth maintaining the > code that supports them (which, to be fair, isn't huge, but getting rid of > {{ReadCallback.AsyncRepairRunner}} wouldn't hurt for instance). > Lastly, if the consensus here ends up being that they can have their use in > weird case and that we fill supporting those cases is worth confusing > everyone else and maintaining that code, I would still suggest disabling them > totally by default. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org