[ https://issues.apache.org/jira/browse/CASSANDRA-13910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16196421#comment-16196421 ]
Kurt Greaves edited comment on CASSANDRA-13910 at 10/9/17 1:19 AM: ------------------------------------------------------------------- I think the proposed timeline for removal is a bit quick. I think if people do consider it a good idea to go ahead with this something like deprecate now, change default to disable in 4.0, and remove in 4.1 if no complaints are had would be far more suitable. Considering people take a long time to upgrade, if we remove in 4.0 users probably won't even realise until 4.1 is already out. I suggest the longer timeline because I do believe there are people with legitimate use cases out there for this. There's also very minimal gain from removing this from the codebase, so there isn't much point in rushing it. Disabling by default is reasonable in the interim. {quote}If you do reads at QUORUM, inconsistencies will eventually get read-repaired all the same. Just a tiny bit less quickly.{quote} There are a lot of users out there who don't use QUORUM, so this can't really be used as an alternative. Also hints are better but they are still prone to issues. They obviously don't work 100% of the time because I've seen read repairs on clusters that haven't had a node unavailable for longer than the HH window. Hints can also be dropped or get corrupted. Anyway, the main point is there could be people relying on it and we should probably give them at least a few releases to either complain or work out an alternative. On another note, and correct me if I'm wrong, but RR chance is still applied on QUORUM CL's. Without RR triggering I thought that QUORUM would only repair the nodes involved in the QUORUM, rather than all replicas (this is why you need to use ALL if you're trying to repair using read repair). This could be another use case or potentially a suprise. Albeit, not a good use case, but a use case regardless. was (Author: kurtg): I think the proposed timeline for removal is a bit quick. I think if people do consider it a good idea to go ahead with this something like deprecate now, change default to disable in 4.0, and remove in 4.1 if no complaints are had would be far more suitable. Considering people take a long time to upgrade, if we remove in 4.0 users probably won't even realise until 4.1 is already out. I suggest the longer timeline because I do believe there are people with legitimate use cases out there for this. There's also very minimal gain from removing this from the codebase, so there isn't much point in rushing it. Disabling by default is reasonable in the interim. .bq If you do reads at QUORUM, inconsistencies will eventually get read-repaired all the same. Just a tiny bit less quickly. There are a lot of users out there who don't use QUORUM, so this can't really be used as an alternative. Also hints are better but they are still prone to issues. They obviously don't work 100% of the time because I've seen read repairs on clusters that haven't had a node unavailable for longer than the HH window. Hints can also be dropped or get corrupted. Anyway, the main point is there could be people relying on it and we should probably give them at least a few releases to either complain or work out an alternative. On another note, and correct me if I'm wrong, but RR chance is still applied on QUORUM CL's. Without RR triggering I thought that QUORUM would only repair the nodes involved in the QUORUM, rather than all replicas (this is why you need to use ALL if you're trying to repair using read repair). This could be another use case or potentially a suprise. Albeit, not a good use case, but a use case regardless. > Consider deprecating (then removing) > read_repair_chance/dclocal_read_repair_chance > ---------------------------------------------------------------------------------- > > Key: CASSANDRA-13910 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13910 > Project: Cassandra > Issue Type: Improvement > Reporter: Sylvain Lebresne > Priority: Minor > Labels: CommunityFeedbackRequested > > First, let me clarify so this is not misunderstood that I'm not *at all* > suggesting to remove the read-repair mechanism of detecting and repairing > inconsistencies between read responses: that mechanism is imo fine and > useful. But the {{read_repair_chance}} and {{dclocal_read_repair_chance}} > have never been about _enabling_ that mechanism, they are about querying all > replicas (even when this is not required by the consistency level) for the > sole purpose of maybe read-repairing some of the replica that wouldn't have > been queried otherwise. Which btw, bring me to reason 1 for considering their > removal: their naming/behavior is super confusing. Over the years, I've seen > countless users (and not only newbies) misunderstanding what those options > do, and as a consequence misunderstand when read-repair itself was happening. > But my 2nd reason for suggesting this is that I suspect > {{read_repair_chance}}/{{dclocal_read_repair_chance}} are, especially > nowadays, more harmful than anything else when enabled. When those option > kick in, what you trade-off is additional resources consumption (all nodes > have to execute the read) for a _fairly remote chance_ of having some > inconsistencies repaired on _some_ replica _a bit faster_ than they would > otherwise be. To justify that last part, let's recall that: > # most inconsistencies are actually fixed by hints in practice; and in the > case where a node stay dead for a long time so that hints ends up timing-out, > you really should repair the node when it comes back (if not simply > re-bootstrapping it). Read-repair probably don't fix _that_ much stuff in > the first place. > # again, read-repair do happen without those options kicking in. If you do > reads at {{QUORUM}}, inconsistencies will eventually get read-repaired all > the same. Just a tiny bit less quickly. > # I suspect almost everyone use a low "chance" for those options at best > (because the extra resources consumption is real), so at the end of the day, > it's up to chance how much faster this fixes inconsistencies. > Overall, I'm having a hard time imagining real cases where that trade-off > really make sense. Don't get me wrong, those options had their places a long > time ago when hints weren't working all that well, but I think they bring > more confusion than benefits now. > And I think it's sane to reconsider stuffs every once in a while, and to > clean up anything that may not make all that much sense anymore, which I > think is the case here. > Tl;dr, I feel the benefits brought by those options are very slim at best and > well overshadowed by the confusion they bring, and not worth maintaining the > code that supports them (which, to be fair, isn't huge, but getting rid of > {{ReadCallback.AsyncRepairRunner}} wouldn't hurt for instance). > Lastly, if the consensus here ends up being that they can have their use in > weird case and that we fill supporting those cases is worth confusing > everyone else and maintaining that code, I would still suggest disabling them > totally by default. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org