[ https://issues.apache.org/jira/browse/CASSANDRA-7489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14051740#comment-14051740 ]
Benedict commented on CASSANDRA-7489:
-------------------------------------

AFAICT this new scheme suffers none of the problems mentioned in CASSANDRA-3620. That's not to say this is definitely foolproof, but I think it is worth exploring.

> Track lower bound necessary for a repair, live, without actually repairing
> --------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7489
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7489
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Benedict
>              Labels: performance, repair
>
> We will need a few things in place to get this right, but it should be possible to track live what the current health of a single range is across the cluster. If we force an owning node to be the coordinator for an update (so if a non-smart client sends a mutation to a non-owning node, it just proxies it on to an owning node to coordinate the update; this should tend towards minimal overhead as smart clients become the norm, and smart clients scale up to cope with huge clusters), then each owner can maintain the oldest known timestamp it has coordinated an update for that was not acknowledged by every owning node it propagated it to. The minimum of all of these for a range is the lower bound from which we need to either repair, or retain tombstones. With vnode file segregation we can mark an entire vnode range as repaired up to the most recently determined healthy lower bound.
>
> There are some subtleties with this, but it means tombstones can be cleared potentially only minutes after they are generated, instead of days or weeks. It also means repairs can be even more incremental, only operating over ranges and time periods we know to be potentially out of sync.
>
> It will most likely need RAMP transactions in place, so that atomic batch mutations are not serialized on non-owning nodes.
> Having owning nodes coordinate updates is to ensure robustness in case of a single node failure - in this case all ranges owned by the node are considered to have a lower bound of -Inf. Without this, a single node being down would result in the entire cluster being considered out of sync.
>
> We will still need a short grace period for clients to send timestamps, and we would have to outright reject any updates that arrived with a timestamp near to that window expiring. But that window could safely be just minutes.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
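The tracking scheme quoted above can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra code: each owning coordinator records the oldest timestamp of an update it coordinated that has not yet been acknowledged by every owning replica, and the range's healthy lower bound is the minimum of all owners' local bounds (with a down owner contributing -Inf, per the robustness note above). All names here (`RangeHealthTracker`, `range_lower_bound`) are invented for the sketch.

```python
# Hypothetical sketch of per-range health tracking on an owning coordinator.
# Not Cassandra's actual implementation; names and structures are illustrative.
import math


class RangeHealthTracker:
    """Tracks in-flight updates for one token range on an owning coordinator."""

    def __init__(self):
        # mutation id -> (timestamp, set of owning replicas still to acknowledge)
        self._pending = {}

    def coordinate(self, mutation_id, timestamp, owning_replicas):
        """Record an update this node is coordinating for the range."""
        self._pending[mutation_id] = (timestamp, set(owning_replicas))

    def ack(self, mutation_id, replica):
        """Record a replica's acknowledgement; drop the entry once all owners acked."""
        entry = self._pending.get(mutation_id)
        if entry is None:
            return
        _, awaiting = entry
        awaiting.discard(replica)
        if not awaiting:
            del self._pending[mutation_id]

    def local_lower_bound(self):
        """Oldest coordinated timestamp not yet fully acknowledged.

        +Inf means everything this node coordinated is known to be healthy.
        """
        if not self._pending:
            return math.inf
        return min(ts for ts, _ in self._pending.values())


def range_lower_bound(owner_bounds):
    """The range is known in sync from the minimum of all owners' local bounds.

    A down owner contributes -Inf, since the fate of updates it coordinated
    is unknown; this is why a single node failure widens the bound rather
    than marking the whole cluster out of sync.
    """
    return min(owner_bounds) if owner_bounds else math.inf
```

Under this model, the range can be marked repaired (and tombstones dropped) up to `range_lower_bound(...)`, since every update older than that bound is known to have reached all owning replicas.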
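The grace-window rule in the last quoted paragraph might look something like this sketch: clients may supply timestamps some bounded distance in the past, but an update arriving with a timestamp close to the window's expiry is rejected outright, so the node can safely advance its repaired lower bound past the window. The function name, the window length, and the rejection margin are all assumptions for illustration, not values from the ticket.

```python
# Hypothetical sketch of the timestamp grace window: accept client-supplied
# timestamps up to GRACE_SECONDS in the past, but reject any that arrive
# within REJECT_MARGIN_SECONDS of the window expiring. Values are illustrative.
GRACE_SECONDS = 300          # how far in the past a client timestamp may be
REJECT_MARGIN_SECONDS = 30   # refuse timestamps this close to expiry


def accept_mutation(client_timestamp, now):
    """Return True if the mutation's timestamp is safely inside the window."""
    oldest_acceptable = now - GRACE_SECONDS + REJECT_MARGIN_SECONDS
    return client_timestamp >= oldest_acceptable
```

Rejecting near-expiry timestamps is what lets the window stay short ("just minutes"): no accepted update can ever land behind a lower bound the cluster has already advanced past.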