[ https://issues.apache.org/jira/browse/CASSANDRA-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585874#comment-16585874 ]
Benedict commented on CASSANDRA-14409: -------------------------------------- I have pushed a patch with some major refactors to {{ReplicaCollection}} [here|https://github.com/belliottsmith/cassandra/tree/14409-replicas-wip], in that it is seemingly at parity for unit/dtests. There are some things I would like to improve, but we’re too pressed for time, and at the point of diminishing returns. The patch does include some minor behavioural changes to fix some unit tests that were broken prior to it, and some (in progress) review feedback comments I’m collecting. In summary, it introduces {{EndpointsForRange}}, {{EndpointsForToken}} and {{RangesAtEndpoint}}. There are also {{Endpoints}}, a super class of the first two, and {{AbstractReplicaCollection}} that implements the majority of functionality. Main improvements: * These classes are immutable by default; I think this actually reduces copying on average, and improves clarity * There are {{Mutable}} variants, for clearly delineating where this is necessary * We now declare in all locations what our expected semantics are, and we enforce them * There is only one implementation for all non-boilerplate {{ReplicaCollection}} methods, so a reader can quickly determine the characteristics at a call site * (Almost) all O(n^2) call sites have been eliminated, mostly by our stronger constraints and maintaining a {{LinkedHashMap}}. This is eagerly constructed for a brand-new collection, and lazily for a filtered/sorted collection. Some open questions: * I have limited mutability to append only, which permits easy immutable snapshots, but may be fiddly to change later. * I have made the {{Mutable}} variants extend the immutable ones, but this is not necessary, as an immutable view can always be constructed - this might be cleaner, and might also permit easier sharing of {{Mutable}} implementation details * {{AbstractReplicaCollection}} introduces a method called {{select()}} for efficiently getting a subset of the collection based on a sequence of filters - this considerably clarifies one caller, and might clarify others I didn’t spot, but is optional. * Conversely, since we often filter and sort together, this class could be extended to perform them together, and avoid some unnecessary work, but since this would be harder to rollback it probably needs consensus first. * We maintain a {{LinkedHashMap}} to avoid code bloat while enforcing the same iteration order in our derived collections, but we could introduce an {{AbstractMap}} that proxies to a pure {{HashMap}}, and iterates our list. This further complicates the code (slightly), but improves performance for sorting and building. Suboptimality: * {{Endpoints<?>}} is necessary in a number of places, because we conflate range and point ops in just a handful of places; namely writing batch/hints to ‘any’ other host, and in Data/DigestResolver, which can be for range or point queries. This means some uglier generics in places, and some code is less clear than it might be. The places where this conflation occurs are well delineated now, at least. * We don’t have isolation when getting natural and pending replicas, so we ignore conflicting endpoints when concatenating these. * {{Endpoints.newMutable}} - this is an instance method for a new {{Mutable}} that matches the type of the instance. It’s ugly, but presently needed for the above conflation. Possibly the select() class could be modified to support these cases instead. * {{EndpointsForToken}} and {{EndpointsForRange}} *require* a {{Token}} and {{Range}} respectively - this means a bit of boilerplate occasionally to pass the correct value to empty(), as well as some unnecessary allocations. We could probably relax this, but it makes the semantics of {{newMutable}} tricky (if we improved select() to replace this, we could probably avoid this) * {{EndpointsByRange}} and {{RangesByEndpoint}} have immutable and mutable variants, though I don’t love these at all. Nor do I like {{EndpointsByReplica}}. But they will probably suffice. > Transient Replication: Support ring changes when transient replication is in > use (add/remove node, change RF, add/remove DC) > ---------------------------------------------------------------------------------------------------------------------------- > > Key: CASSANDRA-14409 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14409 > Project: Cassandra > Issue Type: Sub-task > Components: Coordination, Core, Documentation and Website > Reporter: Ariel Weisberg > Assignee: Ariel Weisberg > Priority: Major > Fix For: 4.0 > > > The additional state transitions that transient replication introduces > require streaming and nodetool cleanup to behave differently. We already have > code that does the streaming, but in some cases we shouldn't stream any data > and in others when we stream to receive data we have to make sure we stream > from a full replica and not a transient replica. > Transitioning from not replicated to transiently replicated means that a node > must stay pending until the next incremental repair completes at which point > the data for that range is known to be available at full replicas. > Transitioning from transiently replicated to fully replicated requires > streaming from a full replica and is identical to how we stream from not > replicated to replicated. The transition must be managed so the transient > replica is not read from as a full replica until streaming completes. It can > be used immediately for a write quorum. > Transitioning from fully replicated to transiently replicated requires > cleanup to remove repaired data from the transiently replicated range to > reclaim space. It can be used immediately for a write quorum. > Transitioning from transiently replicated to not replicated requires cleanup > to be run to remove the formerly transiently replicated data. > nodetool move, removenode, cleanup, decommission, and rebuild need to handle > these issues as does bootstrap. > Update web site, documentation, NEWS.txt with a description of the steps for > doing common operations. Add/remove DC, Add/remove node(s), replace node, > change RF. -- This message was sent by Atlassian JIRA (v7.6.3#76005) --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org