[ 
https://issues.apache.org/jira/browse/CASSANDRA-14409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16585874#comment-16585874
 ] 

Benedict commented on CASSANDRA-14409:
--------------------------------------

I have pushed a patch with some major refactors to {{ReplicaCollection}} 
[here|https://github.com/belliottsmith/cassandra/tree/14409-replicas-wip], in 
that it is seemingly at parity for unit/dtests.  There are some things I would 
like to improve, but we’re too pressed for time, and at the point of 
diminishing returns.  

The patch does include some minor behavioural changes to fix some unit tests 
that were broken prior to it, and some (in progress) review feedback comments 
I’m collecting.

In summary, it introduces {{EndpointsForRange}}, {{EndpointsForToken}} and 
{{RangesAtEndpoint}}.  There are also {{Endpoints}}, a super class of the first 
two, and {{AbstractReplicaCollection}} that implements the majority of 
functionality.

Main improvements:
* These classes are immutable by default; I think this actually reduces copying 
on average, and improves clarity
* There are {{Mutable}} variants, for clearly delineating where this is 
necessary
* We now declare in all locations what our expected semantics are, and we 
enforce them
* There is only one implementation for all non-boilerplate 
{{ReplicaCollection}} methods, so a reader can quickly determine the 
characteristics at a call site
* (Almost) all O(n^2) call sites have been eliminated, mostly by our stronger 
constraints and maintaining a {{LinkedHashMap}}.  This is eagerly constructed 
for a brand-new collection, and lazily for a filtered/sorted collection.

Some open questions: 
* I have limited mutability to append only, which permits easy immutable 
snapshots, but may be fiddly to change later.
* I have made the {{Mutable}} variants extend the immutable ones, but this is 
not necessary, as an immutable view can always be constructed - this might be 
cleaner, and might also permit easier sharing of {{Mutable}} implementation 
details
* {{AbstractReplicaCollection}} introduces a method called {{select()}} for 
efficiently getting a subset of the collection based on a sequence of filters - 
this considerably clarifies one caller, and might clarify others I didn’t spot, 
but is optional.
* Conversely, since we often filter and sort together, this class could be 
extended to perform them together, and avoid some unnecessary work, but since 
this would be harder to rollback it probably needs consensus first.
* We maintain a {{LinkedHashMap}} to avoid code bloat while enforcing the same 
iteration order in our derived collections, but we could introduce an 
{{AbstractMap}} that proxies to a pure {{HashMap}}, and iterates our list.  
This further complicates the code (slightly), but improves performance for 
sorting and building.

Suboptimality:
* {{Endpoints<?>}} is necessary in a number of places, because we conflate 
range and point ops in just a handful of places; namely writing batch/hints to 
‘any’ other host, and in Data/DigestResolver, which can be for range or point 
queries.  This means some uglier generics in places, and some code is less 
clear than it might be.  The places where this conflation occurs are well 
delineated now, at least.
* We don’t have isolation when getting natural and pending replicas, so we 
ignore conflicting endpoints when concatenating these.
* {{Endpoints.newMutable}} - this is an instance method for a new {{Mutable}} 
that matches the type of the instance.  It’s ugly, but presently needed for the 
above conflation.  Possibly the select() class could be modified to support 
these cases instead.
* {{EndpointsForToken}} and {{EndpointsForRange}} *require* a {{Token}} and 
{{Range}} respectively - this means a bit of boilerplate occasionally to pass 
the correct value to empty(), as well as some unnecessary allocations.  We 
could probably relax this, but it makes the semantics of {{newMutable}} tricky 
(if we improved select() to replace this, we could probably avoid this)
* {{EndpointsByRange}} and {{RangesByEndpoint}} have immutable and mutable 
variants, though I don’t love these at all.  Nor do I like 
{{EndpointsByReplica}}.  But they will probably suffice.

> Transient Replication: Support ring changes when transient replication is in 
> use (add/remove node, change RF, add/remove DC)
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14409
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14409
>             Project: Cassandra
>          Issue Type: Sub-task
>          Components: Coordination, Core, Documentation and Website
>            Reporter: Ariel Weisberg
>            Assignee: Ariel Weisberg
>            Priority: Major
>             Fix For: 4.0
>
>
> The additional state transitions that transient replication introduces 
> require streaming and nodetool cleanup to behave differently. We already have 
> code that does the streaming, but in some cases we shouldn't stream any data 
> and in others when we stream to receive data we have to make sure we stream 
> from a full replica and not a transient replica.
> Transitioning from not replicated to transiently replicated means that a node 
> must stay pending until the next incremental repair completes at which point 
> the data for that range is known to be available at full replicas.
> Transitioning from transiently replicated to fully replicated requires 
> streaming from a full replica and is identical to how we stream from not 
> replicated to replicated. The transition must be managed so the transient 
> replica is not read from as a full replica until streaming completes. It can 
> be used immediately for a write quorum.
> Transitioning from fully replicated to transiently replicated requires 
> cleanup to remove repaired data from the transiently replicated range to 
> reclaim space. It can be used immediately for a write quorum.
> Transitioning from transiently replicated to not replicated requires cleanup 
> to be run to remove the formerly transiently replicated data.
> nodetool move, removenode, cleanup, decommission, and rebuild need to handle 
> these issues as does bootstrap.
> Update web site, documentation, NEWS.txt with a description of the steps for 
> doing common operations. Add/remove DC, Add/remove node(s), replace node, 
> change RF.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to