[ https://issues.apache.org/jira/browse/CASSANDRA-19488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856271#comment-17856271 ]

Sam Tunnicliffe commented on CASSANDRA-19488:
---------------------------------------------

Linked to a WIP branch for this so people can start taking a look without 
waiting for everything that's outstanding.
This completely deprecates the {{IEndpointSnitch}} interface, splitting its 
responsibilities out into a few new classes.
* {{o.a.c.locator.Locator}} is responsible for looking up DC/rack info for 
endpoints
* {{o.a.c.locator.InitialLocationProvider}} supplies the DC/rack for a new node 
to use when joining the cluster.
* {{o.a.c.locator.NodeProximity}} handles the sorting and ranking of replica 
lists
* {{o.a.c.locator.NodeAddressConfig}} mainly exists to support the functionality 
of {{ReconnectableSnitchHelper}} and of {{Ec2MultiRegionSnitch}}, which 
dynamically configures the broadcast address (not just the 
local/private/preferred one).
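To make the split concrete, here is a rough sketch of the shapes these types might take. The type names follow the list above, but the method signatures, the {{Location}} record, and the toy map-backed implementation are illustrative assumptions only, not taken from the branch:

```java
import java.util.List;
import java.util.Map;

// Sketch of the interface split described above. Class names follow the
// comment; method signatures here are illustrative assumptions only.
class LocatorSketch {
    record Location(String datacenter, String rack) {}

    // Looks up DC/rack info for known endpoints (ClusterMetadata-backed in C*).
    interface Locator {
        Location location(String endpoint);
    }

    // Supplies the DC/rack a brand-new node announces when joining the cluster.
    interface InitialLocationProvider {
        Location initialLocation();
    }

    // Handles sorting and ranking of replica lists, formerly a snitch concern.
    interface NodeProximity {
        List<String> sortedByProximity(String self, List<String> replicas);
    }

    // Toy Locator backed by a static map, just to exercise the shape.
    static Locator mapLocator(Map<String, Location> topology) {
        return topology::get;
    }
}
```

The point of the split is that each consumer depends only on the one capability it needs, rather than on the whole snitch surface.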

For migration, and to allow us to deprecate snitches in a controlled way, it is 
still fully supported to configure using the {{endpoint_snitch}} setting in 
yaml. {{o.a.c.locator.SnitchAdapter}} acts as a facade here, presenting the 
new interfaces to calling code and delegating to the legacy snitch impl. Most 
of the in-tree snitch impls have been refactored to extract impls of the new 
interfaces so that their functionality can be used via the new 
{{initial_location_provider/node_proximity}} settings.
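Illustratively, the two configuration styles might look like this in cassandra.yaml. The setting names come from the comment above; the class names given as values for the new settings are hypothetical placeholders for the extracted impls, not necessarily the names on the branch:

```yaml
# Legacy style: still fully supported, routed through SnitchAdapter
endpoint_snitch: GossipingPropertyFileSnitch

# New style (hypothetical value names): configure the extracted
# implementations directly instead of a snitch
# initial_location_provider: SomeLocationProvider
# node_proximity: SomeProximityImpl
```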

Additionally, there is some plumbing in {{o.a.c.locator.LocatorAdapter}} to help 
with inspecting topology before cluster metadata is fully available. 
Specifically, we prefer to always categorise peers as remote if the actual 
status can't reliably be determined. The reasoning here is that it's better to 
wrongly assume a local peer is remote and apply inter-dc settings (e.g. 
compression/encryption) than the reverse. The window where this is an issue is 
rather small (just at startup) and affects only the initial connections a node 
establishes with its peers. The adapter is notified once CM is available and 
any such connections are torn down to be re-established using the correct 
metadata.
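That "assume remote when unknown" policy can be sketched minimally as follows; the class and method names here are hypothetical, not from the branch:

```java
import java.util.Optional;

// Illustrative sketch of the "assume remote when unknown" policy described
// above; the class and method names are hypothetical, not from the branch.
class PeerCategoriser {
    private final String localDc;

    PeerCategoriser(String localDc) {
        this.localDc = localDc;
    }

    // If the peer's DC can't yet be reliably determined (cluster metadata not
    // available), categorise it as remote: wrongly applying inter-dc settings
    // (compression/encryption) to a local peer is safer than the reverse.
    boolean isRemote(Optional<String> peerDc) {
        return peerDc.map(dc -> !dc.equals(localDc)).orElse(true);
    }
}
```

Once cluster metadata arrives, any connections opened under this pessimistic default are torn down and re-established with the correct categorisation.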

An incomplete todo list: 
* Documentation. javadoc, inline comments/annotations & user docs all need more 
work.
* Naming. I'm not wedded to most of the naming here, and there are some things 
I haven't renamed at all yet:
** {{o.a.c.locator.SnitchProperties}}
** {{o.a.c.locator.DynamicEndpointSnitch}}, this now {{implements 
NodeProximity}} but we have continuity of MBean naming to consider
** {{o.a.c.locator.EndpointSnitchInfo}}, similar MBean concerns apply here
** The method names on the new {{NodeProximity}} iface are copied directly from 
{{IEndpointSnitch}}; we should fix that.
* Testing. CI is currently not looking perfect, but it's ok; some new tests are 
needed.
** Behaviour around initial connections to peers before cluster metadata is 
available and re-connecting once it is.
** in-jvm & python dtests as well as the simulator are mostly still configuring 
C* with a snitch and letting the adapter do its thing. This exercises the 
migration path and verifies existing yaml files won't break things on upgrade. 
At the same time, there are currently no tests of using the new config approach 
with {{InitialLocationProvider/NodeProximity/NodeAddressConfig}} directly. Is 
the right way to approach this to add the new settings to 
{{cassandra_latest.yaml}}?
** This patch appears to have exacerbated the issue described in 
CASSANDRA-19239. The branch includes a temporary commit which skips 
{{NativeTransportEncryptionOptionsTest}}.
** I haven't run python upgrade dtests on this for a while.
** Need to organise testing of the various supported cloud metadata services in 
their actual environments.
* There is an {{InitialLocationProvider}} that corresponds to or replaces each 
of the {{AbstractCloudMetadataServiceSnitch}} impls, but I haven't (yet) 
refactored {{PropertyFileSnitch/GossipingPropertyFileSnitch}} in the same way.


> Ensure snitches always defer to ClusterMetadata
> -----------------------------------------------
>
>                 Key: CASSANDRA-19488
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-19488
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Membership, Messaging/Internode, Transactional 
> Cluster Metadata
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 5.x
>
>
> Internally, C* always uses {{ClusterMetadata}} as the source of topology 
> information when calculating data placements, replica plans, etc., and as 
> such the role of the snitch has been somewhat reduced. 
> Sorting and comparison functions as provided by specialisations like 
> {{DynamicEndpointSnitch}} are still used, but the snitch should only be 
> responsible for providing the DC and rack for a new node when it first joins 
> a cluster.
> Aside from initial startup and registration, snitch implementations should 
> always defer to {{ClusterMetadata}} for DC and rack; otherwise there is a 
> risk that the snitch config drifts out of sync with TCM and output from tools 
> like {{nodetool ring}} and {{gossipinfo}} becomes incorrect.
> A complication is that topology is used when opening connections to peers, as 
> certain internode connection settings are variable at the DC level, so at the 
> time of connecting we want to check the location of the remote peer. Usually, 
> this is available from {{ClusterMetadata}}, but in the case of a brand 
> new node joining the cluster nothing is known a priori. The current 
> implementation assumes that the snitch will know the location of the new node 
> ahead of time, but in practice this is often not the case (though with 
> variants of {{PropertyFileSnitch}} it _should_ be), and the remote node is 
> temporarily assigned a default DC. This is problematic as it can cause the 
> internode connection settings which depend on DC to be incorrectly set. 
> Internode connections are long lived and any established while the DC is 
> unknown (potentially with incorrect config) will persist indefinitely. This 
> particular issue is not directly related to TCM and is present in earlier 
> versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
