[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102037#comment-17102037
 ] 

Blake Eggleston commented on CASSANDRA-15158:
---------------------------------------------

It's likely we'll want to fix this in 3.0 and up, so I'll review the 3.0 
version to start with and we can go from there.

I haven't completed my review yet, but there are some structural and design 
issues we should address up front.

*Structural issues*

First, the {{waitForSchema*}} methods should live in the MigrationManager. This 
will prevent you from setting an updated status if you're sending out 
additional schema pulls, but we can revisit that later if we think it's 
neccesary.

Second, instantiating MigrationTaskCallbacks in StorageService/MigrationManager 
and passing it into MigrationTask is a little awkward. I'd prefer if the 
callback remained an anonymous class. We can communicate endpoint to send 
schema pulls to with an inetaddress argument, and we need to rethink what 
`isRunningForcibly` is doing and why. First, we shouldn't be adding it to 
{{IAsyncCallback}} for this narrow use case. Next, it's use seems to be 
changing how schema pulls actually work, but only in a test environment, which 
is something we should avoid.

*Design issues*

This doesn't deal with multiple schema versions. If a node joins, and there are 
2 or more schema versions floating around, it will only wait until it has 
_some_ schema to begin bootstrapping, not all. Related to this, we also need a 
plan for unreachable schema versions. For instance, if a single node is 
reporting a schema version that no one else has, but the node is unreachable, 
what do we do?

Next, I like how this limits the number of messages sent to a given endpoint, 
but we should also limit the number of messages we send out for a given schema 
version. If we have a large cluster, and all nodes are reporting the same 
version, we don't need to ask every node for it's schema.

> Wait for schema agreement rather then in flight schema requests when 
> bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Ben Bromhead
>            Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/stream until all the latches are released (or we timeout 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latche longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to