[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102543#comment-17102543
 ] 

Stefan Miklosovic commented on CASSANDRA-15158:
-----------------------------------------------

It seems to me that one aspect of the PR was overlooked so I just iterate on 
that one. The mechanims how to not flood nodes with schema pull messages is 
incorporated in a loop over callbacks. If you notice it, there are sleeps of 
various lenghts based on a request being already sent or not. This sleep will 
actually "delay" the next schema pull from the other node because during this 
time of a sleep, a schema could come in so on the next iteration when another 
node is compared on schema equality, it may happen that there is not any need 
to pull it anymore because they are on par. Hence we are not blindly sending 
messages to all nodes.
If there are some discrepancies, there is the global timeout set after which 
whole bootstrapping process will be evaluated as errorneous and (in the current 
code) we throw a ConfigurationException. This behaviour might be relaxed but I 
consider it more appropriate to just throw it there.

> Wait for schema agreement rather then in flight schema requests when 
> bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Ben Bromhead
>            Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/stream until all the latches are released (or we timeout 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latche longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org

Reply via email to