[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17102741#comment-17102741
 ] 

Blake Eggleston commented on CASSANDRA-15158:
---------------------------------------------

{quote}commenting on design issues, I am not completely sure if these issues 
you are talking about are related to this patch or they are already existing? 
We could indeed focus on the points you raised but it seems to me that the 
current (committed) code is worse without this patch than with as I guess these 
problems are already there?
Isn't the goal here to have all nodes on same versions? Isn't the very fact 
that there are multiple versions pretty strange to begin with so we should not 
even try to join a node if they mismatch hence there is nothing to deal with in 
the first place?
{quote}
When there are schema changes, it's not strange at all for there to be multiple 
schema versions in the cluster before they converge. We also don't forbid 
making schema changes while changing cluster topology, so this would be 
something we should expect to encounter, although I would expect it to happen 
infrequently. Since bootstrap doesn't stream keyspaces it doesn't know about, 
this could create a window of data loss. Since the goal of this ticket is to 
wait for schema to converge before starting bootstrap, we should deal with edge 
cases like this. Also, I believe there have been bugs that caused a lot of 
schema change activity when nodes bootstrap, so depending on what exactly 
you're doing
{quote}How can a node report its schema while being unreachable?
{quote}
Schema versions are gossiped. So a node might gossip a new schema version then 
become unreachable. The bootstrapping node would learn about this new version 
via gossip, but be unable to contact it.
{quote}> admit that adding isRunningForcibly method feels like a hack but I had a 
very hard time testing this stuff out.
{quote}
I'll look into how testing can be improved.
{quote}> This is most likely not true unless I am not getting something. 
The node to be bootstrapped will never advance in doing so unless all nodes 
have the same versions.
{quote}
Ah, yes, you're right. Although waiting for all nodes to arrive at the same 
schema version isn't necessary. We just need to receive and merge at least one 
schema pull from every schema version in the cluster.
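
To make that condition concrete, here is a minimal sketch of the bookkeeping it implies. This is not the patch and not existing Cassandra code: the class `SchemaPullTracker` and all of its methods are hypothetical names, endpoints are plain strings, and it assumes the caller reports gossiped versions and successful merges itself. It also ignores the bootstrapping node's own schema version.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper, not part of Cassandra: tracks which gossiped schema
// versions the bootstrapping node has merged at least one pull from.
final class SchemaPullTracker {
    // Schema version -> endpoints currently advertising it via gossip.
    private final Map<UUID, Set<String>> endpointsByVersion = new HashMap<>();
    // Versions for which at least one pull response was received and merged.
    private final Set<UUID> mergedVersions = new HashSet<>();

    // Record the version an endpoint advertises; an endpoint has one version
    // at a time, so its previous entry (if any) is dropped first.
    synchronized void onGossipedVersion(String endpoint, UUID version) {
        endpointsByVersion.values().forEach(eps -> eps.remove(endpoint));
        endpointsByVersion.values().removeIf(Set::isEmpty);
        endpointsByVersion.computeIfAbsent(version, v -> new HashSet<>()).add(endpoint);
    }

    // Record that a pull for this version was received and merged locally.
    synchronized void onPullMerged(UUID version) {
        mergedVersions.add(version);
    }

    // Versions seen in gossip that we have not merged a pull from yet. A
    // version advertised only by an unreachable node stays in this set.
    synchronized Set<UUID> outstandingVersions() {
        Set<UUID> outstanding = new HashSet<>(endpointsByVersion.keySet());
        outstanding.removeAll(mergedVersions);
        return outstanding;
    }

    // True once every known version has had at least one merged pull.
    synchronized boolean readyToBootstrap() {
        return outstandingVersions().isEmpty();
    }
}
```

With this shape, the unreachable-node case above shows up as a version that never leaves `outstandingVersions()`, so the caller still has to decide how long to wait before giving up or failing the bootstrap.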

> Wait for schema agreement rather then in flight schema requests when 
> bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Ben Bromhead
>            Priority: Normal
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  


