[ 
https://issues.apache.org/jira/browse/CASSANDRA-15158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17171139#comment-17171139
 ] 

Blake Eggleston commented on CASSANDRA-15158:
---------------------------------------------

{quote}
I am not completely sure why we are pulling again here. I would rewrite the 
whole solution in such a way that this Callable just does one thing on a 
successful response (merging of a schema) and the actual "retry" would be 
handled from outside. The reader has to make quite a mental exercise to 
visualise that this callback might actually call another callback in it until 
some "version" is completed etc ... At least for me, it was quite tedious to 
track.


{quote}
In the case of a successful pull, we won't pull again. The response and fail 
paths both call pullComplete, but an additional pull is only issued when 
pullComplete is called from fail.
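
As a rough illustration of that control flow, here is a minimal sketch. Only 
pullComplete is a name taken from the discussion; SchemaPullSketch, Requester, 
PullCallback, pullNext and the fields are hypothetical stand-ins, and this is 
not the code in the patch.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch: only pullComplete is a name taken from the discussion. */
final class SchemaPullSketch {
    /** Assumed stand-in for the messaging layer that delivers a pull request. */
    interface Requester {
        void sendPull(String endpoint, PullCallback callback);
    }

    private final Deque<String> candidates; // endpoints reporting the wanted version
    private final Requester requester;
    boolean merged = false; // true once a successful response has been handled
    int pullsSent = 0;      // number of pull requests issued so far

    SchemaPullSketch(List<String> endpoints, Requester requester) {
        this.candidates = new ArrayDeque<>(endpoints);
        this.requester = requester;
    }

    /** Sends a pull to the next candidate endpoint, if any remain. */
    void pullNext() {
        String endpoint = candidates.poll();
        if (endpoint == null) {
            return; // nobody left to ask; a later event has to trigger a retry
        }
        pullsSent++;
        requester.sendPull(endpoint, new PullCallback(this));
    }

    /** Called from both callback paths; only the failure path pulls again. */
    void pullComplete(boolean wasSuccessful) {
        if (wasSuccessful) {
            merged = true; // a real handler would merge the received schema here
            return;
        }
        pullNext();
    }

    /** Both outcomes funnel into pullComplete on the owning sketch. */
    static final class PullCallback {
        private final SchemaPullSketch owner;

        PullCallback(SchemaPullSketch owner) {
            this.owner = owner;
        }

        void response() {
            owner.pullComplete(true);
        }

        void fail() {
            owner.pullComplete(false);
        }
    }
}
```

With a requester that fails for the first endpoint and succeeds for the 
second, calling pullNext() once issues exactly two pulls and then stops.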

I get that this can be a bit difficult to follow, but I'm not sure there's a 
better approach, given the schema pulls are completely event driven during 
normal runtime. If we miss a schema change during normal runtime (not 
bootstrap), there's nothing waiting on schema convergence that would enable us 
to retry from the outside.

There is a periodic task that pulls schema for outstanding versions that don't 
have any in flight requests^[1]^, but it only runs once a minute, and we need 
to be more proactive about learning about schema updates since we'll be unable 
to serve some reads and writes until we're up to date.
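
For readers unfamiliar with that fallback, the selection it performs could be 
sketched roughly as below. All names here (PeriodicPullSketch, VersionState, 
track, versionsNeedingPull, start) are hypothetical, versions are plain 
strings for brevity, and the one-minute delay simply mirrors the interval 
mentioned above; the real task may be structured differently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical sketch of a periodic fallback pull; not the patch's code. */
final class PeriodicPullSketch {
    /** Per-version bookkeeping assumed for this illustration. */
    static final class VersionState {
        final boolean received;     // schema for this version already merged
        final int inFlightRequests; // pulls currently awaiting a reply

        VersionState(boolean received, int inFlightRequests) {
            this.received = received;
            this.inFlightRequests = inFlightRequests;
        }
    }

    private final Map<String, VersionState> versions = new LinkedHashMap<>();

    synchronized void track(String version, boolean received, int inFlightRequests) {
        versions.put(version, new VersionState(received, inFlightRequests));
    }

    /** Outstanding versions that have no request in flight. */
    synchronized List<String> versionsNeedingPull() {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, VersionState> entry : versions.entrySet()) {
            VersionState state = entry.getValue();
            if (!state.received && state.inFlightRequests == 0) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    /** Runs the check once a minute, handing each version to the pull action. */
    ScheduledFuture<?> start(ScheduledExecutorService executor, Consumer<String> pull) {
        return executor.scheduleWithFixedDelay(
            () -> versionsNeedingPull().forEach(pull), 1, 1, TimeUnit.MINUTES);
    }
}
```

Because such a task only fires once a minute, it works as a safety net rather 
than as the primary way of learning about schema changes.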
{quote}TBH that is quite counterintuitive too
{quote}
Could you expand on what's counterintuitive about it? If the endpoint's schema 
version has changed, we need to disassociate it from its previously reported 
version. I have added a comment saying as much.
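
To make the bookkeeping concrete, here is a small sketch of what 
disassociating an endpoint from its previous version could look like. The 
class and method names (EndpointVersionSketch, reportVersion, endpointsFor) 
and the two-map layout are assumptions for illustration only, with endpoints 
and versions shown as plain strings.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of endpoint/version bookkeeping; not the patch's code. */
final class EndpointVersionSketch {
    private final Map<String, String> versionByEndpoint = new HashMap<>();
    private final Map<String, Set<String>> endpointsByVersion = new HashMap<>();

    /** Records the version an endpoint reports, dropping any stale association. */
    void reportVersion(String endpoint, String version) {
        String previous = versionByEndpoint.put(endpoint, version);
        if (previous != null && !previous.equals(version)) {
            Set<String> stale = endpointsByVersion.get(previous);
            if (stale != null) {
                stale.remove(endpoint);
                if (stale.isEmpty()) {
                    // no endpoint reports this version any more
                    endpointsByVersion.remove(previous);
                }
            }
        }
        endpointsByVersion.computeIfAbsent(version, v -> new HashSet<>()).add(endpoint);
    }

    /** Endpoints currently reporting the given version (read-only view). */
    Set<String> endpointsFor(String version) {
        return Collections.unmodifiableSet(
            endpointsByVersion.getOrDefault(version, Collections.emptySet()));
    }
}
```

Without the removal step, a pull for the old version could still be directed 
at an endpoint that no longer has it.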
{quote}The test has failed for me (repeatedly):
{quote}
Thanks, it should be passing now.

[1] This handles the case where all nodes reporting a given version are on a 
different version so we can't pull schema from them, and acts as a hedge 
against any bugs in this implementation that might cause us to not schedule 
schema pulls as intended.

> Wait for schema agreement rather than in flight schema requests when 
> bootstrapping
> ----------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15158
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15158
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Schema
>            Reporter: Vincent White
>            Assignee: Blake Eggleston
>            Priority: Normal
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently when a node is bootstrapping we use a set of latches 
> (org.apache.cassandra.service.MigrationTask#inflightTasks) to keep track of 
> in-flight schema pull requests, and we don't proceed with 
> bootstrapping/streaming until all the latches are released (or we time out 
> waiting for each one). One issue with this is that if we have a large schema, 
> or the retrieval of the schema from the other nodes was unexpectedly slow 
> then we have no explicit check in place to ensure we have actually received a 
> schema before we proceed.
> While it's possible to increase "migration_task_wait_in_seconds" to force the 
> node to wait on each latch longer, there are cases where this doesn't help 
> because the callbacks for the schema pull requests have expired off the 
> messaging service's callback map 
> (org.apache.cassandra.net.MessagingService#callbacks) after 
> request_timeout_in_ms (default 10 seconds) before the other nodes were able 
> to respond to the new node.
> This patch checks for schema agreement between the bootstrapping node and the 
> rest of the live nodes before proceeding with bootstrapping. It also adds a 
> check to prevent the new node from flooding existing nodes with simultaneous 
> schema pull requests as can happen in large clusters.
> Removing the latch system should also prevent new nodes in large clusters 
> getting stuck for extended amounts of time as they wait 
> `migration_task_wait_in_seconds` on each of the latches left orphaned by the 
> timed out callbacks.
>  
> ||3.11||
> |[PoC|https://github.com/apache/cassandra/compare/cassandra-3.11...vincewhite:check_for_schema]|
> |[dtest|https://github.com/apache/cassandra-dtest/compare/master...vincewhite:wait_for_schema_agreement]|
>  


