[ 
https://issues.apache.org/jira/browse/CASSANDRA-18096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-18096:
------------------------------------------
    Status: Ready to Commit  (was: Review In Progress)

> Do not spam the logs with MigrationCoordinator not able to pull schemas on 
> bootstrap
> ------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18096
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18096
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Cluster/Schema
>            Reporter: Stefan Miklosovic
>            Assignee: Stefan Miklosovic
>            Priority: Low
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.1.x, 4.x
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> When a node is joining a cluster, there is this output upon startup:
> {code}
> cassandra_node_6  | INFO  [GossipStage:1] 2022-12-06 12:48:07,187 Gossiper.java:1413 - Node /172.19.0.5:7000 is now part of the cluster
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> cassandra_node_6  | WARN MigrationCoordinator.java:650 - Can't send schema pull request: node /172.19.0.5:7000 is down.
> {code}
> The same warning is repeated for many of the already existing nodes. The 
> message is misleading: the schema pull request indeed cannot be sent because 
> the "node is down", but the node is not actually down. The joining node only 
> thinks so because Gossiper has not had a chance to receive any gossip about 
> these nodes _yet_.
> With more detail added to the log message, it prints this:
> {code}
>  MigrationCoordinator.java:655 - Can't send schema pull request: node /172.19.0.5:7000 is down: NORMAL, isAlive: false
> {code}
> That output comes from this modified check:
> {code}
>         if (!gossiper.hasEndpointState(endpoint))
>             return;
>         if (!gossiper.isAlive(endpoint))
>         {
>             EndpointState endpointStateForEndpoint = gossiper.getEndpointStateForEndpoint(endpoint);
>             String status = Gossiper.getGossipStatus(endpointStateForEndpoint);
>             logger.warn("Can't send schema pull request: node {} is down: {}, isAlive: {}", endpoint, status, endpointStateForEndpoint.isAlive());
>             callback.onFailure(endpoint, RequestFailureReason.UNKNOWN);
>             return;
>         }
> {code}
> So the node is in NORMAL status but is not marked alive yet, which is quite 
> strange.
> The fix is to still return early, but to log at WARN only when isAlive is 
> false and the status is _not_ NORMAL. In the remaining case (status NORMAL 
> but not yet alive) we would still log, at TRACE.
> (1) 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/schema/MigrationCoordinator.java#L648-L653
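
The logging rule proposed in the description can be illustrated with a small standalone sketch. This is not the Cassandra patch: the class `SchemaPullLogging`, the enum `Level` and the method `levelFor` are hypothetical names introduced only for illustration, and the three inputs stand in for `gossiper.hasEndpointState(endpoint)`, `gossiper.isAlive(endpoint)` and the string returned by `Gossiper.getGossipStatus(...)`. It assumes a known endpoint that is already alive needs no message at all.

```java
// Hypothetical, self-contained sketch of the proposed rule; not the Cassandra source.
final class SchemaPullLogging
{
    // Level at which the "Can't send schema pull request" message would be logged.
    enum Level { NONE, TRACE, WARN }

    /**
     * @param hasEndpointState whether gossip has any state for the endpoint
     * @param isAlive          whether gossip currently marks the endpoint alive
     * @param gossipStatus     gossip status string of the endpoint, e.g. "NORMAL"
     */
    static Level levelFor(boolean hasEndpointState, boolean isAlive, String gossipStatus)
    {
        // Unknown endpoint: the original check returns silently.
        // Alive endpoint: the pull request can be sent, so there is nothing to report.
        if (!hasEndpointState || isAlive)
            return Level.NONE;

        // NORMAL but not yet marked alive is the expected state while the joining
        // node is still catching up on gossip, so it is demoted to TRACE.
        if ("NORMAL".equals(gossipStatus))
            return Level.TRACE;

        // Not alive and not NORMAL: this is still worth a WARN.
        return Level.WARN;
    }
}
```

In every non-alive case the caller would still return early and report the failure to the callback, as in the snippet from the description; only the log level changes.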



