[jira] [Updated] (CASSANDRA-16796) Clear pending ranges for a SHUTDOWN peer

Sam Tunnicliffe (Jira) Mon, 12 Jul 2021 10:39:10 -0700


     [ https://issues.apache.org/jira/browse/CASSANDRA-16796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Sam Tunnicliffe updated CASSANDRA-16796:
----------------------------------------
    Description: 
If a node involved in a MOVE operation should fail, peers can sometimes 
maintain pending ranges for it even when it has left the ring and/or been 
replaced (in practice until the peer is next bounced). This in turn can lead to 
bogus unavailable responses to clients if a replica for any of the pending 
ranges should go down.

If the moving node crashes hard, a subsequent replacement will correctly fail 
as long as cassandra.consistent.rangemovement is set to true because the new 
node will learn the MOVING status from the remaining peers. A graceful 
shutdown, however, causes that status to be replaced with SHUTDOWN, but doesn't 
update TokenMetadata, so pending ranges remain for the down node, even after it 
has been removed from the ring.

  was:
If a node involved in a MOVE operation should fail, peers can sometimes 
maintain pending ranges even when it has left the ring and/or been replaced (in 
practice until the peer is next bounced). This in turn can lead to bogus 
unavailable responses to clients if a replica for any of the pending ranges 
should go down.

If the moving node crashes hard, a subsequent replacement will correctly fail 
as long as cassandra.consistent.rangemovement is set to true because the new 
node will learn the MOVING status from the remaining peers. A graceful 
shutdown, however, causes that status to be replaced with SHUTDOWN, but doesn't 
update TokenMetadata, so pending ranges remain for the down node, even after it 
has been removed from the ring.


> Clear pending ranges for a SHUTDOWN peer
> ----------------------------------------
>
>                 Key: CASSANDRA-16796
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16796
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Sam Tunnicliffe
>            Assignee: Sam Tunnicliffe
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x
>
>
> If a node involved in a MOVE operation should fail, peers can sometimes 
> maintain pending ranges for it even when it has left the ring and/or been 
> replaced (in practice until the peer is next bounced). This in turn can lead 
> to bogus unavailable responses to clients if a replica for any of the 
> pending ranges should go down.
>
> If the moving node crashes hard, a subsequent replacement will correctly fail 
> as long as cassandra.consistent.rangemovement is set to true because the new 
> node will learn the MOVING status from the remaining peers. A graceful 
> shutdown, however, causes that status to be replaced with SHUTDOWN, but 
> doesn't update TokenMetadata, so pending ranges remain for the down node, 
> even after it has been removed from the ring.
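
The failure mode described above can be illustrated with a toy model. This is 
only a sketch under stated assumptions, not Cassandra's implementation and not 
the patch for this ticket: the class PendingRangesSketch, its methods, and the 
clearOnShutdown flag are all hypothetical names invented for illustration. It 
assumes, as the description says, that a SHUTDOWN status replaces MOVING 
without touching the pending-range bookkeeping, and that removing the node 
from the ring does not clear that bookkeeping either.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model only: every name here is hypothetical, none is a Cassandra class.
final class PendingRangesSketch {
    enum Status { NORMAL, MOVING, SHUTDOWN }

    // Stand-in for the gossip view: last status seen for each peer.
    private final Map<String, Status> gossipStatus = new HashMap<>();
    // Stand-in for the token metadata view: peers with pending ranges from a move.
    private final Set<String> movingEndpoints = new HashSet<>();
    // false models the behaviour described in the ticket, true models the title's proposal.
    private final boolean clearOnShutdown;

    PendingRangesSketch(boolean clearOnShutdown) {
        this.clearOnShutdown = clearOnShutdown;
    }

    void onStatusChange(String endpoint, Status status) {
        gossipStatus.put(endpoint, status);
        switch (status) {
            case MOVING:
                movingEndpoints.add(endpoint);
                break;
            case NORMAL:
                movingEndpoints.remove(endpoint);
                break;
            case SHUTDOWN:
                // Without this step the MOVING status is overwritten in gossip,
                // but the pending-range bookkeeping is left behind.
                if (clearOnShutdown) {
                    movingEndpoints.remove(endpoint);
                }
                break;
        }
    }

    // Assumption taken from the description: leaving the ring does not
    // clear pending ranges by itself.
    void removeFromRing(String endpoint) {
        gossipStatus.remove(endpoint);
    }

    boolean hasPendingRanges(String endpoint) {
        return movingEndpoints.contains(endpoint);
    }
}
```

With clearOnShutdown set to false, a peer that observes MOVING, then SHUTDOWN, 
then removal from the ring still reports pending ranges for the departed node. 
With it set to true, the same sequence leaves none behind.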



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
