[ 
https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-17572:
-----------------------------------------
     Bug Category: Parent values: Correctness(12982) Level 1 values: Recoverable 
Corruption / Loss(12986)
       Complexity: Normal
      Component/s: Cluster/Membership
    Discovered By: User Report
    Fix Version/s: 3.0.x
                   3.11.x
                   4.0.x
                   4.x
         Severity: Normal
           Status: Open  (was: Triage Needed)

> Race condition when IP address changes for a node can cause reads/writes to 
> route to the wrong node
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Sam Kramer
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>
> Hi,
> We noticed that there is a race condition present in the trunk of 3.x code, 
> and confirmed that it’s there in 4.x as well, which will result in incorrect 
> reads, and missed writes, for a very short period of time.
> The race condition came to our attention when we started noticing a couple 
> of missed writes in our Cassandra clusters running on Kubernetes. The 
> Kubernetes aspect is relevant because IP changes are far more frequent there 
> than in a traditional setup.
> More concretely:
>  # When a Cassandra node is turned off, and then starts with a new IP address 
> Z (former IP address X), it announces to the cluster (via gossip) it has IP Z 
> for Host ID Y
>  # If there are no conflicts, each node will decide to remove the old IP 
> address associated with Host ID Y 
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532])
>  from the storage ring. This also causes us to invalidate our token ring 
> cache 
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488]
>  ).
>  # At this time, a new request could come in (read or write), and will 
> re-calculate which endpoints to send the request to, as we’ve invalidated our 
> token ring cache 
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]).
>  # However, at this time we’ve only removed the IP address X (former IP 
> address), and have not re-added IP address Z.
>  # As a result, we will choose a new host to route our request to. In our 
> case, our keyspaces all run with NetworkTopologyStrategy, and so we simply 
> choose the node with the next closest token in the same rack as host Y 
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]).
>  # Thus, the request is routed to a _different_ host, rather than the host 
> that has come back online.
>  # However, shortly afterwards, we re-add the host (via its _new_ endpoint) 
> to the token ring 
> [https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549]
>  # This invalidates our cache again, after which requests are routed 
> correctly once more.
> Couple of additional thoughts:
>  - This doesn’t affect clusters where the number of nodes <= RF with 
> NetworkTopologyStrategy.
>  - During this very brief period of time, the CL for all user queries is 
> violated, yet the queries are ACK’d as successful.
>  - It’s easy to reproduce this race condition by simply adding a sleep here 
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532])
>  - If a cleanup is not run before any range movement, it’s possible for rows 
> that were temporarily written to the wrong node to re-appear. 
>  - We confirmed by testing that the race condition exists in our Cassandra 
> 2.x fork (we're not on 3.x or 4.x). So it is possible that it only affects 
> Cassandra 2.x, although that seems unlikely from reading the code. 
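
The sequence described in the numbered steps above can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `RingRaceSketch`, `replicasFor`, `removeEndpoint`, `addNode` and `replaceEndpoint` are hypothetical names and not Cassandra APIs, the tokens and the endpoints "A", "C", "D" are made up, and replica selection is a plain clockwise walk that ignores racks and datacenters (so it is simpler than NetworkTopologyStrategy). It shows that a lookup made between "remove old IP X" and "add new IP Z" picks a node that does not own the range, whereas a single-step swap never exposes that intermediate ring. The single-step swap is shown for contrast only and is not claimed to be the fix adopted for this ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy token ring (hypothetical; not Cassandra's TokenMetadata). */
final class RingRaceSketch {
    // token -> endpoint IP, published as an immutable-by-convention snapshot
    private volatile TreeMap<Long, String> ring = new TreeMap<>();

    void addNode(long token, String ip) {
        TreeMap<Long, String> next = new TreeMap<>(ring);
        next.put(token, ip);
        ring = next;
    }

    /** Step 2 of the report: drop the old IP from the ring. */
    void removeEndpoint(String ip) {
        TreeMap<Long, String> next = new TreeMap<>(ring);
        next.values().removeIf(ip::equals);
        ring = next;
    }

    /** Contrast case: old and new IP are exchanged in one published snapshot. */
    void replaceEndpoint(String oldIp, String newIp) {
        TreeMap<Long, String> next = new TreeMap<>(ring);
        next.replaceAll((token, ip) -> ip.equals(oldIp) ? newIp : ip);
        ring = next;
    }

    /** Walk clockwise from the key's token and collect rf distinct endpoints. */
    List<String> replicasFor(long keyToken, int rf) {
        TreeMap<Long, String> snapshot = ring;
        Set<String> replicas = new LinkedHashSet<>();
        for (String ip : snapshot.tailMap(keyToken, true).values()) {
            if (replicas.size() == rf) break;
            replicas.add(ip);
        }
        for (String ip : snapshot.headMap(keyToken, false).values()) { // wrap around
            if (replicas.size() == rf) break;
            replicas.add(ip);
        }
        return new ArrayList<>(replicas);
    }

    private static RingRaceSketch sampleRing() {
        RingRaceSketch r = new RingRaceSketch();
        r.addNode(0, "A");
        r.addNode(100, "X"); // host Y, currently at IP X
        r.addNode(200, "C");
        r.addNode(300, "D");
        return r;
    }

    /** Remove X, look up, then add Z: the middle lookup sees the wrong replicas. */
    static List<List<String>> nonAtomicChange() {
        RingRaceSketch r = sampleRing();
        List<List<String>> seen = new ArrayList<>();
        seen.add(r.replicasFor(50, 2)); // before the IP change
        r.removeEndpoint("X");
        seen.add(r.replicasFor(50, 2)); // inside the race window
        r.addNode(100, "Z");
        seen.add(r.replicasFor(50, 2)); // after the new IP is added
        return seen;
    }

    /** Swap X for Z in one step: no lookup can observe a ring without host Y. */
    static List<List<String>> atomicChange() {
        RingRaceSketch r = sampleRing();
        List<List<String>> seen = new ArrayList<>();
        seen.add(r.replicasFor(50, 2));
        r.replaceEndpoint("X", "Z");
        seen.add(r.replicasFor(50, 2));
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(nonAtomicChange()); // [[X, C], [C, D], [Z, C]]
        System.out.println(atomicChange());    // [[X, C], [Z, C]]
    }
}
```

In the non-atomic run, the middle lookup returns "D", a node that never owned the key's range in this toy ring. This corresponds to the window in which a write can be acknowledged by the wrong node.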



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

