[ https://issues.apache.org/jira/browse/CASSANDRA-17572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-17572:
-----------------------------------------
     Bug Category: Parent values: Correctness(12982), Level 1 values: Recoverable Corruption / Loss(12986)
       Complexity: Normal
      Component/s: Cluster/Membership
    Discovered By: User Report
    Fix Version/s: 3.0.x
                   3.11.x
                   4.0.x
                   4.x
         Severity: Normal
           Status: Open  (was: Triage Needed)

> Race condition when IP address changes for a node can cause reads/writes to
> route to the wrong node
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-17572
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-17572
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Membership
>            Reporter: Sam Kramer
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0.x, 4.x
>
>
> Hi,
> We noticed that there is a race condition present in the trunk of 3.x code,
> and confirmed that it’s there in 4.x as well, which will result in incorrect
> reads, and missed writes, for a very short period of time.
> The race condition came to our attention when we started noticing a couple
> of missed writes in our Cassandra clusters on Kubernetes. The Kubernetes
> aspect is relevant because IP changes are far more frequent there than in a
> traditional setup.
> More concretely:
> # When a Cassandra node is turned off, and then starts with a new IP address
> Z (former IP address X), it announces to the cluster (via gossip) that it
> has IP Z for Host ID Y.
> # If there are no conflicts, each node will decide to remove the old IP
> address associated with Host ID Y
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532])
> from the storage ring. This also causes us to invalidate our token ring
> cache
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/TokenMetadata.java#L488]).
> # At this time, a new request could come in (read or write), and will
> re-calculate which endpoints to send the request to, as we’ve invalidated our
> token ring cache
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/AbstractReplicationStrategy.java#L88-L104]).
> # However, at this time we’ve only removed IP address X (the former IP
> address), and have not yet re-added IP address Z.
> # As a result, we will choose a new host to route our request to. In our
> case, our keyspaces all run with NetworkTopologyStrategy, and so we simply
> choose the node with the next closest token in the same rack as host Y
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java#L149-L191]).
> # Thus, the request is routed to a _different_ host, rather than the host
> that has come back online.
> # However, shortly afterwards, we re-add the host (via its _new_ endpoint)
> to the token ring
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2549]).
> # This will result in us invalidating our cache again, and then routing
> requests appropriately once more.
> A couple of additional thoughts:
> - This doesn’t affect clusters where the number of nodes is <= RF with
> NetworkTopologyStrategy.
> - During this very brief period of time, the CL for all user queries is
> violated, but the queries are ACK’d as successful.
> - It’s easy to reproduce this race condition by simply adding a sleep here
> ([https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/service/StorageService.java#L2529-L2532]).
> - If a cleanup is not run before a range movement, rows that were
> temporarily written to the wrong node can re-appear.
> - We tested that the race condition exists in our Cassandra 2.x fork (we're
> not on 3.x or 4.x).
> So, there is a possibility that this affects only Cassandra 2.x, however
> unlikely that seems from reading the code.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
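
For illustration only, the remove-then-re-add window described in the report can be sketched with a toy token ring. This is not Cassandra's TokenMetadata or NetworkTopologyStrategy: the class RingRaceSketch, its methods, the tokens, and the single-letter endpoint names are all hypothetical, and replica selection is a plain clockwise walk that ignores racks and datacenters. Under those assumptions, a lookup made between the removal of the old IP X and the addition of the new IP Z resolves to a node that does not own the range.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

/** Toy model of a token ring (hypothetical; not Cassandra's TokenMetadata). */
public class RingRaceSketch {
    // token -> endpoint IP, kept sorted by token
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int rf;

    public RingRaceSketch(int rf) {
        this.rf = rf;
    }

    public void addEndpoint(long token, String ip) {
        ring.put(token, ip);
    }

    public void removeEndpoint(String ip) {
        ring.values().removeIf(ip::equals);
    }

    /** Walk the ring clockwise from the key's token and collect rf distinct endpoints. */
    public List<String> replicasFor(long keyToken) {
        Set<String> replicas = new LinkedHashSet<>();
        for (String ip : ring.tailMap(keyToken, true).values()) {
            if (replicas.size() < rf) replicas.add(ip);
        }
        // Wrap around to the start of the ring if more replicas are needed.
        for (String ip : ring.headMap(keyToken, false).values()) {
            if (replicas.size() < rf) replicas.add(ip);
        }
        return new ArrayList<>(replicas);
    }

    /** Replays the IP change in two separate steps and records the replica set after each. */
    public static List<List<String>> simulate() {
        RingRaceSketch ring = new RingRaceSketch(2); // RF = 2, four nodes
        ring.addEndpoint(100L, "A");
        ring.addEndpoint(200L, "X"); // Host ID Y at its old IP X
        ring.addEndpoint(300L, "C");
        ring.addEndpoint(400L, "D");
        long keyToken = 150L;

        List<List<String>> snapshots = new ArrayList<>();
        snapshots.add(ring.replicasFor(keyToken)); // before the IP change
        ring.removeEndpoint("X");                  // old IP removed first
        snapshots.add(ring.replicasFor(keyToken)); // lookup inside the window
        ring.addEndpoint(200L, "Z");               // new IP added afterwards
        snapshots.add(ring.replicasFor(keyToken)); // after the re-add
        return snapshots;
    }

    public static void main(String[] args) {
        List<List<String>> s = simulate();
        System.out.println("before: " + s.get(0)); // [X, C]
        System.out.println("window: " + s.get(1)); // [C, D]  (D does not own this range)
        System.out.println("after:  " + s.get(2)); // [Z, C]
    }
}
```

In this toy model a write acknowledged by C and D during the window would never reach the restarted host, which matches the missed writes described in the report. With only two nodes and RF = 2 the window lookup would return just the one surviving node, so no wrong node is chosen, in line with the note about clusters where nodes <= RF.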