[ https://issues.apache.org/jira/browse/CASSANDRA-16525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Yifan Cai updated CASSANDRA-16525:
----------------------------------
    Description:
In 4.0, new application states were added to Gossip and the corresponding old ones were deprecated, e.g. {{STATUS}} and its successor {{STATUS_WITH_PORT}}. The jvm (upgrade) dtest uncovered 2 issues. First, the {{STATUS}} field of a peer can be missing on a lower-version (e.g. 3.0) node. Second, {{STATUS}} can coexist with the new state {{STATUS_WITH_PORT}} on the 4.0 nodes after the cluster is fully upgraded, and the {{STATUS}} field can become stale because the 4.0 node filters it out when applying new state.

The first issue can happen in the following scenario. During an upgrade, node1 and node2 are on v4, and node3 is still on v3. If node3 gets the gossip info about node2 only from node1, the {{STATUS}} field of node2 will be missing from node3's local state, which is unexpected. There are many reasons why node3 might not exchange gossip with node2 directly, e.g. a network issue between node2 and node3, or node2 simply not selecting node3 when initiating a gossip round. Gossip should be resilient to this. I have a [jvm upgrade dtest|https://github.com/yifan-c/cassandra/blob/CASSANDRA-16525/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeGossipTest.java#L81] that demonstrates the unexpected behavior.

The cause of the second issue is more subtle. The heartbeat update happens as part of the Gossip task, outside of the GossipStage. When node2 has just updated its local application state and receives a {{SYN}} from node1, node2 replies with its gossip state without updating the heartbeat version. When node1 receives the reply, it first filters out the legacy {{STATUS}} field and saves only the new one. So far so good. However, node2 soon updates its heartbeat, and in the next gossip round node1 realizes that its local version is lower than the remote (node2) version. So node2 sends {{STATUS}} along to node1.
Because it does not arrive together with the new field, node1 does not filter it out on receipt. Boo! Node1 now has the {{STATUS}} field from node2. Such a field can become stale and diverge from its successor in a live cluster. The jvm upgrade test [testStatusFieldShouldExistInOldVersionNodes|https://github.com/yifan-c/cassandra/blob/CASSANDRA-16525/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeGossipTest.java#L47] reproduces it fairly easily once the entire cluster is upgraded. There is also another jvm dtest (with source changes to make the result deterministic, see the attached Demonstrate-a-scenario-that-a-node-may-hold-the-stale-status.patch) that demonstrates that {{STATUS}} can be replicated to the peer and become stale.

The fix is to
1) retain the legacy fields if the cluster is still in mixed mode
2) remove the legacy fields once the cluster is fully upgraded

  was:
In 4.0, new application states were added to Gossip and the corresponding old ones were deprecated, e.g. {{STATUS}} and its successor {{STATUS_WITH_PORT}}. The jvm (upgrade) dtest uncovered 2 issues. First, the {{STATUS}} field of a peer can be missing on a lower-version (e.g. 3.0) node. Second, {{STATUS}} can coexist with the new state {{STATUS_WITH_PORT}} on the 4.0 nodes after the cluster is fully upgraded, and the {{STATUS}} field can become stale because the 4.0 node filters it out when applying new state.

The first issue can happen in the following scenario. During an upgrade, node1 and node2 are on v4 and node3 is still on v3. If node3 gets the gossip info about node2 only from node1, the {{STATUS}} field of node2 will be missing from node3's local state, which is unexpected. There are many reasons why node3 might not exchange gossip with node2 directly, e.g. a network issue between node2 and node3, or node2 simply not selecting node3 when initiating a gossip round.
I have a [jvm upgrade dtest|https://github.com/yifan-c/cassandra/blob/CASSANDRA-16525/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeGossipTest.java#L81] that demonstrates the unexpected behavior.

The cause of the second issue is more subtle. The heartbeat update happens as part of the Gossip task, outside of the GossipStage. When node2 has just updated its local application state and receives a {{SYN}} from node1, node2 replies with its gossip state without updating the heartbeat version. When node1 receives the reply, it first filters out the legacy {{STATUS}} field and saves only the new one. So far so good. However, node2 soon updates its heartbeat, and in the next gossip round node1 realizes that its local version is lower than the remote (node2) version. So node2 sends {{STATUS}} along to node1. Because it does not arrive together with the new field, node1 does not filter it out on receipt. Boo! Node1 now has the {{STATUS}} field from node2. Such a field can become stale and diverge from its successor in a live cluster. The jvm upgrade test [testStatusFieldShouldExistInOldVersionNodes|https://github.com/yifan-c/cassandra/blob/CASSANDRA-16525/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeGossipTest.java#L47] reproduces it fairly easily once the entire cluster is upgraded. There is also another jvm dtest (with source changes to make the result deterministic, see the attached Demonstrate-a-scenario-that-a-node-may-hold-the-stale-status.patch) that demonstrates that {{STATUS}} can be replicated to the peer and become stale.
The fix is to
1) retain the legacy fields if the cluster is still in mixed mode
2) remove the legacy fields once the cluster is fully upgraded

> Gossip STATUS can be either missing during upgrade or stale after upgrade
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 4.0, 4.0-rc
>
>         Attachments: Demonstrate-a-scenario-that-a-node-may-hold-the-stale-status.patch
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> In 4.0, new application states were added to Gossip and the corresponding old ones were deprecated, e.g. {{STATUS}} and its successor {{STATUS_WITH_PORT}}. The jvm (upgrade) dtest uncovered 2 issues. First, the {{STATUS}} field of a peer can be missing on a lower-version (e.g. 3.0) node. Second, {{STATUS}} can coexist with the new state {{STATUS_WITH_PORT}} on the 4.0 nodes after the cluster is fully upgraded, and the {{STATUS}} field can become stale because the 4.0 node filters it out when applying new state.
>
> The first issue can happen in the following scenario. During an upgrade, node1 and node2 are on v4, and node3 is still on v3. If node3 gets the gossip info about node2 only from node1, the {{STATUS}} field of node2 will be missing from node3's local state, which is unexpected. There are many reasons why node3 might not exchange gossip with node2 directly, e.g. a network issue between node2 and node3, or node2 simply not selecting node3 when initiating a gossip round. Gossip should be resilient to this. I have a [jvm upgrade dtest|https://github.com/yifan-c/cassandra/blob/CASSANDRA-16525/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeGossipTest.java#L81] that demonstrates the unexpected behavior.
>
> The cause of the second issue is more subtle. The heartbeat update happens as part of the Gossip task, outside of the GossipStage. When node2 has just updated its local application state and receives a {{SYN}} from node1, node2 replies with its gossip state without updating the heartbeat version. When node1 receives the reply, it first filters out the legacy {{STATUS}} field and saves only the new one. So far so good. However, node2 soon updates its heartbeat, and in the next gossip round node1 realizes that its local version is lower than the remote (node2) version. So node2 sends {{STATUS}} along to node1. Because it does not arrive together with the new field, node1 does not filter it out on receipt. Boo! Node1 now has the {{STATUS}} field from node2. Such a field can become stale and diverge from its successor in a live cluster. The jvm upgrade test [testStatusFieldShouldExistInOldVersionNodes|https://github.com/yifan-c/cassandra/blob/CASSANDRA-16525/trunk/test/distributed/org/apache/cassandra/distributed/upgrade/MixedModeGossipTest.java#L47] reproduces it fairly easily once the entire cluster is upgraded. There is also another jvm dtest (with source changes to make the result deterministic, see the attached Demonstrate-a-scenario-that-a-node-may-hold-the-stale-status.patch) that demonstrates that {{STATUS}} can be replicated to the peer and become stale.
>
> The fix is to
> 1) retain the legacy fields if the cluster is still in mixed mode
> 2) remove the legacy fields once the cluster is fully upgraded
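
For readers who want the filtering logic of this issue in concrete form, here is a minimal Python sketch. It is only a model of the behavior described above, not Cassandra's code: the names {{LEGACY_TO_SUCCESSOR}}, {{filter_as_described}}, {{filter_with_fix}} and the {{mixed_mode}} flag are all hypothetical, and application states are reduced to a plain name-to-value mapping. It contrasts the reported behavior (a legacy state is dropped only when its successor arrives in the same message) with the two-part fix.

```python
# Hypothetical model of the filtering described in this issue. These names
# are illustrative only and are not Cassandra's actual classes or methods.
LEGACY_TO_SUCCESSOR = {"STATUS": "STATUS_WITH_PORT"}


def filter_as_described(incoming):
    """Reported behavior: drop a legacy state only when its successor
    arrives in the same gossip message."""
    kept = {}
    for name, value in incoming.items():
        successor = LEGACY_TO_SUCCESSOR.get(name)
        if successor is not None and successor in incoming:
            continue  # legacy state accompanied by its successor: filtered
        kept[name] = value
    return kept


def filter_with_fix(incoming, mixed_mode):
    """Proposed fix: 1) retain legacy states while the cluster is in mixed
    mode, 2) remove them unconditionally once it is fully upgraded."""
    if mixed_mode:
        return dict(incoming)
    return {n: v for n, v in incoming.items() if n not in LEGACY_TO_SUCCESSOR}


both = {"STATUS": "NORMAL", "STATUS_WITH_PORT": "NORMAL"}
legacy_only = {"STATUS": "NORMAL"}

# Both fields arrive together: the legacy one is filtered out.
print(filter_as_described(both))
# The legacy field arrives alone in a later round: it slips through.
print(filter_as_described(legacy_only))
# With the fix on a fully upgraded cluster, it is removed either way.
print(filter_with_fix(legacy_only, mixed_mode=False))
# With the fix in mixed mode, it is retained so older nodes can learn it.
print(filter_with_fix(both, mixed_mode=True))
```

How a node would decide that the cluster is still in mixed mode is outside this sketch; the flag is simply passed in as an assumption.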