[ https://issues.apache.org/jira/browse/CASSANDRA-5665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jason Brown updated CASSANDRA-5665:
-----------------------------------
    Fix Version/s:     (was: 1.2.7)
                       (was: 2.0 beta 1)

> Gossiper.handleMajorStateChange can lose existing node ApplicationState
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-5665
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5665
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.5
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: gossip, upgrade
>         Attachments: 5665-v1.diff, 5665-v2.diff
>
>
> Dovetailing on CASSANDRA-5660, I discovered that further along during an
> upgrade, when more nodes are on the new major version, a node on the previous
> version can get passed some incomplete Gossip info about another, already
> upgraded node, and the older node drops AppState info about that node.
>
> I think what happens is that a 1.1 node (older rev) gets gossip info from a
> 1.2 node (A), which includes incomplete (lacking some AppState data) gossip
> info about another 1.2 node (B). The 1.1 node, which has incorrectly
> kicked node B out of gossip due to the bug described in #5660, then takes
> that incomplete node B info and wholesale replaces any previously known state
> about node B in Gossiper.handleMajorStateChange. Thus, if we previously had
> DC/RACK info, it'll get dropped as part of the
> endpointStateMap.put(endpointstate). When the data being passed is incomplete,
> 1.1 will start referencing node B and get into the NPE situation in #5498.
>
> Anecdotally, this bad state is short-lived, less than a few minutes, even as
> short as ten seconds, until gossip catches up and properly propagates the
> AppState data. Furthermore, when upgrading a two-datacenter, 48-node cluster,
> it only occurred on two nodes for less than a minute each. Thus, the scope
> seems limited, but the problem can occur.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
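
To illustrate the failure mode described in the issue, here is a minimal sketch of how a wholesale put of incomplete state can drop previously known DC/RACK values, while a merge would keep them. This is a simplified, hypothetical model and not Cassandra's actual Gossiper code: the class GossipStateSketch, the methods replaceState and mergeState, and the string-keyed maps are all assumptions made for illustration. The merge variant is only one possible way to avoid the loss and is not necessarily what the attached diffs do.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified, hypothetical model of per-endpoint gossip state.
// This is NOT Cassandra's Gossiper; all names are illustrative only.
class GossipStateSketch {
    // endpoint address -> (application state key -> value)
    final Map<String, Map<String, String>> endpointStateMap = new HashMap<>();

    // Wholesale replacement: whatever was known before is discarded.
    void replaceState(String endpoint, Map<String, String> incoming) {
        endpointStateMap.put(endpoint, new HashMap<>(incoming));
    }

    // Merge: keys missing from the incoming state keep their old values.
    void mergeState(String endpoint, Map<String, String> incoming) {
        endpointStateMap
            .computeIfAbsent(endpoint, k -> new HashMap<>())
            .putAll(incoming);
    }

    public static void main(String[] args) {
        Map<String, String> full = new HashMap<>();
        full.put("DC", "dc1");
        full.put("RACK", "rack1");
        full.put("STATUS", "NORMAL");

        // Incomplete update about node B: no DC or RACK entries.
        Map<String, String> partial = new HashMap<>();
        partial.put("STATUS", "NORMAL");

        GossipStateSketch replacing = new GossipStateSketch();
        replacing.replaceState("nodeB", full);
        replacing.replaceState("nodeB", partial);
        System.out.println("replace: DC=" + replacing.endpointStateMap.get("nodeB").get("DC"));

        GossipStateSketch merging = new GossipStateSketch();
        merging.mergeState("nodeB", full);
        merging.mergeState("nodeB", partial);
        System.out.println("merge:   DC=" + merging.endpointStateMap.get("nodeB").get("DC"));
    }
}
```

In this sketch, the replace path leaves the DC entry for nodeB as null after the partial update, which is the kind of missing value that later lookups could trip over, while the merge path still returns dc1.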