[ https://issues.apache.org/jira/browse/CASSANDRA-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703209#comment-13703209 ]
Jason Brown commented on CASSANDRA-5669:
----------------------------------------

I spent a lot of time thinking about this :), and I think the situation in this ticket is subtly different from what happened in CASSANDRA-5171. I commented on that ticket as to why I think it had a problem (short answer: connecting to publicIP on non-SSL port). This ticket does not get us into that situation, as we will continue to connect to the publicIP/(SSL) port. We simply bypass reconnecting on the local port if we see the other node has a lower messaging version.

I did test out this upgrade scenario a few weeks ago when we concocted it (and it worked), and will be happy to try it out again. It'll take a few hours (including time for dropping kids off at camp), so I'll update this ticket later in the morning.

> Connection thrashing in multi-region ec2 during upgrade, due to messaging version
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.5
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: ec2, ec2multiregionsnitch, gossip
>             Fix For: 1.2.6, 2.0 beta 1
>
>         Attachments: 5669-v1.diff, 5669-v2.diff
>
>
> While debugging the upgrade scenario described in CASSANDRA-5660, I
> discovered that ITC.close() resets the message protocol version of a peer
> node that disconnects. CASSANDRA-5660 has a full description of the upgrade
> path, but basically the Ec2MultiRegionSnitch will close connections on the
> publicIP addr to reconnect on the privateIP, and this causes ITC to drop the
> message protocol version of previously known nodes.
> I think we want to hang onto that version so that when the newer node
> (re-)connects to the lower-version node, it passes the correct protocol
> version. Otherwise it sends the current version (too high for the older
> node), the connection attempt gets dropped, and we go through the dance
> again.
> To clarify, the 'thrashing' is at a rather low volume, from what I observed.
> Anecdotally, perhaps one connection per second gets turned over.
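
To make the behavior described in the issue concrete, here is a minimal sketch of the difference between dropping and retaining a peer's messaging version when a connection closes. It is not Cassandra's actual code: the class `PeerVersions`, its methods, and the version numbers are all hypothetical, chosen only to illustrate the idea.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in for per-peer version bookkeeping; not Cassandra's real classes.
class PeerVersions {
    // Hypothetical messaging version spoken by the local (newer) node.
    static final int CURRENT_VERSION = 6;

    private final ConcurrentMap<String, Integer> versions = new ConcurrentHashMap<>();

    // Record the version a peer announced on an established connection.
    void setVersion(String peer, int version) {
        versions.put(peer, version);
    }

    // Version to use when (re-)connecting: the remembered one, capped at our own,
    // or our own version if nothing is known about the peer.
    int versionFor(String peer) {
        Integer known = versions.get(peer);
        return known == null ? CURRENT_VERSION : Math.min(known, CURRENT_VERSION);
    }

    // Behavior described as the bug: closing a connection forgets the peer's version.
    void closeAndReset(String peer) {
        versions.remove(peer);
    }

    // Behavior the issue argues for: closing a connection keeps the version.
    void closeAndRetain(String peer) {
        // Intentionally leave the remembered version in place.
    }

    public static void main(String[] args) {
        String olderPeer = "10.0.0.1"; // hypothetical address of a not-yet-upgraded node
        PeerVersions pv = new PeerVersions();

        pv.setVersion(olderPeer, 5);
        pv.closeAndReset(olderPeer);
        System.out.println("after reset:  " + pv.versionFor(olderPeer));

        pv.setVersion(olderPeer, 5);
        pv.closeAndRetain(olderPeer);
        System.out.println("after retain: " + pv.versionFor(olderPeer));
    }
}
```

With the resetting close, the next connection attempt falls back to the local node's own version, which the older peer would reject. With the retaining close, the reconnect uses the older peer's version.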