[ 
https://issues.apache.org/jira/browse/CASSANDRA-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13703209#comment-13703209
 ] 

Jason Brown commented on CASSANDRA-5669:
----------------------------------------

I spent a lot of time thinking about this :), and I think the situation in this 
ticket is subtly different from what happened in CASSANDRA-5171. I commented on 
that ticket as to why I think it had a problem (short answer: connecting to 
publicIP on non-SSL port). This ticket does not get us into that situation as 
we will continue to connect to the publicIP/(SSL) port - we simply bypass 
reconnecting on the local port if we see the other node has a lower messaging 
version.
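
As a rough illustration of that check (a sketch only, not the code in the 
attached patches; the class, method names, and version number below are all 
hypothetical), the idea is to remember each peer's messaging version and only 
switch to the private IP when the peer is known to be on the current version:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch; none of these names come from the Cassandra code base.
class PeerVersionSketch {
    // Assumed messaging version of the local (newer) node.
    static final int CURRENT_VERSION = 6;

    // Last known messaging version per peer, keyed by a peer identifier.
    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    // Remember the version a peer announced when it connected.
    void recordVersion(String peer, int version) {
        versions.put(peer, version);
    }

    // Version to use when talking to a peer: never higher than what it speaks.
    int versionToUse(String peer) {
        return Math.min(versions.getOrDefault(peer, CURRENT_VERSION), CURRENT_VERSION);
    }

    // Only drop the public IP connection and reconnect on the private IP when
    // the peer is known to be on the current version. Treating an unknown
    // version as "do not reconnect" is an assumption of this sketch.
    boolean shouldReconnectOnPrivateIp(String peer) {
        Integer version = versions.get(peer);
        return version != null && version >= CURRENT_VERSION;
    }
}
```

With this shape, a peer recorded at a lower version stays on the 
publicIP/(SSL) connection, so the close-and-reconnect cycle never starts.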

I did test out this upgrade scenario a few weeks ago when we concocted it (and 
it worked), and will be happy to try it out again. It'll take a few hours 
(including time for dropping kids off at camp), so I'll update this ticket later 
in the morning.
                
> Connection thrashing in multi-region ec2 during upgrade, due to messaging 
> version
> ---------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5669
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5669
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.2.5
>            Reporter: Jason Brown
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: ec2, ec2multiregionsnitch, gossip
>             Fix For: 1.2.6, 2.0 beta 1
>
>         Attachments: 5669-v1.diff, 5669-v2.diff
>
>
> While debugging the upgrade scenario described in CASSANDRA-5660, I 
> discovered that ITC.close() will reset the message protocol version of a peer 
> node that disconnects. CASSANDRA-5660 has a full description of the upgrade 
> path, but basically the Ec2MultiRegionSnitch will close connections on the 
> publicIP addr to reconnect on the privateIP, and this causes ITC to drop the 
> message protocol version of previously known nodes. I think we want to hang 
> onto that version so that when the newer node (re-)connects to the node on 
> the lower version, it passes the correct protocol version rather than the 
> current version. The current version is too high for the older node, so the 
> connection attempt gets dropped and we go through the dance again.
> To clarify, the 'thrashing' is at a rather low volume, from what I observed. 
> Anecdotally, perhaps one connection per second gets turned over.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
