[ 
https://issues.apache.org/jira/browse/CASSANDRA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13101327#comment-13101327
 ] 

Peter Schuller commented on CASSANDRA-3166:
-------------------------------------------

I'm having difficulty coming up with a clean yet simple fix here. Reverting 
CASSANDRA-2860 certainly fixes this problem, but re-introduces CASSANDRA-2860 
instead.

I could imagine an environment variable/config option to disable the support 
for pretending you are older than you are, which could be used in a second 
round of rolling restarts after upgrading all nodes of a cluster to 0.8. A 
JMX-tweakable setting would be nice, but upon changing it you'd want to tear 
down all the TCP connections to re-initiate version negotiation, so maybe it's 
okay to leave it with an extra round of restarts required.
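
To make the config-option idea concrete, here is a minimal sketch of a 
per-peer version policy with such a switch. It is a simplified model written 
in Python for brevity, not Cassandra's actual messaging code: the class name, 
the `allow_downgrade` flag, and the numeric version constants are all 
hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical numeric protocol versions, for illustration only.
VERSION_07 = 1
VERSION_08 = 2


@dataclass
class VersionPolicy:
    """Decides which protocol version to speak to each peer (simplified model)."""
    own_version: int
    # Hypothetical config flag: when False, never pretend to be older.
    allow_downgrade: bool = True
    known_peer_versions: Dict[str, int] = field(default_factory=dict)

    def record_peer_version(self, peer: str, version: int) -> None:
        # Remember the version a peer was last seen speaking.
        self.known_peer_versions[peer] = version

    def version_for(self, peer: str) -> int:
        # With downgrading disabled, always advertise our real version.
        if not self.allow_downgrade:
            return self.own_version
        # Otherwise speak the older of our version and the peer's last known
        # version; unknown peers are assumed to match our own version.
        peer_version = self.known_peer_versions.get(peer, self.own_version)
        return min(self.own_version, peer_version)
```

In this model, a 0.8 node that has recorded a peer as 0.7 keeps talking to it 
as 0.7 until the flag is turned off. Existing connections would still have to 
be re-established for the change to take effect, which is why an extra round 
of restarts is the simplest way to apply it.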

Alternatively, I think (not tested) things will tend to sort themselves out 
incrementally every time you restart a 0.8 node, since it will tend to 
initiate connections to other nodes immediately. But telling users that they 
need to keep restarting nodes all over the place until every node seems to 
have picked up the right version seems like a poor solution.

Adding some new kind of message that says "I really am this other version" or 
similar isn't clean either.

Am I missing a much simpler and cleaner fix here?


> Rolling upgrades from 0.7 to 0.8 not possible
> ---------------------------------------------
>
>                 Key: CASSANDRA-3166
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3166
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.5, 0.7.9, 0.8.4
>            Reporter: Marcus Eriksson
>
> We are in the process of upgrading to 0.8 and we need to do a rolling 
> upgrade. This fails miserably, and it is reproducible:
> 1. set up a 3 node cluster with 0.7.9 and rf=3, read and write, QUORUM
> 2. upgrade one of the nodes (i upped a seednode, not sure if that is 
> important)
> 3. continue reading/writing
> 4. see logs on the 0.7 node fill up with: INFO 12:36:08,240 Received 
> connection from newer protocol version. Ignorning message.
> It does work if I start the 0.7.9 nodes *after* the 0.8.4 node, which makes 
> me think that it matters whether it is the 0.8 node connecting to the 0.7 
> nodes or the other way round.
> Debug logging on the 0.8 node shows:
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-82] 2011-09-09 
> 11:55:06,067 StorageProxy.java (line 178) Write timeout 
> java.util.concurrent.TimeoutException for one (or more) of: 
> /var/log/cassandra/system.log.9:DEBUG [pool-2-thread-76] 2011-09-09 
> 11:55:06,067 StorageProxy.java (line 584) Read timeout: 
> java.util.concurrent.TimeoutException: Operation timed out - received only 1 
> responses from /193.182.3.92,  .
> There is nothing except for the "newer protocol version..." message in the 
> 0.7 logs.
> I will continue to look at this issue, but if anyone has a quick patch, let 
> me know.


        
