[ https://issues.apache.org/jira/browse/CASSANDRA-6082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882786#comment-13882786 ]

Chris Burroughs commented on CASSANDRA-6082:
--------------------------------------------

Duplicate of which ticket?

> 1.1.12 --> 1.2.x upgrade may result inconsistent ring
> -----------------------------------------------------
>
>                 Key: CASSANDRA-6082
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6082
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>         Environment: 1.1.12 --> 1.2.9
>            Reporter: Chris Burroughs
>            Priority: Minor
>         Attachments: c-gossipinfo, c-status
>
>
> This happened to me once, and since I don't have any more 1.1.x clusters I
> won't be testing again. I hope the attached files are enough for someone to
> connect the dots.
> I did a rolling restart to upgrade from 1.1.12 --> 1.2.9. About a week later
> I discovered that one node was in an inconsistent state in the ring. It was
> either:
>  * up
>  * host-id=null
>  * missing
> Depending on which node I ran nodetool status from. I *think* I just missed
> this during the upgrade but can not rule out the possibility that it "just
> happened for no reason" some time after the upgrade. It was detected when
> running repair in such a ring caused all sorts of terrible data "duplication"
> and performance tanked. Restarting the seeds + "bad" node caused the ring to
> be consistent again.
> Two possibly suspicious things are an ArrayIndexOutOfBoundsException on
> startup:
> {noformat}
> ERROR [GossipStage:1] 2013-09-06 10:45:35,213 CassandraDaemon.java (line 194) Exception in thread Thread[GossipStage:1,5,main]
> java.lang.ArrayIndexOutOfBoundsException: 2
>         at org.apache.cassandra.service.StorageService.extractExpireTime(StorageService.java:1660)
>         at org.apache.cassandra.service.StorageService.handleStateRemoving(StorageService.java:1607)
>         at org.apache.cassandra.service.StorageService.onChange(StorageService.java:1230)
>         at org.apache.cassandra.service.StorageService.onJoin(StorageService.java:1958)
>         at org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:841)
>         at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:919)
>         at org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:50)
>         at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:56)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> and problems with hint delivery to multiple nodes:
> {noformat}
> ERROR [MutationStage:11] 2013-09-06 13:59:19,604 CassandraDaemon.java (line 194) Exception in thread Thread[MutationStage:11,5,main]
> java.lang.AssertionError: Missing host ID for 10.20.2.45
>         at org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:583)
>         at org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:552)
>         at org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1658)
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
>         at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>         at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>         at java.lang.Thread.run(Thread.java:662)
> {noformat}
> Note, however, that while there were delivery problems to multiple nodes during
> the rolling upgrade, only one node was in a funky state a week later.
> Attached are the results of running gossipinfo and status on every node.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)