[ https://issues.apache.org/jira/browse/CASSANDRA-2768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13049243#comment-13049243 ]
Sasha Dolgy commented on CASSANDRA-2768:
----------------------------------------

Hi ... able to give more information now:

cassandra:~$ nodetool ring
Address         Status State   Load        Owns    Token
                                                   170141183460469231731687303715884105726
10.128.103.148  Up     Normal  961.38 KB   11.22%  19095547144942516281182777765338228798
10.128.94.227   Up     Normal  667.56 KB   22.11%  56713727820156410577229101238628035242
10.128.34.18    Up     Normal  688.1 KB    33.33%  113427455640312821154458202477256070484
10.128.90.109   Up     Normal  965.76 KB   33.33%  170141183460469231731687303715884105726

Not a lot of data. I created a new keyspace with RF=2 and dropped the old one. I ran repair on the nodes, and now I no longer get the error on some of them.

I can confirm again that all systems report "ReleaseVersion: 0.8.0" from 'nodetool version'.

I am seeing this error on two of the nodes:

ERROR [pool-2-thread-14] 2011-06-14 23:33:40,544 CustomTThreadPoolServer.java (line 199) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)
ERROR [pool-2-thread-16] 2011-06-14 23:33:42,024 CustomTThreadPoolServer.java (line 199) Thrift error occurred during processing of message.
org.apache.thrift.protocol.TProtocolException: Missing version in readMessageBegin, old client?
    at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:213)
    at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2877)
    at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

109 and 148 look to be communicating fine.

18 --> 109 (version error)
18 --> 227 (version error)
227 --> 18 (version error)
227 --> 148 (version error)

For my sanity, I checked and confirmed that all four instances are part of the same security group and that firewall rules allow communication between all four nodes on ports 7000 and 9090.

Configuration on all nodes is standard, with the following exceptions:

#listen_address: localhost
endpoint_snitch: org.apache.cassandra.locator.Ec2Snitch

> AntiEntropyService excluding nodes that are on version 0.7 or sooner
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-2768
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2768
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.8.0
>         Environment: 4 node environment
> Originally 0.7.6-2 with a keyspace defined with RF=3.
> Upgraded all nodes (1 at a time) to version 0.8.0: for each node, the node was shut down, the new version was started using the existing data files/directories, and a nodetool repair was run.
>            Reporter: Sasha Dolgy
>            Assignee: Brandon Williams
>
> When I run nodetool repair on any of the nodes, /var/log/cassandra/system.log reports errors similar to:
> INFO [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AntiEntropyService.java (line 177) Excluding /10.128.34.18 from repair because it is on version 0.7 or sooner. You should consider updating this node before running repair again.
> ERROR [manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec] 2011-06-13 21:28:39,877 AbstractCassandraDaemon.java (line 113) Fatal exception in thread Thread[manual-repair-1c6b33bc-ef14-4ec8-94f6-f1464ec8bdec,5,RMI Runtime]
> java.util.ConcurrentModificationException
>     at java.util.HashMap$HashIterator.nextEntry(HashMap.java:793)
>     at java.util.HashMap$KeyIterator.next(HashMap.java:828)
>     at org.apache.cassandra.service.AntiEntropyService.getNeighbors(AntiEntropyService.java:173)
>     at org.apache.cassandra.service.AntiEntropyService$RepairSession.run(AntiEntropyService.java:776)
> The INFO message and subsequent ERROR message are logged for 2 nodes. I suspect that this is because RF=3.
> nodetool ring shows that all nodes are up.
> Client connections (read/write) are not having issues.
> nodetool version on all nodes shows that each node is 0.8.0.
> At the suggestion of some contributors, I restarted each node and ran nodetool repair again; the result is the same, with the same messages being logged.
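In the quoted trace, the ConcurrentModificationException is thrown from HashMap$KeyIterator.next inside getNeighbors (AntiEntropyService.java:173), and it follows the "Excluding ... from repair" message logged at line 177. That pattern is consistent with an endpoint being removed from a HashSet while a for-each loop is still iterating over the same set. The sketch below is hypothetical and is not the Cassandra source: the class NeighborFilter, the helper isOnOldVersion, and the notion of a fixed set of "old" endpoints are illustrative assumptions. It marks two of the ring's addresses as "old" purely to exercise the removal path (the second one is picked arbitrarily). It contrasts the unsafe removal with removal through Iterator.remove().

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

// Hypothetical illustration only; not the actual AntiEntropyService code.
class NeighborFilter {
    // Hypothetical stand-in for a version check: here an endpoint counts as
    // "old" simply if it appears in the supplied set.
    static boolean isOnOldVersion(String endpoint, Set<String> oldNodes) {
        return oldNodes.contains(endpoint);
    }

    // Unsafe pattern: calling neighbors.remove() inside a for-each loop over
    // neighbors. The set's fail-fast iterator notices the structural change
    // on its next call to next() and throws ConcurrentModificationException.
    static Set<String> excludeOldNodesUnsafely(Set<String> neighbors, Set<String> oldNodes) {
        for (String endpoint : neighbors) {
            if (isOnOldVersion(endpoint, oldNodes)) {
                System.out.println("Excluding " + endpoint + " from repair");
                neighbors.remove(endpoint); // modifies the set mid-iteration
            }
        }
        return neighbors;
    }

    // Safe pattern: remove through the iterator itself, which keeps the
    // iterator's bookkeeping consistent with the underlying set.
    static Set<String> excludeOldNodesSafely(Set<String> neighbors, Set<String> oldNodes) {
        Iterator<String> it = neighbors.iterator();
        while (it.hasNext()) {
            String endpoint = it.next();
            if (isOnOldVersion(endpoint, oldNodes)) {
                System.out.println("Excluding " + endpoint + " from repair");
                it.remove();
            }
        }
        return neighbors;
    }

    public static void main(String[] args) {
        Set<String> all = new HashSet<>(Arrays.asList(
                "/10.128.103.148", "/10.128.94.227", "/10.128.34.18", "/10.128.90.109"));
        // Assumption for illustration: treat two endpoints as "old".
        Set<String> old = new HashSet<>(Arrays.asList("/10.128.34.18", "/10.128.94.227"));

        try {
            excludeOldNodesUnsafely(new HashSet<>(all), old);
            System.out.println("Unsafe variant completed without an exception");
        } catch (ConcurrentModificationException e) {
            System.out.println("Unsafe variant threw ConcurrentModificationException");
        }

        Set<String> remaining = excludeOldNodesSafely(new HashSet<>(all), old);
        System.out.println("Remaining neighbors: " + remaining);
    }
}
```

With two endpoints to exclude, the first one removed always has another element after it in iteration order, so the unsafe variant's following next() call detects the change. The safe variant finishes and leaves the two remaining endpoints (printed in HashSet order, which is unspecified). On Java 8 and later, neighbors.removeIf(...) is an equivalent alternative to the explicit iterator loop.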