Gossiper ConcurrentModificationException after Decommissioning
--------------------------------------------------------------
Key: CASSANDRA-1494
URL: https://issues.apache.org/jira/browse/CASSANDRA-1494
Project: Cassandra
Issue Type: Bug
Components: Core
Affects Versions: 0.6.5
Environment: Linux 2.6.33.8-149.fc13.x86_64 #1 SMP Tue Aug 17 22:53:15 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
Reporter: Dan Retzlaff
Priority: Critical

After 192.168.2.147 was decommissioned, the Gossiper on 192.168.2.55 threw a ConcurrentModificationException. This in turn caused 192.168.2.55 to repeatedly mark 192.168.2.148 and 192.168.2.149 as UP and then DOWN. Eventually so many internode (storage port) TCP connections were left in CLOSE_WAIT that other nodes started failing with "too many open files" exceptions.

INFO [Timer-0] 2010-09-08 17:00:02,398 Gossiper.java (line 402) FatClient /192.168.2.147 has been silent for 3600000ms, removing from gossip
ERROR [Timer-0] 2010-09-08 17:00:02,418 Gossiper.java (line 99) Gossip error
java.util.ConcurrentModificationException
	at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
	at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
	at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
	at java.util.TimerThread.mainLoop(Timer.java:512)
	at java.util.TimerThread.run(Timer.java:462)
INFO [Timer-0] 2010-09-08 17:00:12,398 Gossiper.java (line 180) InetAddress /192.168.2.148 is now dead.
INFO [Timer-0] 2010-09-08 17:00:14,399 Gossiper.java (line 180) InetAddress /192.168.2.149 is now dead.
INFO [GMFD:1] 2010-09-08 17:00:19,400 Gossiper.java (line 578) InetAddress /192.168.2.149 is now UP
INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:19,400 HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /192.168.2.149
INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:19,401 HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to endpoint /192.168.2.149
INFO [Timer-0] 2010-09-08 17:00:20,399 Gossiper.java (line 180) InetAddress /192.168.2.149 is now dead.
INFO [GMFD:1] 2010-09-08 17:00:43,409 Gossiper.java (line 578) InetAddress /192.168.2.148 is now UP
INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:43,409 HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /192.168.2.148
INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:43,410 HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to endpoint /192.168.2.148
INFO [Timer-0] 2010-09-08 17:00:44,404 Gossiper.java (line 180) InetAddress /192.168.2.148 is now dead.
INFO [GMFD:1] 2010-09-08 17:01:18,415 Gossiper.java (line 578) InetAddress /192.168.2.149 is now UP

(The UP/DOWN cycle repeats until the target node *really* goes DOWN because of too many TCP sockets in CLOSE_WAIT.)
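The stack trace shows the fail-fast iterator of a java.util.Hashtable (Hashtable$Enumerator.next) aborting doStatusCheck 20 ms after the FatClient removal message. The 0.6.5 source is not part of this report, so the following is only a minimal sketch of the general failure mode, assuming that doStatusCheck removes an endpoint from the same Hashtable it is iterating. The class name, method names, and endpoint strings are hypothetical and are not Cassandra's API. The sketch also shows two standard ways to avoid the exception: removing through Iterator.remove(), and using a ConcurrentHashMap, whose iterators are weakly consistent and never throw ConcurrentModificationException.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class GossipCmeDemo {
    // Hypothetical silence threshold, mirroring the 3600000ms in the log above.
    static final long FAT_CLIENT_TIMEOUT_MS = 3_600_000L;

    // Fills a hypothetical endpoint -> last-heard-from timestamp table.
    // Two entries are stale so that at least one removal happens before iteration
    // finishes, which makes the fail-fast check fire regardless of Hashtable ordering.
    static Map<String, Long> sampleState(Map<String, Long> map, long now) {
        map.put("fat-client-a", now - FAT_CLIENT_TIMEOUT_MS - 1);
        map.put("fat-client-b", now - FAT_CLIENT_TIMEOUT_MS - 1);
        map.put("storage-node-c", now);
        return map;
    }

    // Buggy pattern: remove from the map while a for-each loop iterates its key set.
    // Returns true if the iterator threw ConcurrentModificationException.
    static boolean naiveRemovalThrows() {
        long now = System.currentTimeMillis();
        Map<String, Long> state = sampleState(new Hashtable<>(), now);
        try {
            for (String endpoint : state.keySet()) {
                if (now - state.get(endpoint) > FAT_CLIENT_TIMEOUT_MS) {
                    state.remove(endpoint); // bumps modCount behind the iterator's back
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true; // thrown by the next call to Iterator.next()
        }
    }

    // Fix 1 (single-threaded only): remove through the iterator itself,
    // which keeps the iterator's expected modification count in sync.
    static Map<String, Long> removeWithIterator() {
        long now = System.currentTimeMillis();
        Map<String, Long> state = sampleState(new Hashtable<>(), now);
        Iterator<Map.Entry<String, Long>> it = state.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Long> entry = it.next();
            if (now - entry.getValue() > FAT_CLIENT_TIMEOUT_MS) {
                it.remove();
            }
        }
        return state;
    }

    // Fix 2: ConcurrentHashMap iterators are weakly consistent, so removing
    // entries (from this or another thread) during iteration does not throw.
    static Map<String, Long> removeWithConcurrentMap() {
        long now = System.currentTimeMillis();
        Map<String, Long> state = sampleState(new ConcurrentHashMap<>(), now);
        for (String endpoint : state.keySet()) {
            Long lastSeen = state.get(endpoint); // may be null if removed concurrently
            if (lastSeen != null && now - lastSeen > FAT_CLIENT_TIMEOUT_MS) {
                state.remove(endpoint);
            }
        }
        return state;
    }

    public static void main(String[] args) {
        System.out.println("naive removal throws CME: " + naiveRemovalThrows());
        System.out.println("after Iterator.remove(): " + removeWithIterator().keySet());
        System.out.println("after ConcurrentHashMap removal: " + removeWithConcurrentMap().keySet());
    }
}
```

Iterator.remove() only covers the case where the iterating thread is the sole writer. If other threads (for example the GMFD stage that appears in the log) can also modify the map while the timer task iterates it, a concurrent map or external synchronization around the whole loop is likely the safer choice.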