[ https://issues.apache.org/jira/browse/CASSANDRA-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12908612#action_12908612 ]
Dan Retzlaff commented on CASSANDRA-1494:
-----------------------------------------

Okay. I'd suggest at least following the removeEndpoint() call with a "break", if only on aesthetic grounds, since otherwise that for loop will cause a ConcurrentModificationException every time.

> Gossiper ConcurrentModificationException after Decommissioning
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-1494
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-1494
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 0.6.5
>         Environment: Linux 2.6.33.8-149.fc13.x86_64 #1 SMP Tue Aug 17 22:53:15 UTC 2010 x86_64 x86_64 x86_64 GNU/Linux
>            Reporter: Dan Retzlaff
>
> After decommissioning 192.168.2.147, the Gossiper caused a
> ConcurrentModificationException in 192.168.2.55. This cascaded into
> 192.168.2.55 thinking that 192.168.2.148 and 192.168.2.149 repeatedly went UP
> and then DOWN. Eventually this left so many intranode (storage port) TCP
> connections in CLOSE_WAIT that other nodes started failing with "too many
> open files" exceptions.
>
> INFO [Timer-0] 2010-09-08 17:00:02,398 Gossiper.java (line 402) FatClient /192.168.2.147 has been silent for 3600000ms, removing from gossip
> ERROR [Timer-0] 2010-09-08 17:00:02,418 Gossiper.java (line 99) Gossip error
> java.util.ConcurrentModificationException
>         at java.util.Hashtable$Enumerator.next(Hashtable.java:1031)
>         at org.apache.cassandra.gms.Gossiper.doStatusCheck(Gossiper.java:383)
>         at org.apache.cassandra.gms.Gossiper$GossipTimerTask.run(Gossiper.java:93)
>         at java.util.TimerThread.mainLoop(Timer.java:512)
>         at java.util.TimerThread.run(Timer.java:462)
> INFO [Timer-0] 2010-09-08 17:00:12,398 Gossiper.java (line 180) InetAddress /192.168.2.148 is now dead.
> INFO [Timer-0] 2010-09-08 17:00:14,399 Gossiper.java (line 180) InetAddress /192.168.2.149 is now dead.
> INFO [GMFD:1] 2010-09-08 17:00:19,400 Gossiper.java (line 578) InetAddress /192.168.2.149 is now UP
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:19,400 HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /192.168.2.149
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:19,401 HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to endpoint /192.168.2.149
> INFO [Timer-0] 2010-09-08 17:00:20,399 Gossiper.java (line 180) InetAddress /192.168.2.149 is now dead.
> INFO [GMFD:1] 2010-09-08 17:00:43,409 Gossiper.java (line 578) InetAddress /192.168.2.148 is now UP
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:43,409 HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /192.168.2.148
> INFO [HINTED-HANDOFF-POOL:1] 2010-09-08 17:00:43,410 HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to endpoint /192.168.2.148
> INFO [Timer-0] 2010-09-08 17:00:44,404 Gossiper.java (line 180) InetAddress /192.168.2.148 is now dead.
> INFO [GMFD:1] 2010-09-08 17:01:18,415 Gossiper.java (line 578) InetAddress /192.168.2.149 is now UP
> (UP/DOWN cycle repeats until the target node *really* goes DOWN due to too
> many TCP sockets in CLOSE_WAIT.)
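
The "break" suggestion in the comment above can be illustrated with a small standalone Java sketch. It does not reproduce the actual Gossiper code, which is not shown in this message: the class GossipRemovalSketch, its method names, and the address-to-silence-time table are hypothetical stand-ins. It assumes only what the stack trace indicates, namely that the loop walks a java.util.Hashtable view, whose iterators are fail-fast.

```java
import java.util.ConcurrentModificationException;
import java.util.Hashtable;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch; not the Cassandra Gossiper implementation.
class GossipRemovalSketch {

    // Stand-in endpoint table: address -> milliseconds since last heard from.
    static Map<String, Long> newEndpointTable() {
        Map<String, Long> endpoints = new Hashtable<>();
        endpoints.put("192.168.2.147", 3_600_000L);
        endpoints.put("192.168.2.148", 1_000L);
        endpoints.put("192.168.2.149", 2_000L);
        return endpoints;
    }

    // Removes an entry through the map while a for-each loop is still running.
    // The first visited key is removed so that other entries remain to be
    // visited; the next iterator step then detects the modification.
    // Returns true if ConcurrentModificationException was thrown.
    static boolean removeAndKeepIterating(Map<String, Long> endpoints) {
        try {
            boolean removed = false;
            for (String address : endpoints.keySet()) {
                if (!removed) {
                    endpoints.remove(address); // modifies the table behind the iterator
                    removed = true;
                }
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // The suggested shape: remove one silent endpoint, then leave the loop
    // immediately so the invalidated iterator is never advanced again.
    static boolean removeThenBreak(Map<String, Long> endpoints, long silentLimitMs) {
        boolean removed = false;
        for (Map.Entry<String, Long> entry : endpoints.entrySet()) {
            if (entry.getValue() >= silentLimitMs) {
                endpoints.remove(entry.getKey());
                removed = true;
                break;
            }
        }
        return removed;
    }

    // Alternative: Iterator.remove() keeps the iterator consistent, so every
    // silent endpoint can be dropped in a single pass.
    static int removeViaIterator(Map<String, Long> endpoints, long silentLimitMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = endpoints.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() >= silentLimitMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    public static void main(String[] args) {
        System.out.println("remove and keep iterating throws: "
                + removeAndKeepIterating(newEndpointTable()));
        Map<String, Long> table = newEndpointTable();
        System.out.println("remove then break removed one: "
                + removeThenBreak(table, 3_600_000L) + ", remaining: " + table.keySet());
        table = newEndpointTable();
        System.out.println("iterator removal count: "
                + removeViaIterator(table, 3_600_000L) + ", remaining: " + table.keySet());
    }
}
```

The break variant removes at most one endpoint per pass, which may be acceptable for a status check that runs again on the next timer tick. Iterator.remove() is the usual alternative when every stale entry should be removed in one pass.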