[ https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327005#comment-14327005 ]
Liang Xie commented on HADOOP-11604:
------------------------------------

Thanks for all the valuable comments. After checking the out file, I saw that the ConcurrentModificationException is thrown inside the finally block:
{code}
for (Entry entry : entries.values()) {  // <<<< HERE
  sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
}
entries.clear();
{code}
The log looks something like this:
{code}
Exception in thread "Thread-25" java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
        at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
        at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:484)
        at java.lang.Thread.run(Thread.java:662)
{code}
So the root cause in our case should be the unsafe pattern of removing entries from the TreeMap while a for-each loop is still iterating over it: foreach {treemap.remove}. The iterator is fail-fast, so its next call throws once the map has been modified behind it.

> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>
>                 Key: HADOOP-11604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11604
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>            Priority: Critical
>         Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt
>
>
> Our production cluster hit the Xceiver limit even with HADOOP-10404 and
> HADOOP-11333 applied. I found it was caused by the
> DomainSocketWatcher.watcherThread having died. Attached is a possible fix,
> please review, thanks.
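
For anyone who wants to see the failure mode from the comment in isolation, here is a minimal standalone sketch. It is not the DomainSocketWatcher code and not the attached patch: the class WatcherCloseSketch, the helper closeAndRemove (a stand-in for a callback that removes the entry it handles) and the sample data are all hypothetical. It only shows that removing from a TreeMap inside a for-each loop trips the fail-fast iterator, and that removing through the iterator is one way to avoid that.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these names exist in Hadoop.
final class WatcherCloseSketch {

    // Stand-in for a callback that removes the handled entry from the map.
    static void closeAndRemove(TreeMap<Integer, String> entries, int fd) {
        entries.remove(fd);
    }

    // Problem pattern: the loop body modifies the map it is iterating over.
    // Returns true if the loop finished, false if the iterator failed fast.
    static boolean unsafeCloseAll(TreeMap<Integer, String> entries) {
        try {
            for (Integer fd : entries.keySet()) {
                closeAndRemove(entries, fd);
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // One possible alternative: remove through the iterator itself, which
    // keeps the iterator's expected modification count in sync with the map.
    static int safeCloseAll(TreeMap<Integer, String> entries) {
        int closed = 0;
        Iterator<Map.Entry<Integer, String>> it = entries.entrySet().iterator();
        while (it.hasNext()) {
            it.next();      // the per-entry "close" work would go here
            it.remove();
            closed++;
        }
        return closed;
    }

    // Three entries, so the unsafe loop reaches a second next() call.
    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> entries = new TreeMap<>();
        entries.put(3, "sock-a");
        entries.put(7, "sock-b");
        entries.put(9, "sock-c");
        return entries;
    }

    public static void main(String[] args) {
        System.out.println("unsafe loop completed: " + unsafeCloseAll(sample()));
        System.out.println("safe loop closed: " + safeCloseAll(sample()));
    }
}
```

Note that the exception needs no second thread: a single thread that modifies the map in the middle of its own iteration is enough. Iterating over a copy of the values would be another option, at the cost of an extra allocation.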