[ 
https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14327005#comment-14327005
 ] 

Liang Xie commented on HADOOP-11604:
------------------------------------

Thanks for all the valuable comments. After checking the out file, I saw that 
the ConcurrentModificationException is thrown inside the finally block:
{code}
        for (Entry entry : entries.values()) {      <<<< HERE
          sendCallback("close", entries, fdSet, entry.getDomainSocket().fd);
        }
        entries.clear();
{code}

The log looks something like this:
{code}
Exception in thread "Thread-25" java.util.ConcurrentModificationException
        at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1100)
        at java.util.TreeMap$ValueIterator.next(TreeMap.java:1145)
        at org.apache.hadoop.net.unix.DomainSocketWatcher$1.run(DomainSocketWatcher.java:484)
        at java.lang.Thread.run(Thread.java:662)
{code}

So the root cause in our case should be the non-thread-safe pattern of calling 
treemap.remove from inside a foreach loop over that same map.
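
To illustrate the pattern, here is a minimal standalone sketch, not the DomainSocketWatcher code itself. It assumes the callback invoked inside the loop removes the entry from the map being iterated; the class and the closeAndRemove helper are hypothetical stand-ins for illustration only. Iterating over a snapshot of the keys is one possible way to avoid the exception, and is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.TreeMap;

class TreeMapRemovalSketch {
    // Hypothetical stand-in for a callback that removes the entry it handles.
    static void closeAndRemove(TreeMap<Integer, String> entries, int fd) {
        entries.remove(fd);
    }

    // Unsafe: the map is structurally modified while a foreach loop
    // (and therefore a fail-fast iterator) is still walking it.
    // Returns true if the iterator detected the modification.
    static boolean unsafeLoopThrows(TreeMap<Integer, String> entries) {
        try {
            for (Integer fd : entries.keySet()) {
                closeAndRemove(entries, fd);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Safer: iterate over a copy of the keys, so removals from the map
    // do not invalidate the iterator that drives the loop.
    static void safeDrain(TreeMap<Integer, String> entries) {
        for (Integer fd : new ArrayList<>(entries.keySet())) {
            closeAndRemove(entries, fd);
        }
        entries.clear();
    }

    static TreeMap<Integer, String> sample() {
        TreeMap<Integer, String> m = new TreeMap<>();
        m.put(1, "a");
        m.put(2, "b");
        m.put(3, "c");
        return m;
    }

    public static void main(String[] args) {
        System.out.println("unsafe loop threw: " + unsafeLoopThrows(sample()));
        TreeMap<Integer, String> m = sample();
        safeDrain(m);
        System.out.println("entries left after safe drain: " + m.size());
    }
}
```

Note that no second thread is needed to trigger the exception: a removal made by the iterating thread itself is enough, as long as the iterator is asked for another element afterwards.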

> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>
>                 Key: HADOOP-11604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11604
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>            Priority: Critical
>         Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt
>
>
> Our production cluster hit the Xceiver limit even with HADOOP-10404 and 
> HADOOP-11333 applied. I found it was caused by the 
> DomainSocketWatcher.watcherThread dying. Attached is a possible fix; please 
> review. Thanks.



