[ 
https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Nauroth updated HADOOP-11604:
-----------------------------------
    Attachment: HADOOP-11604.003.patch

[~xieliang007], thanks for looking into the .out file and sharing the stack 
trace.  This makes sense now.

I'm attaching a patch that avoids mutating the {{TreeMap}} during the iteration 
in the {{finally}} block.  It's not important to remove as we iterate, because 
after the loop, we {{clear}} the whole map anyway and let it drop out of scope. 
 I don't have a unit test for this.  I still need to look into whether or not 
that's feasible.  My intuition is that it won't be possible to repro the 
problem reliably in a unit test, because throwing 
{{ConcurrentModificationException}} is a best-effort check, not a guarantee.
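
For readers following along, here is a minimal sketch of the general idea. It is not the code from the attached patch: the class name, method name, and map contents are hypothetical, chosen only to illustrate iterating without removing and then calling {{clear}} once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; not taken from DomainSocketWatcher.
class ClearAfterIteration {
    // Visit every entry without structurally modifying the map, then empty it.
    static List<String> notifyAllAndClear(TreeMap<Integer, String> handlersByFd) {
        List<String> notified = new ArrayList<>();
        try {
            for (Map.Entry<Integer, String> e : handlersByFd.entrySet()) {
                // Read-only use of the entry. Calling
                // handlersByFd.remove(e.getKey()) here would be a structural
                // modification during iteration and may throw
                // ConcurrentModificationException.
                notified.add(e.getValue());
            }
        } finally {
            // One bulk clear after the loop replaces per-entry removal.
            handlersByFd.clear();
        }
        return notified;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> handlers = new TreeMap<>();
        handlers.put(7, "handler-7");
        handlers.put(3, "handler-3");
        System.out.println(notifyAllAndClear(handlers)); // entries visited in key order
        System.out.println(handlers.isEmpty());
    }
}
```

Because the map is discarded right after the loop, the per-entry removal buys nothing, and dropping it removes the only mutation that could collide with the iterator.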

For the logging problem, I decided to go with the approach of setting an 
uncaught exception handler.  The {{run}} method is already very deeply nested, 
and I think wrapping the whole thing in another try-catch for logging 
{{Throwable}} would hurt readability.
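
As a rough sketch of that approach (again not the patch itself; the class, method, and thread names below are hypothetical, and {{System.err}} stands in for whatever logger the real code uses), {{Thread#setUncaughtExceptionHandler}} lets the thread report an unexpected {{Throwable}} without another try-catch around the body:

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration only; not taken from DomainSocketWatcher.
class WatcherThreadLogging {
    // Build a thread whose unexpected failures are reported by a handler
    // instead of by an extra try-catch wrapped around the whole run body.
    static Thread newWatcherThread(Runnable body, AtomicReference<Throwable> lastFailure) {
        Thread t = new Thread(body, "watcher-sketch");
        t.setDaemon(true);
        t.setUncaughtExceptionHandler((thread, err) -> {
            lastFailure.set(err);
            // Stand-in for a real logger call.
            System.err.println(thread.getName() + " terminating on unexpected exception: " + err);
        });
        return t;
    }

    // Run the body on such a thread and return whatever escaped it, or null.
    static Throwable runAndCapture(Runnable body) {
        AtomicReference<Throwable> lastFailure = new AtomicReference<>();
        Thread t = newWatcherThread(body, lastFailure);
        t.start();
        try {
            t.join(); // the handler runs on the dying thread before it terminates
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastFailure.get();
    }

    public static void main(String[] args) {
        Throwable seen = runAndCapture(() -> {
            throw new IllegalStateException("simulated failure");
        });
        System.out.println("captured: " + seen);
    }
}
```

The handler only reports the failure; it does not restart the thread, so anything that depends on the thread staying alive still needs its own handling.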

Liang, I hope I'm not intruding by posting a patch on an issue assigned to you. 
 I basically had to write this patch while researching, though, so I'd rather 
share it than toss the work away.  :-)

> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>
>                 Key: HADOOP-11604
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11604
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 2.6.0
>            Reporter: Liang Xie
>            Assignee: Liang Xie
>            Priority: Critical
>         Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt, 
> HADOOP-11604.003.patch
>
>
> Our production cluster hit the Xceiver limit even with HADOOP-10404 and 
> HADOOP-11333 applied. I found it was caused by 
> DomainSocketWatcher.watcherThread having died. Attached is a possible fix; 
> please review. Thanks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
