[
https://issues.apache.org/jira/browse/HADOOP-11604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Chris Nauroth updated HADOOP-11604:
-----------------------------------
Attachment: HADOOP-11604.003.patch
[~xieliang007], thanks for looking into the .out file and sharing the stack
trace. This makes sense now.
I'm attaching a patch that avoids mutating the {{TreeMap}} during the iteration
in the {{finally}} block. Removing entries as we iterate isn't necessary,
because after the loop we {{clear}} the whole map anyway and let it drop out of
scope.
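To illustrate the shape of that change, here is a rough sketch, not the actual
patch. The class name, the {{closeAll}} method, and the {{Integer}}/{{String}}
map types are hypothetical stand-ins for the real watcher state. The loop only
reads the map, and the single {{clear}} afterwards empties it.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: not the DomainSocketWatcher code itself.
class DrainWithoutRemove {
    // Visits every entry during cleanup without removing entries mid-iteration,
    // then empties the map in one step. Returns how many entries were visited.
    static int closeAll(TreeMap<Integer, String> entries) {
        int closed = 0;
        try {
            // The main work of the thread would run here.
        } finally {
            for (Map.Entry<Integer, String> e : entries.entrySet()) {
                // Release whatever resource e refers to here. Do not call
                // entries.remove(...) inside this loop: structurally modifying
                // the map while iterating its entry set is what can trigger
                // ConcurrentModificationException.
                closed++;
            }
            // One structural modification, after the iteration has finished.
            entries.clear();
        }
        return closed;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> entries = new TreeMap<>();
        entries.put(3, "a");
        entries.put(7, "b");
        int closed = closeAll(entries);
        System.out.println("closed=" + closed + ", empty=" + entries.isEmpty());
    }
}
```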
I don't have a unit test for this. I still need to look into whether or not
that's feasible. My intuition is that it won't be possible to repro the
problem reliably in a unit test, because throwing
{{ConcurrentModificationException}} is a best-effort check, not a guarantee.
For the logging problem, I decided to go with the approach of setting an
uncaught exception handler. The {{run}} method is already very deeply nested,
and I think wrapping the whole thing in another try-catch for logging
{{Throwable}} would hurt readability.
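As a minimal sketch of the uncaught exception handler approach (again, not the
patch itself), the following uses the standard
{{Thread#setUncaughtExceptionHandler}} call. The class name, the
{{runAndCapture}} helper, the thread name, and the log message are all made up
for illustration, and a real implementation would presumably log through the
project's logger rather than {{System.err}}.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: shows the handler mechanism only.
class WatcherThreadLogging {
    // Runs body on a new thread and returns the Throwable that killed the
    // thread, or null if the thread finished normally.
    static Throwable runAndCapture(Runnable body) {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        Thread t = new Thread(body, "watcher-sketch");
        t.setDaemon(true);
        // The handler runs on the dying thread, so run() itself needs no
        // additional try-catch around its whole body just for logging.
        t.setUncaughtExceptionHandler((thread, err) -> {
            System.err.println(thread.getName()
                + " terminating on unexpected exception: " + err);
            seen.set(err);
        });
        t.start();
        try {
            t.join();
        } catch (InterruptedException ie) {
            // Preserve the caller's interrupt status.
            Thread.currentThread().interrupt();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        Throwable err = runAndCapture(() -> {
            throw new IllegalStateException("simulated failure");
        });
        System.out.println("captured: " + err);
    }
}
```

The trade-off is that the handler only observes the failure and cannot recover
from it; the thread still dies, but it no longer dies silently.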
Liang, I hope I'm not intruding by posting a patch on an issue assigned to you.
I basically had to write this patch while researching, though, so I'd rather
share it than throw the work away. :-)
> Reach xceiver limit once the watcherThread die
> ----------------------------------------------
>
> Key: HADOOP-11604
> URL: https://issues.apache.org/jira/browse/HADOOP-11604
> Project: Hadoop Common
> Issue Type: Bug
> Affects Versions: 2.6.0
> Reporter: Liang Xie
> Assignee: Liang Xie
> Priority: Critical
> Attachments: HADOOP-11604-001.txt, HADOOP-11604-002.txt,
> HADOOP-11604.003.patch
>
>
> Our product cluster hit the Xceiver limit even with HADOOP-10404 and
> HADOOP-11333. I found it was caused by DomainSocketWatcher.watcherThread
> having died. Attached is a possible fix; please review. Thanks.