[ 
https://issues.apache.org/jira/browse/HDFS-11192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15831903#comment-15831903
 ] 

Daryn Sharp commented on HDFS-11192:
------------------------------------

Reverting is extreme and imho unwarranted for a very unlikely scenario.  The 
benefit of the parallel init is huge, while exhausting threads is symptomatic 
of a system management issue.

If another process by the same user as the NN exhausted the per-user thread 
limit, strongly consider running the NN as a distinct user.  If processes by 
different users managed to exhaust the OS thread limit, then you should 
consider a dedicated NN host, since that many other things running on the NN 
host will surely degrade performance - horribly.

I agree in principle the NN should be able to detect + retry or abort but I 
suspect the issue is a bug in the JDK, which is where it may need to be fixed.  
Or maybe a default exception handler might work.
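
To make the handler idea concrete, here is a rough sketch of a ForkJoinPool 
built with its own uncaught exception handler, so a Throwable that escapes a 
worker thread is recorded where the caller can see it.  FailureAwarePool and 
its methods are made-up names, not anything in HDFS, and I have not verified 
that the handler fires for the thread-creation OOM in the trace; that depends 
on the JDK's ForkJoinPool internals.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: the class and method names are illustrative only.
final class FailureAwarePool {
    // First Throwable that escaped a worker thread, or null if none has.
    private final AtomicReference<Throwable> failure = new AtomicReference<>();
    // Released once the handler has been called at least once.
    private final CountDownLatch failed = new CountDownLatch(1);
    private final ForkJoinPool pool;

    FailureAwarePool(int parallelism) {
        Thread.UncaughtExceptionHandler handler = (thread, error) -> {
            failure.compareAndSet(null, error); // keep only the first failure
            failed.countDown();
        };
        // Standard four-argument ForkJoinPool constructor; the third
        // argument is the handler installed on the pool's worker threads.
        pool = new ForkJoinPool(parallelism,
                ForkJoinPool.defaultForkJoinWorkerThreadFactory, handler, false);
    }

    ForkJoinPool pool() {
        return pool;
    }

    // Returns true if a worker failure was reported within the timeout.
    boolean awaitFailure(long timeout, TimeUnit unit) throws InterruptedException {
        return failed.await(timeout, unit);
    }

    Throwable failure() {
        return failure.get();
    }
}
```

A caller waiting on quota init could poll failure() and then retry or abort 
instead of waiting on a join that may never complete.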

> OOM during Quota Initialization lead to Namenode hang
> -----------------------------------------------------
>
>                 Key: HDFS-11192
>                 URL: https://issues.apache.org/jira/browse/HDFS-11192
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: Brahma Reddy Battula
>            Assignee: Brahma Reddy Battula
>         Attachments: namenodeThreadDump.out
>
>
> AFAIK, in RecursiveTask execution, when a ForkJoinPool thread dies or cannot 
> be created, the parent is not notified. The parent keeps waiting for the 
> notify call, and that wait is not a timed wait either.
>  *Trace from Namenode log* 
> {noformat}
> Exception in thread "ForkJoinPool-1-worker-2" Exception in thread "ForkJoinPool-1-worker-3" java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
>         at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517)
>         at java.util.concurrent.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1609)
>         at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:167)
> java.lang.OutOfMemoryError: unable to create new native thread
>         at java.lang.Thread.start0(Native Method)
>         at java.lang.Thread.start(Thread.java:714)
>         at java.util.concurrent.ForkJoinPool.createWorker(ForkJoinPool.java:1486)
>         at java.util.concurrent.ForkJoinPool.tryAddWorker(ForkJoinPool.java:1517)
>         at java.util.concurrent.ForkJoinPool.deregisterWorker(ForkJoinPool.java:1609)
>         at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:167)
> {noformat}
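
On the untimed wait described in the issue: one way to keep the caller from 
hanging forever is to bound the outermost wait with ForkJoinTask.get(timeout, 
unit) and abort on timeout.  The sketch below is only an illustration under 
that assumption; TimedJoinSketch, RangeSum and runOrAbort are hypothetical 
names, and the range sum merely stands in for a recursive quota count.  Note 
that the join() calls inside the task are still untimed; only the top-level 
wait is bounded.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: names are illustrative, not taken from HDFS.
final class TimedJoinSketch {

    // Sums the integers in [lo, hi] by recursive splitting.
    static final class RangeSum extends RecursiveTask<Long> {
        private final long lo;
        private final long hi;

        RangeSum(long lo, long hi) {
            this.lo = lo;
            this.hi = hi;
        }

        @Override
        protected Long compute() {
            if (hi - lo <= 1_000) {          // small range: sum directly
                long sum = 0;
                for (long i = lo; i <= hi; i++) {
                    sum += i;
                }
                return sum;
            }
            long mid = lo + (hi - lo) / 2;
            RangeSum left = new RangeSum(lo, mid);
            RangeSum right = new RangeSum(mid + 1, hi);
            left.fork();                     // run the left half asynchronously
            return right.compute() + left.join();
        }
    }

    // Waits at most the given time for the task. On timeout the task is
    // cancelled and an IllegalStateException is thrown instead of blocking.
    static <T> T runOrAbort(ForkJoinPool pool, ForkJoinTask<T> task,
                            long timeout, TimeUnit unit)
            throws InterruptedException, ExecutionException {
        pool.execute(task);
        try {
            return task.get(timeout, unit);
        } catch (TimeoutException e) {
            task.cancel(true);
            throw new IllegalStateException(
                    "task did not finish within " + timeout + " " + unit, e);
        }
    }
}
```

Whether to retry or abort after the timeout is a policy decision for the 
caller; the sketch simply surfaces the condition as an exception.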


