[jira] [Updated] (HDFS-13393) Improve OOM logging

Wei-Chiu Chuang (JIRA) Tue, 03 Apr 2018 14:37:05 -0700

     [ https://issues.apache.org/jira/browse/HDFS-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Wei-Chiu Chuang updated HDFS-13393:
-----------------------------------
    Description: 
It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new 
native thread" errors in an HDFS cluster. Most often this happens when a 
DataNode is creating DataXceiver threads, or when the balancer creates threads 
for moving blocks around.

In most cases, the "OOM" is a symptom of the number of threads reaching the 
system limit, rather than of actually running out of memory, and the current 
logging of this message is usually misleading (it suggests that the cause is 
insufficient memory).

How about catching the OOM and, if it is due to "unable to create new native 
thread", printing a more helpful message such as "bump your ulimit" or "take a 
jstack of the process"?
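
A rough sketch of what such a check could look like is below. It is only an 
illustration, not a patch: the class name OomLogHelper, the method names and 
the wording of the hint are all made up here, and it matches only the message 
text quoted above, which may differ between JVM versions.

```java
import java.util.Locale;

/** Hypothetical helper; not part of Hadoop. */
final class OomLogHelper {
    // Message text quoted in this issue. Assumption: the running JVM uses
    // this wording; other JVM versions may phrase it differently.
    static final String NATIVE_THREAD_MARKER = "unable to create new native thread";

    /** True if the throwable is an OOM caused by failing to create a thread. */
    static boolean isThreadLimitOom(Throwable t) {
        if (!(t instanceof OutOfMemoryError)) {
            return false;
        }
        String msg = t.getMessage();
        return msg != null
            && msg.toLowerCase(Locale.ROOT).contains(NATIVE_THREAD_MARKER);
    }

    /** Builds the log line: a thread-limit hint if applicable, else the plain error. */
    static String describe(OutOfMemoryError e) {
        if (isThreadLimitOom(e)) {
            return "Could not create a new thread (" + e.getMessage() + "). "
                + "This usually means the process hit a thread/process limit, "
                + "not that the heap is exhausted. Consider bumping your ulimit "
                + "or taking a jstack of the process.";
        }
        return "Out of memory: " + e.getMessage();
    }

    public static void main(String[] args) {
        try {
            // Simulated failure; in a real daemon this would come from Thread.start().
            throw new OutOfMemoryError("unable to create new native thread");
        } catch (OutOfMemoryError e) {
            System.err.println(describe(e));
        }
    }
}
```

The idea would be to call something like describe() from the catch blocks 
around thread creation in the DataNode and the balancer, so the existing log 
line carries the hint instead of only the raw OOM message.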

  was:
It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new 
native thread" error in a HDFS cluster. Most often this happens when DataNode 
creating DataXceiver threads, or when balancer creates threads for moving 
blocks around.

In most of cases, the "OOM" is a symptom of number of threads reaching system 
limit, rather than actually running out of memory.

How about capturing the OOM, and if it is due to "unable to create new native 
thread", print some more helpful message like "bump your ulimit" or "take a 
jstack of the process"?


> Improve OOM logging
> -------------------
>
>                 Key: HDFS-13393
>                 URL: https://issues.apache.org/jira/browse/HDFS-13393
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, datanode
>            Reporter: Wei-Chiu Chuang
>            Priority: Major
>
> It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new 
> native thread" errors in an HDFS cluster. Most often this happens when a 
> DataNode is creating DataXceiver threads, or when the balancer creates 
> threads for moving blocks around.
> In most cases, the "OOM" is a symptom of the number of threads reaching the 
> system limit, rather than of actually running out of memory, and the current 
> logging of this message is usually misleading (it suggests that the cause is 
> insufficient memory).
> How about catching the OOM and, if it is due to "unable to create new native 
> thread", printing a more helpful message such as "bump your ulimit" or "take 
> a jstack of the process"?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

