[ https://issues.apache.org/jira/browse/HDFS-13393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on HDFS-13393 started by Gabor Bota.
-----------------------------------------
> Improve OOM logging
> -------------------
>
>                 Key: HDFS-13393
>                 URL: https://issues.apache.org/jira/browse/HDFS-13393
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, datanode
>            Reporter: Wei-Chiu Chuang
>            Assignee: Gabor Bota
>            Priority: Major
>
> It is not uncommon to find "java.lang.OutOfMemoryError: unable to create new
> native thread" errors in an HDFS cluster. Most often this happens when a
> DataNode is creating DataXceiver threads, or when the balancer creates
> threads for moving blocks around.
> In most cases, the "OOM" is a symptom of the number of threads reaching a
> system limit rather than of actually running out of memory, and the current
> logging of this message is usually misleading (it suggests the cause is
> insufficient memory).
> How about catching the OOM and, if it is due to "unable to create new native
> thread", printing a more helpful message such as "bump your ulimit" or "take
> a jstack of the process"?
> Even better, surface this error to make it more visible. An in-depth
> investigation usually starts only a while after users notice that some job
> has failed, and by that time the evidence (such as jstack output) may
> already be gone.
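
The proposal above could be sketched roughly as follows. This is only an
illustration under stated assumptions, not the actual HDFS patch: the class
OomLoggingSketch and its methods are hypothetical names, java.util.logging
stands in for whatever logger Hadoop uses, and the check matches only the
message text quoted in the issue (other JVM versions may word it differently).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical sketch; none of these names come from the Hadoop code base. */
public class OomLoggingSketch {
    private static final Logger LOG =
        Logger.getLogger(OomLoggingSketch.class.getName());

    // Message text quoted in the issue. Assumption: matching on this
    // substring is enough; other JVM versions may phrase it differently.
    static final String NATIVE_THREAD_MSG = "unable to create new native thread";

    /** Returns true if the error reports a failed thread creation. */
    static boolean isThreadLimitOom(OutOfMemoryError e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(NATIVE_THREAD_MSG);
    }

    /** Builds the text to log for the given error. */
    static String describe(OutOfMemoryError e) {
        if (isThreadLimitOom(e)) {
            return "Could not create a new native thread. This usually means "
                + "the process reached a system thread limit rather than ran "
                + "out of memory. Consider raising the ulimit and taking a "
                + "jstack of this process.";
        }
        return "Out of memory: " + e.getMessage();
    }

    /** Starts the thread; on OOM, logs the clearer message and rethrows. */
    static void startThread(Thread t) {
        try {
            t.start();
        } catch (OutOfMemoryError e) {
            LOG.log(Level.SEVERE, describe(e), e);
            throw e;
        }
    }

    public static void main(String[] args) {
        // Simulated error, so the sketch runs without exhausting threads.
        OutOfMemoryError simulated = new OutOfMemoryError(NATIVE_THREAD_MSG);
        System.out.println(describe(simulated));
    }
}
```

Rethrowing after logging keeps the existing failure behaviour unchanged; only
the log output becomes more specific. Making the error more visible, as the
issue also suggests, would need a separate mechanism that is not shown here.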