[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

shanyu zhao (JIRA) Mon, 20 Jan 2014 17:23:15 -0800

    [ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877068#comment-13877068
 ]


shanyu zhao commented on HADOOP-10245:
--------------------------------------

[~ywskycn] Yes, it is the same issue. Sorry, I didn't see HADOOP-9870 before 
I submitted this one. I also found similar JIRAs: HADOOP-9211 and HDFS-5087.

I went through these JIRAs and here are my thoughts:
We should rely only on $HADOOP_HEAPSIZE to control the Java heap size, not on 
$HADOOP_CLIENT_OPTS. Otherwise the behavior is confusing and hard to debug, 
and I've seen many real-world issues caused by this confusion.

Some argue that $HADOOP_HEAPSIZE is only for services and that the client 
should have its own setting. We could create HADOOP_CLIENT_HEAPSIZE, 
initialized to 512m and used in hadoop.sh, but personally I don't think it is 
worth adding a new env variable. The client can simply use $HADOOP_HEAPSIZE, 
which defaults to 1000m. Also, there are scenarios where a Java class executed 
by the "hadoop jar" command has large memory requirements. A real-world 
example: Hive's MapredLocalTask calls "hadoop jar" to build a local hash 
table.

Also, if there's a need to change the heap size, one can always set the 
$HADOOP_HEAPSIZE env variable.
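
To illustrate the proposal, here is a minimal sketch of how a single variable 
could drive the heap setting. It is not the actual hadoop-config.sh code: the 
function name resolve_heap_max is hypothetical, and treating the value as a 
number of megabytes is an assumption. The 1000m default is the one mentioned 
above.

```shell
# Hypothetical helper: derive the -Xmx flag from HADOOP_HEAPSIZE alone.
# Assumption: HADOOP_HEAPSIZE holds a plain number interpreted as megabytes.
resolve_heap_max() {
  if [ -n "${HADOOP_HEAPSIZE:-}" ]; then
    echo "-Xmx${HADOOP_HEAPSIZE}m"
  else
    # Default discussed in this thread.
    echo "-Xmx1000m"
  fi
}

JAVA_HEAP_MAX="$(resolve_heap_max)"
echo "heap flag: $JAVA_HEAP_MAX"
```

With this shape, a client that needs more memory for one invocation would 
only have to export HADOOP_HEAPSIZE before calling "hadoop jar", with no 
second "-Xmx" coming from HADOOP_CLIENT_OPTS.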

> Hadoop command line always appends "-Xmx" option twice
> ------------------------------------------------------
>
>                 Key: HADOOP-10245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10245
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: bin
>    Affects Versions: 2.2.0
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>         Attachments: HADOOP-10245.patch
>
>
> The Hadoop command line scripts (hadoop.sh or hadoop.cmd) call java with 
> the "-Xmx" option twice. The impact is that any user-defined HADOOP_HEAPSIZE 
> env variable has no effect because it is overridden by the second 
> "-Xmx" option.
> For example, here is the java command generated for "hadoop fs -ls /". 
> Notice that there are two "-Xmx" options, "-Xmx1000m" and "-Xmx512m", in the 
> command line:
> java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
> -Dhadoop.root.logger=INFO,console,DRFA -Xmx512m  
> -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
> org.apache.hadoop.fs.FsShell -ls /
> Here is the root cause:
> The call flow is: hadoop.sh calls hadoop-config.sh, which in turn calls 
> hadoop-env.sh. 
> In hadoop.sh, the command line is generated by the following pseudo code:
> java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
> In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if the user 
> didn't set the $HADOOP_HEAPSIZE env variable.
> In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this:
> export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
> To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. 
> If we really want to change the memory settings, we should use the 
> $HADOOP_HEAPSIZE env variable.
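
The duplication described above can be reproduced without running java. The 
sketch below only builds the command string from the pseudo code in the 
description and reports which "-Xmx" token comes last, which is the one the 
description says takes effect. The helper name effective_xmx is hypothetical, 
and the classpath XXX is the placeholder from the description.

```shell
# Values as described in the issue; nothing here invokes java.
JAVA_HEAP_MAX="-Xmx1000m"
HADOOP_CLIENT_OPTS="-Xmx512m ${HADOOP_CLIENT_OPTS:-}"
CMDLINE="java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath XXX org.apache.hadoop.fs.FsShell -ls /"

# Hypothetical helper: print the last -Xmx token found in a command string.
effective_xmx() {
  last=""
  # Intentionally unquoted so the string is split into whitespace-separated tokens.
  for tok in $1; do
    case "$tok" in
      -Xmx*) last="$tok" ;;
    esac
  done
  echo "$last"
}

echo "command:   $CMDLINE"
echo "effective: $(effective_xmx "$CMDLINE")"
```

If the "-Xmx512m" prefix is dropped from HADOOP_CLIENT_OPTS, the only 
remaining "-Xmx" token is the one from $JAVA_HEAP_MAX.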



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
