[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877068#comment-13877068 ]

shanyu zhao commented on HADOOP-10245:
--------------------------------------

[~ywskycn] Yes, it is the same issue. Sorry, I didn't see HADOOP-9870 before I submitted this one. I also found the similar JIRAs HADOOP-9211 and HDFS-5087.

I went through these JIRAs and here are my thoughts:

We should rely only on $HADOOP_HEAPSIZE to control the Java heap size, not on $HADOOP_CLIENT_OPTS. Otherwise it is very confusing and issues are hard to debug. I've seen many real-world issues caused by this confusion.

There are arguments that $HADOOP_HEAPSIZE is only for services, and that the client should have its own setting. Well, we could create HADOOP_CLIENT_HEAPSIZE, initialized to 512m and used in hadoop.sh. But personally I don't think it is worth adding this new env variable. The client can simply use $HADOOP_HEAPSIZE, which defaults to 1000m.

Also, there are scenarios where a Java class executed by the "hadoop jar" command has large memory requirements. A real-world example: Hive's MapredLocalTask calls "hadoop jar" to build a local hash table.

Also, if there's a need to change the heap size, one can always set the env variable $HADOOP_HEAPSIZE.

> Hadoop command line always appends "-Xmx" option twice
> ------------------------------------------------------
>
>                 Key: HADOOP-10245
>                 URL: https://issues.apache.org/jira/browse/HADOOP-10245
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: bin
>    Affects Versions: 2.2.0
>            Reporter: shanyu zhao
>            Assignee: shanyu zhao
>         Attachments: HADOOP-10245.patch
>
>
> The Hadoop command line scripts (hadoop.sh or hadoop.cmd) call java with the
> "-Xmx" option twice. As a result, any user-defined HADOOP_HEAPSIZE
> env variable has no effect, because it is overridden by the second
> "-Xmx" option.
> For example, here is the java command generated for "hadoop fs -ls /".
> Notice that there are two "-Xmx" options, "-Xmx1000m" and "-Xmx512m", in the
> command line:
> java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log
> -Dhadoop.root.logger=INFO,console,DRFA -Xmx512m
> -Dhadoop.security.logger=INFO,RFAS -classpath XXX
> org.apache.hadoop.fs.FsShell -ls /
> Here is the root cause:
> The call flow is: hadoop.sh calls hadoop-config.sh, which in turn calls
> hadoop-env.sh.
> In hadoop.sh, the command line is generated by the following pseudo code:
> java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
> In hadoop-config.sh, $JAVA_HEAP_MAX is initialized to "-Xmx1000m" if the user
> didn't set the $HADOOP_HEAPSIZE env variable.
> In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set like this:
> export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
> To fix this problem, we should remove "-Xmx512m" from HADOOP_CLIENT_OPTS.
> If we really want to change the memory settings, we should use the
> $HADOOP_HEAPSIZE env variable.

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
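
To illustrate the root cause described in the issue, here is a minimal shell sketch of how the two "-Xmx" options could end up on the same command line. It is not the actual Hadoop script code: build_java_args and count_xmx are hypothetical helpers, and the way HADOOP_HEAPSIZE is turned into "-Xmx<N>m" is an assumption made for the example.

```shell
#!/bin/sh
# Illustrative sketch only. build_java_args and count_xmx are hypothetical
# helpers, not functions from the Hadoop scripts.

build_java_args() {
  # Step attributed to hadoop-config.sh in the issue: default to -Xmx1000m
  # unless the user set HADOOP_HEAPSIZE (assumed here to be a size in MB).
  JAVA_HEAP_MAX="-Xmx1000m"
  if [ -n "$HADOOP_HEAPSIZE" ]; then
    JAVA_HEAP_MAX="-Xmx${HADOOP_HEAPSIZE}m"
  fi

  # Step attributed to hadoop-env.sh: a fixed -Xmx512m is prepended
  # to whatever the user put in HADOOP_CLIENT_OPTS.
  HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

  # Step attributed to hadoop.sh: both variables go on the java command line.
  echo "$JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS"
}

count_xmx() {
  # Count the -Xmx options in a space-separated argument string.
  n=0
  for arg in $1; do
    case "$arg" in
      -Xmx*) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# The user asks for a 2000 MB heap, yet -Xmx512m still follows it.
HADOOP_HEAPSIZE=2000
HADOOP_CLIENT_OPTS=""
args=$(build_java_args)
echo "java $args -classpath ..."
echo "number of -Xmx options: $(count_xmx "$args")"
```

In this sketch the user's value always appears first and the fixed "-Xmx512m" second, which matches the issue's observation that the HADOOP_HEAPSIZE setting is overridden. Dropping the "-Xmx512m" prefix in the second step leaves a single "-Xmx" option controlled by $HADOOP_HEAPSIZE.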