[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881057#comment-13881057 ] Wei Yan commented on HADOOP-10245: -- [~shanyu], sorry for the late reply. So you mean we only let users specify -Xmx through $JAVA_HEAP_MAX? > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877905#comment-13877905 ] shanyu zhao commented on HADOOP-10245: -- [~ywskycn] Thank you for your comment! If we remove -Xmx512m from HADOOP_CLIENT_OPTS in hadoop_env.cmd, there will be one and only one -Xmx, which is the $JAVA_HEAP_MAX in bin/hadoop. HADOOP-9870 may have solved the problem for you, but I think the fix in HADOOP-9870 might be too complicated and hard to maintain. For example, what about user use "-Xmx" in HADOOP_OPTS instead of HADOOP_CLIENT_OPTS? I think we should avoid using HADOOP_CLIENT_OPTS or HADOOP_OPTS to specify memory, because the fact that we've defined HADOOP_HEAPSIZE but not using it for memory specification is confusing. If you want to change heap size, just change HADOOP_HEAPSIZE, I think this is simple and clear. Thoughts? > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877837#comment-13877837 ] Wei Yan commented on HADOOP-10245: -- [~shanyu], as discussed, there are multiple places configuring -Xmx. In the lastest patch in HADOOP-9870 provided by [~jhsenjaliya], $HADOOP_HEAPSIZE is checked firstly; if not set, assign -Xmx512m. Additionally, in bin/hadoop, also check the -Xmx configuration, to avoid duplicate configurations. Simply remove -Xmx512m from HADOOP_CLIENT_OPTS may still generate multiple -Xmx, as bin/hadoop also has a default $JAVA_HEAP_MAX, which is 1000m. IMO, I think HADOOP-9870 has fixed this issue. > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1384#comment-1384 ] shanyu zhao commented on HADOOP-10245: -- [~drankye], [~qwertymaniac] would you please help review this patch? > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877068#comment-13877068 ] shanyu zhao commented on HADOOP-10245: -- [~ywskycn] yes, it is the same issue. Sorry I didn't see HADOOP-9870 before I submit this one. I also found similar JIRAs HADOOP-9211 and HDFS-5087. I went through these JIRAs and here are my thoughts: We should only rely on $HADOOP_HEAPSIZE to control Java heap size, instead of $HADOOP_CLIENT_OPTS. Otherwise it would be very confusing and hard to debug issues. And I've seen many real world issues caused by this confusion. There are arguments that $HADOOP_HEAPSIZE is only for service, and client should have its own settings. Well, we could create HADOOP_CLIENT_HEAPSIZE which is initialized to 512m and used in hadoop.sh. But personally I think it does not worth it to add this new env variable. The client can just simply use $HADOOP_HEAPSIZE which defaults to 1000m. Also, there are scenarios that a java class executed by "hadoop jar" command has a large memory requirements. A real world example: Hive's MapredLocalTask calls "hadoop jar" to build a local hash table. Also, if there's a need to change the heapsize, one can always set env variable $HADOOP_HEAPSIZE. > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877017#comment-13877017 ] Hadoop QA commented on HADOOP-10245: {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12624030/HADOOP-10245.patch against trunk revision . {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 eclipse:eclipse{color}. The patch built with eclipse:eclipse. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 core tests{color}. The patch passed unit tests in hadoop-common-project/hadoop-common. {color:green}+1 contrib tests{color}. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-HADOOP-Build/3449//testReport/ Console output: https://builds.apache.org/job/PreCommit-HADOOP-Build/3449//console This message is automatically generated. > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice
[ https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877002#comment-13877002 ] Wei Yan commented on HADOOP-10245: -- Hey, shanyu. Is this one related to HADOOP-9870? > Hadoop command line always appends "-Xmx" option twice > -- > > Key: HADOOP-10245 > URL: https://issues.apache.org/jira/browse/HADOOP-10245 > Project: Hadoop Common > Issue Type: Bug > Components: bin >Affects Versions: 2.2.0 >Reporter: shanyu zhao >Assignee: shanyu zhao > Attachments: HADOOP-10245.patch > > > The Hadoop command line scripts (hadoop.sh or hadoop.cmd) will call java with > "-Xmx" options twice. The impact is that any user defined HADOOP_HEAP_SIZE > env variable will take no effect because it is overwritten by the second > "-Xmx" option. > For example, here is the java cmd generated for command "hadoop fs -ls /", > Notice that there are two "-Xmx" options: "-Xmx1000m" and "-Xmx512m" in the > command line: > java -Xmx1000m -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log > -Dhadoop.root.logger=INFO,c > onsole,DRFA -Xmx512m -Dhadoop.security.logger=INFO,RFAS -classpath XXX > org.apache.hadoop.fs.FsShell -ls / > Here is the root cause: > The call flow is: hadoop.sh calls hadoop_config.sh, which in turn calls > hadoop-env.sh. > In hadoop.sh, the command line is generated by the following pseudo code: > java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ... > In hadoop-config.sh, $JAVA_HEAP_MAX is initialized as "-Xmx1000m" if user > didn't set $HADOOP_HEAP_SIZE env variable. > In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set as this: > export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS" > To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. > If we really want to change the memory settings we need to use > $HADOOP_HEAP_SIZE env variable. -- This message was sent by Atlassian JIRA (v6.1.5#6160)