[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-24 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13881057#comment-13881057
 ] 

Wei Yan commented on HADOOP-10245:
--

[~shanyu], sorry for the late reply. So you mean we only let users specify -Xmx 
through $JAVA_HEAP_MAX?

> Hadoop command line always appends "-Xmx" option twice
> --
>
> Key: HADOOP-10245
> URL: https://issues.apache.org/jira/browse/HADOOP-10245
> Project: Hadoop Common
> Issue Type: Bug
> Components: bin
> Affects Versions: 2.2.0
> Reporter: shanyu zhao
> Assignee: shanyu zhao
> Attachments: HADOOP-10245.patch
>
>
> The Hadoop command line scripts (hadoop.sh or hadoop.cmd) call java with the 
> "-Xmx" option twice. As a result, any user-defined HADOOP_HEAPSIZE 
> env variable has no effect, because it is overridden by the second 
> "-Xmx" option.
> For example, here is the java command generated for "hadoop fs -ls /". 
> Notice the two "-Xmx" options, "-Xmx1000m" and "-Xmx512m", in the 
> command line:
> java -Xmx1000m  -Dhadoop.log.dir=C:\tmp\logs -Dhadoop.log.file=hadoop.log 
> -Dhadoop.root.logger=INFO,console,DRFA -Xmx512m 
> -Dhadoop.security.logger=INFO,RFAS -classpath XXX 
> org.apache.hadoop.fs.FsShell -ls /
> Here is the root cause:
> The call flow is: hadoop.sh calls hadoop-config.sh, which in turn calls 
> hadoop-env.sh. 
> In hadoop.sh, the command line is generated by the following pseudo code:
> java $JAVA_HEAP_MAX $HADOOP_CLIENT_OPTS -classpath ...
> In hadoop-config.sh, $JAVA_HEAP_MAX is initialized to "-Xmx1000m" if the user 
> didn't set the $HADOOP_HEAPSIZE env variable.
> In hadoop-env.sh, $HADOOP_CLIENT_OPTS is set like this:
> export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
> To fix this problem, we should remove the "-Xmx512m" from HADOOP_CLIENT_OPTS. 
> If we really want to change the memory settings, we need to use the 
> $HADOOP_HEAPSIZE env variable.
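
The duplication described above can be reproduced without a Hadoop installation. The following is a minimal sketch that mimics the assembly order from the description. The defaults (1000m, 512m) and the variable roles come from the report; the function names `build_java_cmd` and `count_xmx` are hypothetical helpers, and the script is not the actual content of hadoop-config.sh or hadoop-env.sh.

```shell
#!/bin/sh
# Illustrative stand-in for the assembly order described in the report.
# NOT the real hadoop-config.sh / hadoop-env.sh; helper names are hypothetical.

# build_java_cmd HEAPSIZE CLIENT_OPTS
# Prints the java command line that the three steps would produce.
build_java_cmd() {
  heapsize="$1"
  client_opts="$2"

  # hadoop-config.sh step: default heap unless a heap size was given.
  java_heap_max="-Xmx1000m"
  if [ -n "$heapsize" ]; then
    java_heap_max="-Xmx${heapsize}m"
  fi

  # hadoop-env.sh step: a hard-coded client heap is prepended.
  client_opts="-Xmx512m $client_opts"

  # hadoop.sh step: both values land on the same command line.
  echo "java $java_heap_max $client_opts -classpath XXX org.apache.hadoop.fs.FsShell -ls /"
}

# count_xmx CMDLINE
# Prints how many -Xmx options appear in the command line.
count_xmx() {
  count=0
  for arg in $1; do
    case "$arg" in
      -Xmx*) count=$((count + 1)) ;;
    esac
  done
  echo "$count"
}

cmd=$(build_java_cmd "2048" "")
echo "$cmd"
echo "-Xmx options: $(count_xmx "$cmd")"
```

Even with a heap size of 2048 passed in, the generated line still carries the hard-coded -Xmx512m after it, which is the override the report describes.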



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-21 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877905#comment-13877905
 ] 

shanyu zhao commented on HADOOP-10245:
--

[~ywskycn] Thank you for your comment! If we remove -Xmx512m from 
HADOOP_CLIENT_OPTS in hadoop-env.cmd, there will be one and only one -Xmx, 
which is the $JAVA_HEAP_MAX in bin/hadoop. 

HADOOP-9870 may have solved the problem for you, but I think the fix in 
HADOOP-9870 might be too complicated and hard to maintain. For example, what 
if a user puts "-Xmx" in HADOOP_OPTS instead of HADOOP_CLIENT_OPTS? I think we 
should avoid using HADOOP_CLIENT_OPTS or HADOOP_OPTS to specify memory, because 
defining HADOOP_HEAPSIZE but not using it for memory specification is 
confusing. If you want to change the heap size, just change HADOOP_HEAPSIZE. 
I think this is simple and clear. Thoughts?
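
As a rough illustration of the proposal in this comment, the sketch below derives the heap option from a single value and leaves the client options untouched, so only one -Xmx reaches the command line. The helper name `resolve_heap_max` and the sample client option are assumptions made for the example; this is not the attached patch.

```shell
#!/bin/sh
# Sketch of the "single source of truth" idea: heap size comes only from
# a HADOOP_HEAPSIZE-style value. resolve_heap_max is a hypothetical helper.

# resolve_heap_max HEAPSIZE
# Prints -Xmx<HEAPSIZE>m, or the 1000m default when HEAPSIZE is empty.
resolve_heap_max() {
  if [ -n "$1" ]; then
    echo "-Xmx${1}m"
  else
    echo "-Xmx1000m"
  fi
}

# With the hard-coded -Xmx512m removed, client options pass through as-is.
client_opts="-Dhadoop.security.logger=INFO,RFAS"

java_heap_max=$(resolve_heap_max "2048")
echo "java $java_heap_max $client_opts -classpath XXX org.apache.hadoop.fs.FsShell -ls /"
```

Under this scheme a user would change the heap for one invocation by setting the variable in the environment, for example `HADOOP_HEAPSIZE=2048 hadoop fs -ls /`.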



[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-21 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877837#comment-13877837
 ] 

Wei Yan commented on HADOOP-10245:
--

[~shanyu], as discussed, there are multiple places that configure -Xmx. In the 
latest patch in HADOOP-9870, provided by [~jhsenjaliya], $HADOOP_HEAPSIZE is 
checked first; if it is not set, -Xmx512m is assigned. Additionally, bin/hadoop 
also checks the -Xmx configuration to avoid duplicates.
Simply removing -Xmx512m from HADOOP_CLIENT_OPTS may still generate multiple 
-Xmx options, as bin/hadoop also has a default $JAVA_HEAP_MAX, which is 1000m.
IMO, HADOOP-9870 has fixed this issue.
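
The behavior described in this comment can be sketched roughly as follows. This is not the HADOOP-9870 patch itself, only an illustration of the two checks mentioned above; the function names `has_xmx`, `client_heap_opt`, and `build_opts` and the sample option `-Dfoo=bar` are hypothetical.

```shell
#!/bin/sh
# Illustration of "check HADOOP_HEAPSIZE first, default to 512m, and skip
# adding -Xmx when one is already present". Helper names are hypothetical.

# has_xmx OPTS
# Succeeds when OPTS already contains an option starting with -Xmx.
has_xmx() {
  case " $1 " in
    *" -Xmx"*) return 0 ;;
    *) return 1 ;;
  esac
}

# client_heap_opt HEAPSIZE
# Prints -Xmx<HEAPSIZE>m when HEAPSIZE is set, else the 512m default.
client_heap_opt() {
  if [ -n "$1" ]; then
    echo "-Xmx${1}m"
  else
    echo "-Xmx512m"
  fi
}

# build_opts HEAPSIZE CLIENT_OPTS
# Prepends a heap option only when CLIENT_OPTS does not already carry one.
build_opts() {
  if has_xmx "$2"; then
    echo "$2"
  else
    echo "$(client_heap_opt "$1") $2"
  fi
}

build_opts "" "-Dfoo=bar"
build_opts "2048" "-Dfoo=bar"
build_opts "2048" "-Xmx4g -Dfoo=bar"
```

The third call shows the duplicate guard: because the options already contain an -Xmx value, nothing is added. The cost of this approach, as raised elsewhere in the thread, is that every place that can carry -Xmx needs the same check.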



[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-21 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1384#comment-1384
 ] 

shanyu zhao commented on HADOOP-10245:
--

[~drankye], [~qwertymaniac] would you please help review this patch?



[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-20 Thread shanyu zhao (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877068#comment-13877068
 ] 

shanyu zhao commented on HADOOP-10245:
--

[~ywskycn] Yes, it is the same issue. Sorry, I didn't see HADOOP-9870 before I 
submitted this one. I also found the similar JIRAs HADOOP-9211 and HDFS-5087.

I went through these JIRAs and here are my thoughts:
We should rely only on $HADOOP_HEAPSIZE to control the Java heap size, instead 
of $HADOOP_CLIENT_OPTS. Otherwise it would be very confusing and hard to debug 
issues. I've seen many real-world issues caused by this confusion.

There are arguments that $HADOOP_HEAPSIZE is only for services, and that the 
client should have its own setting. Well, we could create 
HADOOP_CLIENT_HEAPSIZE, initialized to 512m and used in hadoop.sh. But 
personally I don't think it is worth adding this new env variable. The client 
can simply use $HADOOP_HEAPSIZE, which defaults to 1000m. Also, there are 
scenarios where a Java class executed by the "hadoop jar" command has large 
memory requirements. A real-world example: Hive's MapredLocalTask calls 
"hadoop jar" to build a local hash table.

Also, if there's a need to change the heap size, one can always set the env 
variable $HADOOP_HEAPSIZE.



[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877017#comment-13877017
 ] 

Hadoop QA commented on HADOOP-10245:


{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12624030/HADOOP-10245.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:green}+1 javadoc{color}.  The javadoc tool did not generate any 
warning messages.

{color:green}+1 eclipse:eclipse{color}.  The patch built with 
eclipse:eclipse.

{color:green}+1 findbugs{color}.  The patch does not introduce any new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in 
hadoop-common-project/hadoop-common.

{color:green}+1 contrib tests{color}.  The patch passed contrib unit tests.

Test results: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3449//testReport/
Console output: 
https://builds.apache.org/job/PreCommit-HADOOP-Build/3449//console

This message is automatically generated.



[jira] [Commented] (HADOOP-10245) Hadoop command line always appends "-Xmx" option twice

2014-01-20 Thread Wei Yan (JIRA)

[ 
https://issues.apache.org/jira/browse/HADOOP-10245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13877002#comment-13877002
 ] 

Wei Yan commented on HADOOP-10245:
--

Hey, shanyu. Is this one related to HADOOP-9870?
