[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-21 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111024#comment-15111024
 ] 

Marcelo Vanzin commented on SPARK-12650:


If it works, it's a workaround; that's a pretty obscure, deprecated legacy 
option, and we should have a more explicit alternative to it.

> No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
> ---------------------------------------------------------------------
>
> Key: SPARK-12650
> URL: https://issues.apache.org/jira/browse/SPARK-12650
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 1.5.2
> Environment: Hadoop 2.6.0
> Reporter: John Vines
>
> Background:
> I have an app master designed to do some work and then launch a Spark job.
> Issue:
> If I use yarn-cluster, then SparkSubmit does not set Xmx for itself at all, 
> leading to the JVM taking a default heap which is relatively large. This 
> causes a large amount of vmem to be taken, so that it is killed by YARN. This 
> can be worked around by disabling YARN's vmem check, but that is a hack.
> If I run it in yarn-client mode, it's fine as long as my container has enough 
> space for the driver, which is manageable. But I feel that the utter lack of 
> Xmx settings for what I believe is a very small JVM is a problem.
> I believe this was introduced with the fix for SPARK-3884.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-21 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110964#comment-15110964
 ] 

Sean Owen commented on SPARK-12650:
---

[~vanzin] is that the intended way to set this? If so, it sounds like that's 
the resolution.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-19 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107061#comment-15107061
 ] 

John Vines commented on SPARK-12650:


SPARK_SUBMIT_OPTS seems to work. -Xmx256m changed the heap settings for 
SparkSubmitJob, but left the driver alone and did not appear to cause the same 
conflict in the executors as mentioned above. I also did not see any logging 
about that setting (unlike SPARK_JAVA_OPTS which I mentioned above).




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-19 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106786#comment-15106786
 ] 

Sean Owen commented on SPARK-12650:
---

[~jvi...@gmail.com] are you able to follow up on this one? I'm trying to figure 
out if this is something that needs a change or not.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-12 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095354#comment-15095354
 ] 

Marcelo Vanzin commented on SPARK-12650:


Can you try the other one ({{SPARK_SUBMIT_OPTS}})?




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-12 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094189#comment-15094189
 ] 

John Vines commented on SPARK-12650:


So I ran it and got this message:
{code}
SPARK_JAVA_OPTS was detected (set to '-Xmx512M').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
 - ./spark-submit with --driver-java-options to set -X options for a driver
 - spark.executor.extraJavaOptions to set -X options for executors
 - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
{code}

That was just a warning (so a small complaint if this is the proper 
solution), and it did properly cap the vmem use.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-08 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089684#comment-15089684
 ] 

Marcelo Vanzin commented on SPARK-12650:


{{SparkLauncher}} has a constructor that takes a map of environment variables 
to set.
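
A minimal sketch of that approach. Only the {{SparkLauncher(Map<String, String> env)}} constructor and the {{SPARK_SUBMIT_OPTS}} variable come from this discussion; the class name {{LauncherEnv}}, the helper {{submitEnv}}, the 256m value, and the paths and class names in the commented-out part are illustrative assumptions. The launcher call itself is left as a comment so the block compiles without the spark-launcher artifact on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

final class LauncherEnv {
    // Builds the environment passed to the child spark-submit JVM.
    // maxHeap is a hypothetical value such as "256m".
    static Map<String, String> submitEnv(String maxHeap) {
        Map<String, String> env = new HashMap<>();
        env.put("SPARK_SUBMIT_OPTS", "-Xmx" + maxHeap);
        return env;
    }

    public static void main(String[] args) {
        Map<String, String> env = submitEnv("256m");
        System.out.println(env);
        // Assuming spark-launcher is on the classpath (not compiled here):
        // Process p = new org.apache.spark.launcher.SparkLauncher(env)
        //         .setMaster("yarn-cluster")
        //         .setAppResource("/path/to/app.jar")  // hypothetical path
        //         .setMainClass("com.example.Main")    // hypothetical class
        //         .launch();
    }
}
```

The map only affects the environment of the child process that {{SparkLauncher}} starts; it does not change driver or executor settings.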




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-08 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089673#comment-15089673
 ] 

John Vines commented on SPARK-12650:


How would I set those through the Java SparkLauncher?




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086173#comment-15086173
 ] 

Marcelo Vanzin commented on SPARK-12650:


Can you try the env variables I mention below? I have a suspicion that Xmx 
won't help, because from my tests the heap doesn't really grow (and in your VM, 
Xms is 512m, so most of the vmem would not be heap space).




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086167#comment-15086167
 ] 

John Vines commented on SPARK-12650:


I do care about knowing when the Spark job is finished. Unless there's another 
way to track the Spark job, I need to wait for it to complete.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086148#comment-15086148
 ] 

Marcelo Vanzin commented on SPARK-12650:


Do you need the launcher process around after the application is launched? If 
you don't, you can set {{spark.yarn.submit.waitAppCompletion}} to false.
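
As a rough illustration, the setting can be passed to spark-submit with {{--conf}}. The sketch below only assembles the argument list; the class name {{SubmitArgs}}, the helper name, and the jar and main class values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SubmitArgs {
    // Assembles a spark-submit command line that returns right after submission.
    // appJar and mainClass are placeholders supplied by the caller.
    static List<String> fireAndForgetArgs(String appJar, String mainClass) {
        return new ArrayList<>(Arrays.asList(
                "spark-submit",
                "--master", "yarn-cluster",
                "--conf", "spark.yarn.submit.waitAppCompletion=false",
                "--class", mainClass,
                appJar));
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ",
                fireAndForgetArgs("/path/to/app.jar", "com.example.Main")));
    }
}
```

With this setting the client process exits once the application has been submitted, so it no longer reports when the job finishes; completion would have to be tracked some other way.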




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086138#comment-15086138
 ] 

John Vines commented on SPARK-12650:


I'm launching the spark job from inside an App Master, as I said in my 
background.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086135#comment-15086135
 ] 

John Vines commented on SPARK-12650:


{code}
[root@datanode1-systemtest-john-1 /]# java -XX:+PrintFlagsFinal -version | grep HeapSize
    uintx ErgoHeapSizeLimit            = 0           {product}
    uintx HeapSizePerGCThread          = 87241520    {product}
    uintx InitialHeapSize             := 525375744   {product}
    uintx LargePageHeapSizeThreshold   = 134217728   {product}
    uintx MaxHeapSize                 := 8407482368  {product}
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
{code}
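
For readability, the two heap values reported above are in bytes. A small sketch of the conversion to MiB; the class and method names are illustrative, and the inputs are the {{InitialHeapSize}} and {{MaxHeapSize}} figures from the output.

```java
final class HeapUnits {
    // Converts a byte count to whole mebibytes (integer division, rounds down).
    static long toMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("InitialHeapSize MiB: " + toMiB(525375744L));
        System.out.println("MaxHeapSize MiB: " + toMiB(8407482368L));
    }
}
```

That works out to roughly 501 MiB for the initial heap and 8018 MiB (about 7.8 GiB) for the maximum heap.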




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086133#comment-15086133
 ] 

John Vines commented on SPARK-12650:


In the test example I was using, I set driver and executor to 512MB. When I 
disabled vmem checking, they seemed to be running with the appropriate memory 
settings. Getting the actual command line executed is a bit of a PITA, but I 
can get it if it's actually needed.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086029#comment-15086029
 ] 

Marcelo Vanzin commented on SPARK-12650:


And one last comment: you can set either the {{SPARK_SUBMIT_OPTS}} or 
{{SPARK_JAVA_OPTS}} env variables to add jvm options to the launcher VM, so 
while those are sort-of-deprecated and probably not well documented, there's a 
way to control the launcher jvm options today.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086023#comment-15086023
 ] 

Marcelo Vanzin commented on SPARK-12650:


Also, can you clarify this statement:

{quote}
This causes a large amount of vmem to be taken, so that it is killed by yarn.
{quote}

If you're talking about the launcher process, YARN has no control over it, so 
how can it be killed by YARN?

BTW a quick test in my cluster shows the launcher using about 256m of resident 
memory while running (and about 10g of virtual memory), and those values seem 
stable while it's running.
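
For context on why the virtual memory figure matters when the launcher runs inside a YARN container (as it does when started from an app master): the NodeManager's vmem check compares the container's virtual memory use with the container allocation multiplied by {{yarn.nodemanager.vmem-pmem-ratio}}, which defaults to 2.1. A rough sketch of that arithmetic; the 1024 MB container size is a hypothetical value, not something reported in this issue, and the class and method names are illustrative.

```java
final class VmemCheck {
    // Virtual memory limit in MB for a container of the given size.
    static long vmemLimitMb(long containerMb, double vmemPmemRatio) {
        return (long) (containerMb * vmemPmemRatio);
    }

    // True if the observed virtual memory would trip the vmem check.
    static boolean exceedsLimit(long usedVmemMb, long containerMb, double vmemPmemRatio) {
        return usedVmemMb > vmemLimitMb(containerMb, vmemPmemRatio);
    }

    public static void main(String[] args) {
        long containerMb = 1024;   // hypothetical container allocation
        double ratio = 2.1;        // YARN default for yarn.nodemanager.vmem-pmem-ratio
        long launcherVmemMb = 10L * 1024L; // about 10g, as observed in the test above
        System.out.println("limit MB: " + vmemLimitMb(containerMb, ratio));
        System.out.println("exceeds: " + exceedsLimit(launcherVmemMb, containerMb, ratio));
    }
}
```

Under those assumptions a process with about 10g of virtual memory is well over the limit even though its resident memory is small.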




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Marcelo Vanzin (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085928#comment-15085928
 ] 

Marcelo Vanzin commented on SPARK-12650:


You can find the default Xmx like this:

{code}
java -XX:+PrintFlagsFinal -version | grep HeapSize
{code}

It's true that Spark does not define an Xmx for the launcher in cluster mode; 
that being said, I'm a little surprised it ends up using so much memory, since 
it's not really doing much. Would you be able to take a {{jmap -histo }} 
to see what's using too much memory?

We could add an option to control that and force it to a smaller value by 
default.
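
The effective maximum heap can also be read from inside a running JVM. A minimal sketch using {{Runtime.maxMemory()}}; the class name {{HeapProbe}} is illustrative, and the reported value may be slightly below the configured {{-Xmx}} depending on the collector.

```java
final class HeapProbe {
    // Maximum heap the current JVM will attempt to use, in whole MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
    }
}
```

Running it once with no options and once with, say, {{-Xmx256m}} shows the difference between the platform default and an explicit cap.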




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085773#comment-15085773
 ] 

Sean Owen commented on SPARK-12650:
---

This may be an aside, but I'm curious: what is the default max heap size on 
your Java 7 JVM? I don't recall it ever being nearly that big, but maybe it 
depends on system memory, you have a lot, and I haven't been keeping up. I 
take your point about the JVM feeling welcome to use its max heap rather 
than GC.

Anyway, [~vanzin] might know a little more about how the 3 (?) JVMs in play 
here have their memory set in the context of the new spark-submit setup.

BTW, how are you running these things, and what do you set for driver and 
executor memory? The full command line plus any config in your conf file 
might be helpful.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085747#comment-15085747
 ] 

John Vines commented on SPARK-12650:


No, neither Xmx nor Xms is set. This has to do with Java 7 and its default 
heap allocation (7 got very aggressive vs. 6, which was relatively sane) in my 
experience.




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread Sean Owen (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085734#comment-15085734
 ] 

Sean Owen commented on SPARK-12650:
---

Hm, the default heap size in the JVM isn't 8GB, is it? I just peeked at my Java 
8 and it's about 512MB. 
Is it picking up some other setting that causes it to set a large "-Xmx"? 
But even that doesn't make the JVM allocate memory. Is "-Xms" set by something? 
I know we discussed removing "-Xms" everywhere for similar reasons.
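
One way to settle the default-heap question is to ask the JVM directly. The 
following is a minimal sketch, not taken from the report; the class name 
HeapCheck is hypothetical. It prints the maximum heap the running JVM is 
willing to grow to, which is the JVM's own default when no "-Xmx" is passed.

```java
public class HeapCheck {
    // Maximum heap the running JVM will attempt to use, in megabytes.
    // With no -Xmx on the command line this reflects the JVM's own default.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMb());
    }
}
```

Running it once as "java HeapCheck" and once as "java -Xmx256m HeapCheck" on 
the affected machine should show whether the default is anywhere near the 
reported figure. Note that this reports heap only, while YARN's check is on 
virtual memory, which is typically larger.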




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-06 Thread John Vines (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085722#comment-15085722
 ] 

John Vines commented on SPARK-12650:


I'm referring to the client which is launching the driver in yarn-cluster mode. 
Because the SparkSubmit JVM, which is operating as the client, does not specify 
Xmx, it's taking ~8GB of vmem on my machine, which causes YARN to kill the 
whole container.

This is in Spark 1.5.2.
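
The kill described above follows from YARN's virtual-memory check, which 
compares a container's vmem usage against its physical allocation multiplied 
by yarn.nodemanager.vmem-pmem-ratio (2.1 by default). The sketch below only 
reproduces that arithmetic. The class name VmemCheck and the 2048MB container 
size are assumptions for illustration, and the real NodeManager check has more 
detail than this.

```java
public class VmemCheck {
    // Default value of yarn.nodemanager.vmem-pmem-ratio.
    static final double DEFAULT_VMEM_PMEM_RATIO = 2.1;

    // Virtual-memory ceiling (MB) for a container with the given physical size.
    static double vmemLimitMb(long containerMb, double ratio) {
        return containerMb * ratio;
    }

    // True when the container's virtual memory usage exceeds that ceiling.
    static boolean exceedsLimit(long vmemUsedMb, long containerMb, double ratio) {
        return vmemUsedMb > vmemLimitMb(containerMb, ratio);
    }

    public static void main(String[] args) {
        long containerMb = 2048;      // hypothetical container size
        long vmemUsedMb = 8L * 1024;  // ~8GB of vmem, as reported above
        System.out.println("vmem limit (MB): "
                + vmemLimitMb(containerMb, DEFAULT_VMEM_PMEM_RATIO));
        System.out.println("over limit: "
                + exceedsLimit(vmemUsedMb, containerMb, DEFAULT_VMEM_PMEM_RATIO));
    }
}
```

Under these assumed numbers the client JVM alone is well past the ceiling, 
before the rest of the app master's footprint is counted.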




[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode

2016-01-05 Thread Saisai Shao (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15084497#comment-15084497
 ] 

Saisai Shao commented on SPARK-12650:
-

[~vines], what do you mean by "SparkSubmit does not Xmx itself at all"? Does 
"itself" refer to the client or the driver? Can't this be handled with 
{{spark.driver.memory}}?

In yarn-client mode, the driver is not managed by YARN; it is a local JVM 
process that you manage yourself, so {{spark.driver.memory}} controls the 
memory size of this process, and the memory of the AM is controlled by 
{{spark.yarn.am.memory}}.

I don't know exactly what your problem is. Can you elaborate, e.g. with the 
Spark version, your configurations...
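
For reference, the two yarn-client settings named above can be passed to 
spark-submit with "--conf". This is a minimal sketch of assembling such a 
command line from a launching process, not the reporter's setup. The class 
name SubmitArgs, the application class, the jar name, and the memory values 
are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SubmitArgs {
    // Builds a spark-submit argument list for yarn-client mode, setting the
    // driver memory and the AM memory explicitly.
    static List<String> yarnClientArgs(String driverMem, String amMem,
                                       String mainClass, String appJar) {
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add("yarn-client");
        args.add("--conf");
        args.add("spark.driver.memory=" + driverMem);
        args.add("--conf");
        args.add("spark.yarn.am.memory=" + amMem);
        args.add("--class");
        args.add(mainClass);
        args.add(appJar);
        return args;
    }

    public static void main(String[] args) {
        // Hypothetical values; prints the command rather than running it.
        System.out.println(String.join(" ",
                yarnClientArgs("2g", "512m", "com.example.MyApp", "my-app.jar")));
    }
}
```

The resulting list could be handed to a ProcessBuilder. Neither setting 
addresses the yarn-cluster case raised in this issue, where the JVM in 
question is the SparkSubmit client itself.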
