[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15111024#comment-15111024 ] Marcelo Vanzin commented on SPARK-12650: If it works it's a workaround; that's a pretty obscure legacy deprecated option, and we should have a more explicit alternative to it.

> No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
> ---------------------------------------------------------------------
>
>                 Key: SPARK-12650
>                 URL: https://issues.apache.org/jira/browse/SPARK-12650
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.5.2
>         Environment: Hadoop 2.6.0
>            Reporter: John Vines
>
> Background-
> I have an app master designed to do some work and then launch a spark job.
> Issue-
> If I use yarn-cluster, then the SparkSubmit does not Xmx itself at all,
> leading to the jvm taking a default heap which is relatively large. This
> causes a large amount of vmem to be taken, so that it is killed by yarn. This
> can be worked around by disabling Yarn's vmem check, but that is a hack.
> If I run it in yarn-client mode, it's fine as long as my container has enough
> space for the driver, which is manageable. But I feel that the utter lack of
> Xmx settings for what I believe is a very small jvm is a problem.
> I believe this was introduced with the fix for SPARK-3884

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15110964#comment-15110964 ] Sean Owen commented on SPARK-12650: [~vanzin] is that the intended way to set this? if so it sounds like that's the resolution.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15107061#comment-15107061 ] John Vines commented on SPARK-12650: SPARK_SUBMIT_OPTS seems to work. -Xmx256m changed the heap settings for SparkSubmitJob, but left the driver alone and did not appear to cause the same conflict in the executors as mentioned above. I also did not see any logging about that setting (unlike SPARK_JAVA_OPTS which I mentioned above).
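A minimal sketch of the workaround reported above, written in Python for brevity. It only builds the environment that a parent process would hand to spark-submit; the helper name `launcher_env` is hypothetical, and the commented-out launch line assumes `spark-submit` is on the PATH and uses placeholder arguments.

```python
import os


def launcher_env(base_env, jvm_opts="-Xmx256m"):
    """Return a copy of base_env with jvm_opts appended to SPARK_SUBMIT_OPTS.

    Hypothetical helper: SPARK_SUBMIT_OPTS is the variable named in the
    comment above; everything else here is illustrative.
    """
    env = dict(base_env)
    existing = env.get("SPARK_SUBMIT_OPTS", "").strip()
    env["SPARK_SUBMIT_OPTS"] = (existing + " " + jvm_opts).strip()
    return env


if __name__ == "__main__":
    env = launcher_env(os.environ)
    print(env["SPARK_SUBMIT_OPTS"])
    # To launch for real (placeholder jar name, not from the thread):
    # import subprocess
    # subprocess.check_call(
    #     ["spark-submit", "--master", "yarn-cluster", "app.jar"], env=env)
```

Appending rather than overwriting keeps any options the caller already exported; whether that is desirable depends on the deployment.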
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15106786#comment-15106786 ] Sean Owen commented on SPARK-12650: [~jvi...@gmail.com] are you able to follow up on this one? I'm trying to figure out if this is something that needs a change or not.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15095354#comment-15095354 ] Marcelo Vanzin commented on SPARK-12650: Can you try the other one ({{SPARK_SUBMIT_OPTS}})?
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15094189#comment-15094189 ] John Vines commented on SPARK-12650: So I ran it and got this message
{code}
SPARK_JAVA_OPTS was detected (set to '-Xmx512M').
This is deprecated in Spark 1.0+.

Please instead use:
 - ./spark-submit with conf/spark-defaults.conf to set defaults for an application
 - ./spark-submit with --driver-java-options to set -X options for a driver
 - spark.executor.extraJavaOptions to set -X options for executors
 - SPARK_DAEMON_JAVA_OPTS to set java options for standalone daemons (master or worker)
{code}
but that was just a warning (so small complaint if this is the proper solution), but it did properly cap the vmem use.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089684#comment-15089684 ] Marcelo Vanzin commented on SPARK-12650: {{SparkLauncher}} has a constructor that takes a map of environment variables to set.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15089673#comment-15089673 ] John Vines commented on SPARK-12650: How would I set those through the java SparkLauncher?
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086173#comment-15086173 ] Marcelo Vanzin commented on SPARK-12650: Can you try the env variables I mention below? I have a suspicion that Xmx won't help, because from my tests the heap doesn't really grow (and in your VM, Xms is 512m, so most of the vmem would not be heap space).
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086167#comment-15086167 ] John Vines commented on SPARK-12650: I do care about knowing when the spark job is finished. Unless there's another way to track the spark job, I need to wait for it to complete.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086148#comment-15086148 ] Marcelo Vanzin commented on SPARK-12650: Do you need the launcher process around after it's launched? If you don't you can set {{spark.yarn.submit.waitAppCompletion}} to false.
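As a rough illustration of the suggestion above, the sketch below assembles a spark-submit argument list that passes `spark.yarn.submit.waitAppCompletion=false` through `--conf`. The function name, jar name, and main class are hypothetical placeholders; nothing is executed, so the snippet only shows the shape of the command. It applies only when the caller does not need to wait for the job, which the reporter says elsewhere in the thread is not his situation.

```python
def submit_command(app_jar, main_class, wait_for_completion=True):
    """Build a spark-submit argv for yarn-cluster mode (illustrative only)."""
    cmd = ["spark-submit", "--master", "yarn-cluster", "--class", main_class]
    if not wait_for_completion:
        # Ask the client to return once the application has been submitted.
        cmd += ["--conf", "spark.yarn.submit.waitAppCompletion=false"]
    cmd.append(app_jar)
    return cmd


if __name__ == "__main__":
    # Placeholder class and jar names, not taken from the thread.
    print(" ".join(submit_command("app.jar", "com.example.Main", False)))
```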
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086138#comment-15086138 ] John Vines commented on SPARK-12650: I'm launching the spark job from inside an App Master, as I said in my background.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086135#comment-15086135 ] John Vines commented on SPARK-12650:
{code}
[root@datanode1-systemtest-john-1 /]# java -XX:+PrintFlagsFinal -version | grep HeapSize
    uintx ErgoHeapSizeLimit                         = 0               {product}
    uintx HeapSizePerGCThread                       = 87241520        {product}
    uintx InitialHeapSize                          := 525375744       {product}
    uintx LargePageHeapSizeThreshold                = 134217728       {product}
    uintx MaxHeapSize                              := 8407482368      {product}
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
{code}
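To make the numbers in that output easier to read, here is a small Python sketch that parses `-XX:+PrintFlagsFinal` lines and converts the byte counts to MiB. The regular expression and the helper name `parse_heap_flags` are assumptions made for illustration; the sample text is copied from the comment above.

```python
import re

# Matches "Name = 123" and "Name := 123" as printed by -XX:+PrintFlagsFinal.
FLAG_RE = re.compile(r"(\w+)\s*:?=\s*(\d+)")


def parse_heap_flags(text):
    """Map each flag name found in text to its integer value in bytes."""
    return {name: int(value) for name, value in FLAG_RE.findall(text)}


SAMPLE = """\
uintx InitialHeapSize := 525375744 {product}
uintx MaxHeapSize := 8407482368 {product}
"""

if __name__ == "__main__":
    flags = parse_heap_flags(SAMPLE)
    for name in ("InitialHeapSize", "MaxHeapSize"):
        print(name, flags[name] // 2**20, "MiB")
```

On the values shown, this works out to an initial heap of about 501 MiB and a maximum heap of 8018 MiB, which is roughly the 8 GB the reporter describes.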
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086133#comment-15086133 ] John Vines commented on SPARK-12650: In the test example I was using, I set driver and executor to 512MB. When I disabled vmem checking, they seemed to be running with the appropriate memory settings. Getting the actual command line executed is a bit of a PITA, but I can get it if it's actually needed.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086029#comment-15086029 ] Marcelo Vanzin commented on SPARK-12650: And one last comment: you can set either the {{SPARK_SUBMIT_OPTS}} or {{SPARK_JAVA_OPTS}} env variables to add jvm options to the launcher VM, so while those are sort-of-deprecated and probably not well documented, there's a way to control the launcher jvm options today.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15086023#comment-15086023 ] Marcelo Vanzin commented on SPARK-12650: Also, can you clarify this statement: {quote} This causes a large amount of vmem to be taken, so that it is killed by yarn. {quote} If you're talking about the launcher process, YARN has no control over it, so how can it be killed by YARN? BTW a quick test in my cluster shows the launcher using about 256m of resident memory while running (and about 10g of virtual memory), and those values seem stable while it's running.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085928#comment-15085928 ] Marcelo Vanzin commented on SPARK-12650: You can find the default Xmx like this:
{code}
java -XX:+PrintFlagsFinal -version | grep HeapSize
{code}
It's true that Spark does not define an Xmx for the launcher in cluster mode; that being said, I'm a little surprised it ends up using so much memory, since it's not really doing much. Would you be able to take a {{jmap -histo }} to see what's using too much memory? We could add an option to control that and force it to a smaller value by default.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085773#comment-15085773 ] Sean Owen commented on SPARK-12650: This may be an aside but I'm curious -- what is your default max heap size on your Java 7 JVM? I don't recall it being nearly that big ever, but maybe it is dependent on the system memory and you have a lot and I haven't been keeping up. I take your point about the JVM feeling welcome to use its max heap rather than GC. Anyway [~vanzin] might know a little more about how the 3 (?) JVMs in play here have their memory set in the context of the new spark-submit setup. BTW how are you running these things and what do you set for driver, executor memory? the full command line plus any config in your conf file might be helpful.
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085747#comment-15085747 ] John Vines commented on SPARK-12650: No, neither Xmx nor Xms is set. In my experience this has to do with Java 7 and its default heap allocation (7 got very aggressive vs. 6, which was relatively sane).
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085734#comment-15085734 ] Sean Owen commented on SPARK-12650: --- Hm, the default heap size in the JVM isn't 8GB, is it? I just peeked at my Java 8 and it's about 512MB. Is it picking up some other setting that causes it to set a large "-Xmx"? But even that doesn't make the JVM allocate memory. Is "-Xms" set by something? I know we discussed removing "-Xms" everywhere for similar reasons.
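For reference, the two numbers contrasted in this comment (the maximum the heap may grow to versus what the JVM has actually reserved so far) can be read from inside any JVM with the standard `java.lang.Runtime` API. This is a minimal sketch; the class name `HeapReport` is made up for illustration and is not part of Spark.

```java
// Hypothetical helper, not part of Spark: reports the heap limits of the JVM it runs in.
final class HeapReport {
    private static final long MIB = 1024L * 1024L;

    // Converts a byte count to whole mebibytes (rounds down).
    static long toMiB(long bytes) {
        return bytes / MIB;
    }

    // Upper bound the heap may grow to: derived from -Xmx if given,
    // otherwise from the default the JVM picks for this machine.
    static long maxHeapMiB() {
        return toMiB(Runtime.getRuntime().maxMemory());
    }

    // Heap the JVM has reserved so far; this is typically below the
    // maximum shortly after startup unless -Xms forces it higher.
    static long committedHeapMiB() {
        return toMiB(Runtime.getRuntime().totalMemory());
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB):       " + maxHeapMiB());
        System.out.println("committed heap (MiB): " + committedHeapMiB());
    }
}
```

Running it once without flags and once with an explicit `-Xmx` shows which default a given machine's JVM picks. Note that this reports heap only; the virtual memory that YARN measures also covers non-heap mappings, so the two figures need not match.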
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15085722#comment-15085722 ] John Vines commented on SPARK-12650: I'm referring to the client which is launching the driver in yarn-cluster mode. Because the SparkSubmit JVM, which is operating as the client, does not specify Xmx, it's taking ~8GB of vmem on my machine, which causes YARN to kill the whole container. This is in Spark 1.5.2.
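To make the failure mode concrete: when YARN's vmem check is enabled, the NodeManager compares the virtual memory used by a container's processes against the container's physical allocation multiplied by `yarn.nodemanager.vmem-pmem-ratio` (2.1 by default). The sketch below only redoes that arithmetic. The 1024 MB container size is an assumed example value, since the thread does not state the reporter's actual container size, and `VmemCheck` is a hypothetical class, not YARN code.

```java
// Hypothetical illustration of the vmem limit arithmetic; not YARN's implementation.
final class VmemCheck {
    // Default value of yarn.nodemanager.vmem-pmem-ratio in yarn-default.xml.
    static final double DEFAULT_VMEM_PMEM_RATIO = 2.1;

    // Virtual memory allowed for a container with the given physical allocation.
    static long vmemLimitMb(long containerMb, double ratio) {
        return (long) (containerMb * ratio);
    }

    // True when the observed virtual memory exceeds the allowed limit.
    static boolean exceedsLimit(long usedVmemMb, long containerMb, double ratio) {
        return usedVmemMb > vmemLimitMb(containerMb, ratio);
    }

    public static void main(String[] args) {
        long containerMb = 1024;      // assumed container size, not from the thread
        long usedVmemMb = 8L * 1024L; // the "~8GB of vmem" reported above
        System.out.println("vmem limit (MB): "
                + vmemLimitMb(containerMb, DEFAULT_VMEM_PMEM_RATIO));
        System.out.println("exceeds limit:   "
                + exceedsLimit(usedVmemMb, containerMb, DEFAULT_VMEM_PMEM_RATIO));
    }
}
```

Under these assumptions a JVM that maps ~8GB of virtual memory is far over the limit, which is consistent with the container being killed even though little of that memory is actually in use.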
[jira] [Commented] (SPARK-12650) No means to specify Xmx settings for SparkSubmit in yarn-cluster mode
[ https://issues.apache.org/jira/browse/SPARK-12650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15084497#comment-15084497 ] Saisai Shao commented on SPARK-12650: - [~vines], what do you mean by "SparkSubmit does not Xmx itself at all"? Does "itself" refer to the client or the driver? Can't this be handled with {{spark.driver.memory}}? In yarn-client mode the driver is not managed by YARN; it is a local JVM process that you manage yourself, so {{spark.driver.memory}} controls the memory size of that process, and the memory of the AM is controlled by {{spark.yarn.am.memory}}. I don't know exactly what your actual problem is. Can you elaborate, for example with your Spark version and your configuration?
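The two settings named in this comment can be passed to `spark-submit` as `--conf key=value` pairs. The following is a minimal sketch that only assembles the argument list; `SubmitArgs`, the memory values, and the jar name are hypothetical placeholders, not values from this thread. Per the Spark on YARN documentation, `spark.yarn.am.memory` applies in client mode only; in cluster mode the AM hosts the driver and is sized by `spark.driver.memory`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Spark: builds a spark-submit command line.
final class SubmitArgs {
    // Returns the command as a list of arguments, with one "--conf key=value"
    // pair per configuration entry, in the map's iteration order.
    static List<String> build(String master, Map<String, String> conf, String appJar) {
        List<String> args = new ArrayList<>();
        args.add("spark-submit");
        args.add("--master");
        args.add(master);
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            args.add("--conf");
            args.add(entry.getKey() + "=" + entry.getValue());
        }
        // A real invocation would usually also pass --class before the jar.
        args.add(appJar);
        return args;
    }

    public static void main(String[] ignored) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("spark.driver.memory", "1g");    // example value only
        conf.put("spark.yarn.am.memory", "512m"); // example value only
        System.out.println(String.join(" ", build("yarn-client", conf, "app.jar")));
    }
}
```

Neither setting sizes the SparkSubmit client JVM in yarn-cluster mode, which is the gap this issue is about.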