[jira] [Commented] (SPARK-2604) Spark Application hangs on yarn in edge case scenario of executor memory requirement
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14243221#comment-14243221 ]

Sean Owen commented on SPARK-2604:
----------------------------------

PR comments suggest this was fixed by SPARK-2140? https://github.com/apache/spark/pull/1571

> Spark Application hangs on yarn in edge case scenario of executor memory requirement
>
>     Key: SPARK-2604
>     URL: https://issues.apache.org/jira/browse/SPARK-2604
>     Project: Spark
>     Issue Type: Bug
>     Components: Spark Core
>     Affects Versions: 1.0.0
>     Reporter: Twinkle Sachdeva
>
> In a YARN environment, let:
>   MaxAM   = maximum allocatable container memory
>   ExecMem = the executor's memory
> If MaxAM >= ExecMem but (MaxAM - ExecMem) < 384m (the memory overhead), the maximum-resource validation fails to reject the executor memory request and the application master is launched; but when containers are allocated and validated again, this time against ExecMem plus the overhead, they are returned, and the application appears to hang.
> The typical use case is to ask for executor memory equal to the maximum allowed memory per the YARN config.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
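The mismatch between the two checks described above can be sketched as follows. This is a hypothetical illustration, not Spark's actual code: the function names and the 8192 MB figures are made up, and only the ~384 MB overhead constant comes from the report.

```python
# Hypothetical sketch of the edge case: the pre-launch validation compares
# only the requested executor memory against the cluster maximum, while the
# allocator later requires executor memory plus the ~384 MB overhead.

OVERHEAD_MB = 384  # default memory overhead mentioned in the report

def passes_validation(max_alloc_mb: int, exec_mem_mb: int) -> bool:
    """Pre-launch check as described: ignores the overhead."""
    return exec_mem_mb <= max_alloc_mb

def container_accepted(container_mb: int, exec_mem_mb: int) -> bool:
    """Post-allocation check: includes the overhead, so it can reject a
    request that the pre-launch validation accepted."""
    return container_mb >= exec_mem_mb + OVERHEAD_MB

# Typical case from the report: ask for exactly the YARN maximum.
max_alloc_mb = 8192
exec_mem_mb = 8192
print(passes_validation(max_alloc_mb, exec_mem_mb))   # True: AM is launched
print(container_accepted(max_alloc_mb, exec_mem_mb))  # False: container released
```

Any request where the gap between the maximum and the executor memory is under 384 MB falls into the same trap: the first check passes, the second never does, so containers are allocated and released indefinitely.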
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068391#comment-14068391 ]

Twinkle Sachdeva commented on SPARK-2604:
-----------------------------------------

Please assign this issue to me.
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068528#comment-14068528 ]

Thomas Graves commented on SPARK-2604:
--------------------------------------

Just to clarify, are you referring to the max checks in ClientBase.verifyClusterResources? Basically, they don't take the memory overhead into account.
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068531#comment-14068531 ]

Thomas Graves commented on SPARK-2604:
--------------------------------------

Also note that it shouldn't hang; it should fail after a certain number of retries. The number of AM retries is configured by the resource manager; the number of allowed executor failures is configured as below (although this only works in yarn-cluster mode). There is a PR up to fix it in client mode, if that is what you are using.

private val maxNumExecutorFailures = sparkConf.getInt("spark.yarn.max.executor.failures",
  sparkConf.getInt("spark.yarn.max.worker.failures", math.max(args.numExecutors * 2, 3)))
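The default in the snippet above (when neither property is set) works out as twice the requested executor count, with a floor of three. A small sketch of that arithmetic, using a hypothetical helper name:

```python
def default_max_executor_failures(num_executors: int) -> int:
    """Mirrors math.max(args.numExecutors * 2, 3) from the quoted snippet:
    twice the executor count, but never fewer than 3 allowed failures."""
    return max(num_executors * 2, 3)

print(default_max_executor_failures(1))   # 3  (the floor applies)
print(default_max_executor_failures(10))  # 20
```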
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068648#comment-14068648 ]

Twinkle Sachdeva commented on SPARK-2604:
-----------------------------------------

For executors, verifyClusterResources does not take the overhead into account, whereas YarnAllocationHandler.scala provides the following def, isResourceConstraintSatisfied(), which checks that the container memory is >= executor memory + memory overhead. When a container is not allocated enough memory to satisfy that condition, the container is released. As no executor has been launched in it, this is not counted as a failure. Please see the code below:

for (container <- allocatedContainers) {
  if (isResourceConstraintSatisfied(container)) {
    // Add the accepted `container` to the host's list of already accepted,
    // allocated containers
    val host = container.getNodeId.getHost
    val containersForHost = hostToContainers.getOrElseUpdate(host,
      new ArrayBuffer[Container]())
    containersForHost += container
  } else {
    // Release container, since it doesn't satisfy resource constraints.
    releaseContainer(container)
  }
}

So allocation happens, but the container is then returned and not counted as failed, due to which only the application master is launched.
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070227#comment-14070227 ]

Twinkle Sachdeva commented on SPARK-2604:
-----------------------------------------

I tried running in yarn-cluster mode after setting the property spark.yarn.max.executor.failures to some number. The application does get failed, but with a misleading exception (pasted at the end). Instead of handling the condition this way, we should probably check for the overhead memory amount at validation itself. Please share your thoughts if you think otherwise.

Stacktrace:

Application application_1405933848949_0024 failed 2 times due to Error launching appattempt_1405933848949_0024_02. Got exception: java.net.ConnectException: Call From NN46/192.168.156.46 to localhost:51322 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
  at java.lang.reflect.Constructor.newInstance(Unknown Source)
  at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
  at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
  at org.apache.hadoop.ipc.Client.call(Client.java:1414)
  at org.apache.hadoop.ipc.Client.call(Client.java:1363)
  at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
  at com.sun.proxy.$Proxy28.startContainers(Unknown Source)
  at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
  at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  at java.lang.Thread.run(Unknown Source)
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14071786#comment-14071786 ]

Thomas Graves commented on SPARK-2604:
--------------------------------------

Yes, we should be adding the overhead in at the check.
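The fix being discussed, accounting for the overhead at validation time, can be sketched as below. This is a hypothetical illustration of the idea, not the actual patch in the pull request; the function name and the 8192 MB figure are made up.

```python
OVERHEAD_MB = 384  # default memory overhead mentioned in the report

def passes_validation_fixed(max_alloc_mb: int, exec_mem_mb: int) -> bool:
    """Validation that adds the overhead up front, so a request that can
    never be satisfied is rejected before the AM is launched."""
    return exec_mem_mb + OVERHEAD_MB <= max_alloc_mb

# The problematic request from the report now fails fast at submit time
# instead of hanging after allocation.
print(passes_validation_fixed(8192, 8192))        # False: rejected up front
print(passes_validation_fixed(8192, 8192 - 384))  # True: fits with overhead
```

With this check, the submit-time validation and the allocator's isResourceConstraintSatisfied() agree, so the allocate-and-release loop described earlier in the thread cannot occur.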
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073085#comment-14073085 ]

Apache Spark commented on SPARK-2604:
-------------------------------------

User 'twinkle-sachdeva' has created a pull request for this issue:
https://github.com/apache/spark/pull/1571
[ https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14073086#comment-14073086 ]

Twinkle Sachdeva commented on SPARK-2604:
-----------------------------------------

Please review the pull request: https://github.com/apache/spark/pull/1571