[ 
https://issues.apache.org/jira/browse/SPARK-2604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070227#comment-14070227
 ] 

Twinkle Sachdeva commented on SPARK-2604:
-----------------------------------------

I tried running in yarn-cluster mode after setting the property 
spark.yarn.max.executor.failures to some number. The application does fail, 
but with a misleading exception (pasted at the end). Instead of handling the 
condition this way, we should probably check the overhead memory amount 
during validation itself. Please share your thoughts if you think otherwise.
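
To make the proposal concrete, here is a rough sketch of what an up-front check 
that includes the overhead could look like. It is not Spark's actual code: the 
function name, parameter names, and error message are hypothetical, and the 
384 MB overhead is simply the figure mentioned in the issue description.

```python
# Hypothetical sketch only; these names are illustrative, not Spark's API.
MEMORY_OVERHEAD_MB = 384  # overhead figure quoted in the issue description


def validate_executor_memory(executor_memory_mb, max_allocatable_mb,
                             overhead_mb=MEMORY_OVERHEAD_MB):
    """Return the container size needed, or raise if it exceeds YARN's maximum.

    The check compares executor memory PLUS overhead against the maximum
    allocatable memory, so an unsatisfiable request is rejected before the
    application master is launched.
    """
    required_mb = executor_memory_mb + overhead_mb
    if required_mb > max_allocatable_mb:
        raise ValueError(
            "Required executor memory (%d MB + %d MB overhead) exceeds the "
            "maximum allocatable memory (%d MB)"
            % (executor_memory_mb, overhead_mb, max_allocatable_mb)
        )
    return required_mb
```

With a check like this, asking for executor memory equal to the maximum 
allocatable memory would be rejected at submission time with a clear message, 
rather than surfacing later as an unrelated-looking failure.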

Stacktrace :
Application application_1405933848949_0024 failed 2 times due to Error 
launching appattempt_1405933848949_0024_000002. Got exception: 
java.net.ConnectException: Call From NN46/192.168.156.46 to localhost:51322 
failed on connection exception: java.net.ConnectException: Connection refused; 
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
at java.lang.reflect.Constructor.newInstance(Unknown Source)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
at org.apache.hadoop.ipc.Client.call(Client.java:1414)
at org.apache.hadoop.ipc.Client.call(Client.java:1363)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy28.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:249)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)

> Spark Application hangs on yarn in edge case scenario of executor memory 
> requirement
> ------------------------------------------------------------------------------------
>
>                 Key: SPARK-2604
>                 URL: https://issues.apache.org/jira/browse/SPARK-2604
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 1.0.0
>            Reporter: Twinkle Sachdeva
>
> In a YARN environment, let's say:
> MaxAM = maximum allocatable memory
> ExecMem = executor memory
> if (MaxAM > ExecMem && (MaxAM - ExecMem) < 384m)
>   then the maximum-resource validation, which checks only executor memory, 
> passes and the application master gets launched. But when resources are 
> allocated and validated again, they are returned, and the application 
> appears to hang.
> A typical use case is asking for executor memory = maximum allowed memory as 
> per the YARN config.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
