[ 
https://issues.apache.org/jira/browse/SPARK-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642138#comment-15642138
 ] 

Elkhan Dadashov commented on SPARK-18288:
-----------------------------------------

Actually, the while and if conditions already call getState(), in addition to 
the log line. This is a very weird case, since a logging line should not change 
program behavior. With the log line commented out, it does not work (the 
spawned Spark job completes successfully, but neither appState nor appId is 
received, and the map task that spawned the Spark job fails). With the log line 
uncommented (the only change), it works (both the Spark job and the map task 
complete successfully).
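
As a point of comparison, here is a minimal sketch of the same wait written as a bounded poll with a short sleep between checks, rather than a tight spin. It is not a confirmed fix or explanation for this issue. The `State` enum, `waitForFinalState`, and the simulated supplier in `main` are hypothetical stand-ins (not part of the Spark API) so that the example runs without Spark on the classpath.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

public class PollUntilFinal {

    // Hypothetical stand-in for SparkAppHandle.State; only isFinal() is modeled.
    enum State {
        RUNNING(false), FINISHED(true), FAILED(true);

        private final boolean terminal;

        State(boolean terminal) { this.terminal = terminal; }

        boolean isFinal() { return terminal; }
    }

    /**
     * Polls the supplier until it reports a final state, the timeout elapses,
     * or the thread is interrupted. Returns the last observed state, which may
     * be null or non-final if the wait ended early.
     */
    static State waitForFinalState(Supplier<State> stateSupplier,
                                   long pollMillis, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        State state = stateSupplier.get();
        while ((state == null || !state.isFinal()) && System.nanoTime() - deadline < 0) {
            try {
                // Yield the CPU between polls instead of spinning.
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and stop waiting.
                Thread.currentThread().interrupt();
                break;
            }
            state = stateSupplier.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Simulated handle: null on the first call, RUNNING twice, then FINISHED.
        AtomicInteger calls = new AtomicInteger();
        Supplier<State> simulated = () -> {
            int n = calls.getAndIncrement();
            if (n == 0) {
                return null;
            }
            return n < 3 ? State.RUNNING : State.FINISHED;
        };

        State result = waitForFinalState(simulated, 10, 5_000);
        System.out.println("Final state: " + result);
    }
}
```

With the real launcher, the supplier would presumably wrap appHandle.getState(). The launcher API also accepts SparkAppHandle.Listener instances in startApplication(), which avoids polling altogether.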

> SparkLauncher 2.0.1 version working inconsistently in yarn-client mode
> ----------------------------------------------------------------------
>
>                 Key: SPARK-18288
>                 URL: https://issues.apache.org/jira/browse/SPARK-18288
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.0.1
>         Environment: I'm running Spark 2.0.1 version with Spark Launcher 
> 2.0.1 version on Yarn cluster. Deploy mode is yarn-client.
>            Reporter: Elkhan Dadashov
>
> I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster. I 
> launch a map task which spawns a Spark job via 
> SparkLauncher#startApplication().
> Deploy mode is yarn-client. 
> I'm running on a Mac laptop.
> I have this snippet of code:
> {code:title=Bar.java|borderStyle=solid}
> SparkAppHandle appHandle = sparkLauncher.startApplication();
> while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
>     if (appHandle.getState() != null) {
>         // If the line below is commented, then appState and appId cannot be retrieved.
>         log.info("while: Spark job state is : " + appHandle.getState());
>         if (appHandle.getAppId() != null) {
>             log.info("\t App id: " + appHandle.getAppId() + "\tState: " + appHandle.getState());
>         }
>     }
> }
> {code}
> The above snippet works fine: both the Spark job and the map task which 
> spawns it complete successfully.
> But if I comment out the highlighted line (the first log.info call), the 
> Spark job launches and finishes successfully, but the map task hangs for a 
> while (in Running state) and then fails with the exception below.
> I run the exact same code in the exact same environment, with only that one 
> line commented out. 
> When the highlighted line is commented out, I do not even see the second log 
> line in stderr; it seems the appHandle never returns anything (neither app 
> id nor app state), even though the Spark application starts, runs, and 
> finishes successfully. In the same stderr, I can see the Spark job's logs, 
> the Spark job results printed, and the application report indicating status.
> You can see the exception below (this is from the stderr of the mapper 
> container which launches the Spark job):
> ---
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused;
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>         at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>         ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure
> INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
>         at com.sun.proxy.$Proxy9.ping(Unknown Source)
>         at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>         at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>         at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>         at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>         ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo
> INFO: Process Thread Dump: Communication exception
> 10 active threads
> Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):
>   State: TIMED_WAITING
>   Blocked count: 0
>   Waited count: 79
>   Stack:
>     java.lang.Thread.sleep(Native Method)
>     org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)
>     org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
>     org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
>     java.lang.Thread.run(Thread.java:745)


