[ https://issues.apache.org/jira/browse/SPARK-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15644927#comment-15644927 ]
Elkhan Dadashov commented on SPARK-18288:
-----------------------------------------

Let me replace the while loop with a ScheduledExecutorService, and then I will update the issue.

> SparkLauncer 2.0.1 version working incosistently in yarn-client mode
> --------------------------------------------------------------------
>
> Key: SPARK-18288
> URL: https://issues.apache.org/jira/browse/SPARK-18288
> Project: Spark
> Issue Type: Bug
> Components: Spark Submit
> Affects Versions: 2.0.1
> Environment: I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster. Deploy mode is yarn-client.
> Reporter: Elkhan Dadashov
>
> I'm running Spark 2.0.1 with Spark Launcher 2.0.1 on a YARN cluster. I launch a map task which spawns a Spark job via SparkLauncher#startApplication().
> Deploy mode is yarn-client.
> I'm running on a Mac laptop.
> I have this snippet of code:
> {code:title=Bar.java|borderStyle=solid}
> SparkAppHandle appHandle = sparkLauncher.startApplication();
> while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
>     if (appHandle.getState() != null) {
>         // If the line below is commented out, then appState and appId cannot be retrieved.
>         log.info("while: Spark job state is : " + appHandle.getState());
>         if (appHandle.getAppId() != null) {
>             log.info("\t App id: " + appHandle.getAppId() + "\tState: " + appHandle.getState());
>         }
>     }
> }
> {code}
> The above snippet works fine: both the Spark job and the map task that spawns it complete successfully.
> But if I comment out the red highlighted line, the Spark job still launches and finishes successfully, but the map task hangs for a while (in Running state) and then fails with the exception below.
> I run the exact same code in the exact same environment, except with that one line commented out.
> When the highlighted line is commented out, I do not even see the 2nd log line in stderr. It seems the appHandle hook never returns anything (neither app id nor app state), even though the Spark application starts, runs and finishes successfully. Inside the same stderr, I can see the Spark job's logs, the Spark job's printed results, and the application report indicating status.
> You can see the exception below (this is from the stderr of the mapper container which launches the Spark job):
> ---
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused;
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>     at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>     ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure
> INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run
> INFO: Communication exception: java.net.ConnectException: Call From <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>     at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>     at org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
>     at com.sun.proxy.$Proxy9.ping(Unknown Source)
>     at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>     at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>     at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>     at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>     at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>     at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>     at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>     ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo
> INFO: Process Thread Dump: Communication exception
> 10 active threads
> Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):
>   State: TIMED_WAITING
>   Blocked count: 0
>   Waited count: 79
>   Stack:
>     java.lang.Thread.sleep(Native Method)
>     org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)
>     org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
>     org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
>     java.lang.Thread.run(Thread.java:745)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
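
The comment at the top of this message proposes replacing the busy while loop with a ScheduledExecutorService. A minimal sketch of that idea follows. It is not the reporter's eventual code: the class name StatePoller, the method awaitFinalState, and its parameters are hypothetical names chosen for illustration, and the state source is modelled as a plain Supplier so the example runs without Spark on the classpath.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical helper (not part of Spark): polls a state source on a schedule
// instead of spinning in a tight while loop.
public class StatePoller {

    /**
     * Reads stateSupplier every periodMillis until isFinal accepts the value,
     * then returns that value. A java.util.concurrent.TimeoutException is
     * thrown if no final state is observed within timeoutMillis.
     */
    public static <S> S awaitFinalState(Supplier<S> stateSupplier,
                                        Predicate<S> isFinal,
                                        long periodMillis,
                                        long timeoutMillis) throws Exception {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        CompletableFuture<S> result = new CompletableFuture<>();
        try {
            scheduler.scheduleWithFixedDelay(() -> {
                try {
                    S state = stateSupplier.get();
                    // A null state means "not reported yet"; keep polling.
                    if (state != null && isFinal.test(state)) {
                        result.complete(state);
                    }
                } catch (RuntimeException e) {
                    // Surface supplier failures to the waiting caller.
                    result.completeExceptionally(e);
                }
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            // Block the caller without burning CPU until a final state arrives.
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } finally {
            // Always stop the polling thread, even on timeout or error.
            scheduler.shutdownNow();
        }
    }
}
```

With the handle from the snippet in the issue, the call would presumably look like StatePoller.awaitFinalState(appHandle::getState, SparkAppHandle.State::isFinal, 1000, timeoutMillis), where the one-second period and the timeout are arbitrary choices. Registering a SparkAppHandle.Listener through startApplication(...) is another option that avoids polling altogether.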