Elkhan Dadashov created SPARK-18288:
---------------------------------------

             Summary: SparkLauncer 2.0.1 version working incosistently in 
yarn-client mode
                 Key: SPARK-18288
                 URL: https://issues.apache.org/jira/browse/SPARK-18288
             Project: Spark
          Issue Type: Bug
          Components: Spark Submit
    Affects Versions: 2.0.1
         Environment: I'm running Spark 2.0.1 version with Spark Launcher 2.0.1 
version on Yarn cluster. Deploy mode is yarn-client.
            Reporter: Elkhan Dadashov


I'm running Spark 2.0.1 version with Spark Launcher 2.0.1 version on Yarn 
cluster. I launch map task which spawns Spark job via 
SparkLauncher#startApplication().

Deploy mode is yarn-client. I'm running in Mac laptop.

I have this snippet of code:
SparkAppHandle appHandle = sparkLauncher.startApplication();

while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
    if (appHandle.getState() != null) {
        log.info("while: Spark job state is : " + 
appHandle.getState());//highlighted
        if (appHandle.getAppId() != null) {
            log.info("\t App id: " + appHandle.getAppId() + "\tState: " + 
appHandle.getState());
        }
    }
}

The above snippet of code works fine, both spark job and the map task which 
spawns that Spark job successfully completes.

But if i comment out the red highlighted line, then the Spark job launches and 
finishes successfully, but the map task hangs for a while (in Running state) 
and then fails with the exception below.

I run exact same code in exact same environment except that one line commented 
out. 

When the highlighted line is commented out, I even do not see the 2nd log line 
in the stderr either, it seems appHandle hook never returns back anything 
(neither app id nor app state), even though spark application starts, runs and 
finishes successfully. Inside the same stderr, i can see Spark job related 
logs, and spark job results printed, and application report indicating status.

You can see the exception below (this is from the stderr of the mapper 
container which launches Spark job):
---
INFO: Communication exception: java.net.ConnectException: Call From 
<my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: 
java.net.ConnectException: Connection refused;

Caused by: java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)

        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)

        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)

        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)

        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)

        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)

        at org.apache.hadoop.ipc.Client.call(Client.java:1451)


        ... 5 more

---

Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure

INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already tried 
9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, 
sleepTime=1000 MILLISECONDS)

Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run

INFO: Communication exception: java.net.ConnectException: Call From 
<my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection exception: 
java.net.ConnectException: Connection refused; For more details see:  
http://wiki.apache.org/hadoop/ConnectionRefused

        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)

        at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)

        at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)

        at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)

        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)

        at org.apache.hadoop.ipc.Client.call(Client.java:1479)

        at org.apache.hadoop.ipc.Client.call(Client.java:1412)

        at 
org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)

        at com.sun.proxy.$Proxy9.ping(Unknown Source)

        at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)

        at java.lang.Thread.run(Thread.java:745)

Caused by: java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)

        at 
org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)

        at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)

        at 
org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)

        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)

        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)

        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)

        at org.apache.hadoop.ipc.Client.call(Client.java:1451)


        ... 5 more

---

Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo

INFO: Process Thread Dump: Communication exception

10 active threads

Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):

  State: TIMED_WAITING

  Blocked count: 0

  Waited count: 79

  Stack:

    java.lang.Thread.sleep(Native Method)

    org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)

    org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)

    org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)


    java.lang.Thread.run(Thread.java:745)


0 New

Reply to all




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to