[ 
https://issues.apache.org/jira/browse/SPARK-18288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15646950#comment-15646950
 ] 

Elkhan Dadashov commented on SPARK-18288:
-----------------------------------------

The issue was related to while() loop stalling the CPU.
Instead I changed the code using CountDownLatch, and it works as expected.
{code:title=Bar.java|borderStyle=solid}
...
final CountDownLatch countDownLatch = new CountDownLatch(1);
SparkAppListener sparkAppListener = new SparkAppListener(countDownLatch);
SparkAppHandle appHandle = sparkLauncher.startApplication(sparkAppListener);
Thread sparkAppListenerThread = new Thread(sparkAppListener);
sparkAppListenerThread.start();
long timeout = 120;
countDownLatch.await(timeout, TimeUnit.SECONDS);    
...

    private static class SparkAppListener implements SparkAppHandle.Listener, 
Runnable {
        private static final Log log = 
LogFactory.getLog(SparkAppListener.class);
        private final CountDownLatch countDownLatch;
        public SparkAppListener(CountDownLatch countDownLatch) {
            this.countDownLatch = countDownLatch;
        }
        @Override
        public void stateChanged(SparkAppHandle handle) {
            String sparkAppId = handle.getAppId();
            State appState = handle.getState();
            if (sparkAppId != null) {
                log.info("Spark job with app id: " + sparkAppId + ",\t State 
changed to: " + appState + " - "
                        + SPARK_STATE_MSG.get(appState));
            } else {
                log.info("Spark job's state changed to: " + appState + " - " + 
SPARK_STATE_MSG.get(appState));
            }
            if (appState != null && appState.isFinal()) {
                countDownLatch.countDown();
            }
        }
        @Override
        public void infoChanged(SparkAppHandle handle) {}
        @Override
        public void run() {}
    }
{code}

> SparkLauncer 2.0.1 version working incosistently in yarn-client mode
> --------------------------------------------------------------------
>
>                 Key: SPARK-18288
>                 URL: https://issues.apache.org/jira/browse/SPARK-18288
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 2.0.1
>         Environment: I'm running Spark 2.0.1 version with Spark Launcher 
> 2.0.1 version on Yarn cluster. Deploy mode is yarn-client.
>            Reporter: Elkhan Dadashov
>
> I'm running Spark 2.0.1 version with Spark Launcher 2.0.1 version on Yarn 
> cluster. I launch map task which spawns Spark job via 
> SparkLauncher#startApplication().
> Deploy mode is yarn-client. 
> I'm running in Mac laptop.
> I have this snippet of code:
> {code:title=Bar.java|borderStyle=solid}
> SparkAppHandle appHandle = sparkLauncher.startApplication();
> while (appHandle.getState() == null || !appHandle.getState().isFinal()) {
>     if (appHandle.getState() != null) {
>         // If the line below is commented, then appState and appId cannot be 
> retrieved.
>         log.info("while: Spark job state is : " + appHandle.getState());
>         if (appHandle.getAppId() != null) {
>             log.info("\t App id: " + appHandle.getAppId() + "\tState: " + 
> appHandle.getState());
>         }
>     }
> }
> {code}
> The above snippet of code works fine, both spark job and the map task which 
> spawns that Spark job successfully completes.
> But if i comment out the red highlighted line, then the Spark job launches 
> and finishes successfully, but the map task hangs for a while (in Running 
> state) and then fails with the exception below.
> I run exact same code in exact same environment except that one line 
> commented out. 
> When the highlighted line is commented out, I even do not see the 2nd log 
> line in the stderr either, it seems appHandle hook never returns back 
> anything (neither app id nor app state), even though spark application 
> starts, runs and finishes successfully. Inside the same stderr, i can see 
> Spark job related logs, and spark job results printed, and application report 
> indicating status.
> You can see the exception below (this is from the stderr of the mapper 
> container which launches Spark job):
> ---
> INFO: Communication exception: java.net.ConnectException: Call From 
> <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection 
> exception: java.net.ConnectException: Connection refused;
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>         ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.ipc.Client handleConnectionFailure
> INFO: Retrying connect to server: <my-hostname>/10.3.8.118:53567. Already 
> tried 9 time(s); retry policy is 
> RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 
> MILLISECONDS)
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task run
> INFO: Communication exception: java.net.ConnectException: Call From 
> <my-hostname>/10.3.8.118 to <my-hostname>:53567 failed on connection 
> exception: java.net.ConnectException: Connection refused; For more details 
> see:  http://wiki.apache.org/hadoop/ConnectionRefused
>         at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
>         at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>         at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>         at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>         at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>         at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>         at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:242)
>         at com.sun.proxy.$Proxy9.ping(Unknown Source)
>         at org.apache.hadoop.mapred.Task$TaskReporter.run(Task.java:767)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>         at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>         at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>         at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
>         at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:614)
>         at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:712)
>         at 
> org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
>         at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
>         at org.apache.hadoop.ipc.Client.call(Client.java:1451)
>         ... 5 more
> ---
> Nov 05, 2016 2:41:54 AM org.apache.hadoop.mapred.Task logThreadInfo
> INFO: Process Thread Dump: Communication exception
> 10 active threads
> Thread 24 (org.apache.hadoop.hdfs.PeerCache@4763c727):
>   State: TIMED_WAITING
>   Blocked count: 0
>   Waited count: 79
>   Stack:
>     java.lang.Thread.sleep(Native Method)
>     org.apache.hadoop.hdfs.PeerCache.run(PeerCache.java:255)
>     org.apache.hadoop.hdfs.PeerCache.access$000(PeerCache.java:46)
>     org.apache.hadoop.hdfs.PeerCache$1.run(PeerCache.java:124)
>     java.lang.Thread.run(Thread.java:745)
> 0 New
> Reply to all



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to