Sun Rui created SPARK-16522:
-------------------------------

             Summary: [MESOS] Spark application throws exception on exit
                 Key: SPARK-16522
                 URL: https://issues.apache.org/jira/browse/SPARK-16522
             Project: Spark
          Issue Type: Bug
          Components: Mesos
    Affects Versions: 2.0.0
            Reporter: Sun Rui


Spark applications running on Mesos throw the following exception upon exit:
{panel}
16/07/13 15:20:46 WARN NettyRpcEndpointRef: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)] in 3 attempts
org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
Exception in thread "Thread-47" org.apache.spark.SparkException: Error notifying standalone scheduler's driver endpoint
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:415)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.executorTerminated(MesosCoarseGrainedSchedulerBackend.scala:555)
        at org.apache.spark.scheduler.cluster.mesos.MesosCoarseGrainedSchedulerBackend.statusUpdate(MesosCoarseGrainedSchedulerBackend.scala:495)
Caused by: org.apache.spark.SparkException: Error sending message [message = RemoveExecutor(1,Executor finished with state FINISHED)]
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:119)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:78)
        at org.apache.spark.scheduler.cluster.CoarseGrainedSchedulerBackend.removeExecutor(CoarseGrainedSchedulerBackend.scala:412)
        ... 2 more
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:102)
        ... 4 more
Caused by: org.apache.spark.SparkException: Could not find CoarseGrainedScheduler.
        at org.apache.spark.rpc.netty.Dispatcher.postMessage(Dispatcher.scala:152)
        at org.apache.spark.rpc.netty.Dispatcher.postLocalMessage(Dispatcher.scala:127)
        at org.apache.spark.rpc.netty.NettyRpcEnv.ask(NettyRpcEnv.scala:225)
        at org.apache.spark.rpc.netty.NettyRpcEndpointRef.ask(NettyRpcEnv.scala:508)
        at org.apache.spark.rpc.RpcEndpointRef.askWithRetry(RpcEndpointRef.scala:101)
        ... 4 more
{panel}

The application's results are not affected by this error.

This issue can be reproduced simply by launching spark-shell, running the 
following commands, and then exiting the shell:
{code}
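// Distribute 10 elements across 10 partitions so several executors run tasks
val rdd = sc.parallelize(1 to 10, 10)
rdd.map(_ + 1).collect()
// Then leave the shell (e.g. with :quit); the exception appears during shutdown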
{code}

The root cause is that, during SparkContext.stop(), 
MesosCoarseGrainedSchedulerBackend.stop() calls 
CoarseGrainedSchedulerBackend.stop(). The latter sends messages to stop the 
executors and also stops the driver endpoint, without waiting for the executors 
to actually stop. MesosCoarseGrainedSchedulerBackend.stop() then waits, up to a 
timeout, for the executors to stop. During this wait, 
MesosCoarseGrainedSchedulerBackend.statusUpdate() will generally be called to 
update the executors' status, which in turn calls removeExecutor(). By that 
time, however, the driver endpoint has already been stopped, so the 
RemoveExecutor message cannot be delivered ("Could not find 
CoarseGrainedScheduler.").
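The ordering problem can be illustrated with a deliberately simplified model. This is a hypothetical sketch: `Dispatcher`, `shutdown`, `reportTerminated`, and `stopEndpointFirst` are names invented for illustration, not Spark's real classes or signatures. The model is single-threaded and skips the retry/wrapping done by askWithRetry, so it captures only the order of events, not the timing of the real race.

```scala
import scala.collection.concurrent.TrieMap

// Hypothetical, single-threaded model of the shutdown ordering described above.
// None of these names are Spark's real classes or signatures.
object ShutdownRaceSketch {

  // Stand-in for the RPC dispatcher: a message can only be delivered to an
  // endpoint that is still registered under its name.
  final class Dispatcher {
    private val endpoints = TrieMap.empty[String, String => Unit]

    def register(name: String, handler: String => Unit): Unit =
      endpoints.put(name, handler)

    def unregister(name: String): Unit =
      endpoints.remove(name)

    def post(name: String, msg: String): Unit =
      endpoints.get(name) match {
        case Some(handler) => handler(msg)
        case None          => throw new IllegalStateException(s"Could not find $name.")
      }
  }

  val DriverEndpoint = "CoarseGrainedScheduler"

  // Simulates one shutdown and returns the error messages raised while the
  // terminated executors are reported to the driver endpoint.
  // stopEndpointFirst = true mimics the order reported in this issue.
  def shutdown(stopEndpointFirst: Boolean, executorIds: Seq[Int]): Seq[String] = {
    val dispatcher = new Dispatcher
    // The real endpoint would update its executor bookkeeping here.
    dispatcher.register(DriverEndpoint, _ => ())

    // Plays the role of statusUpdate() -> executorTerminated() -> removeExecutor().
    def reportTerminated(id: Int): Option[String] =
      try {
        dispatcher.post(DriverEndpoint, s"RemoveExecutor($id,Executor finished with state FINISHED)")
        None
      } catch {
        case e: IllegalStateException => Some(e.getMessage)
      }

    if (stopEndpointFirst) {
      // The endpoint is gone before the status updates arrive.
      dispatcher.unregister(DriverEndpoint)
      executorIds.flatMap(id => reportTerminated(id).toList)
    } else {
      // Status updates are handled while the endpoint is still registered.
      val errors = executorIds.flatMap(id => reportTerminated(id).toList)
      dispatcher.unregister(DriverEndpoint)
      errors
    }
  }

  def main(args: Array[String]): Unit = {
    println(shutdown(stopEndpointFirst = true, executorIds = Seq(1)))
    println(shutdown(stopEndpointFirst = false, executorIds = Seq(1)))
  }
}
```

With `stopEndpointFirst = true` (the order described above), reporting executor 1 fails with the same "Could not find CoarseGrainedScheduler." message seen in the log, while handling the status updates before unregistering the endpoint produces no errors. This only suggests that waiting for the executors before stopping the driver endpoint would avoid the exception; it is not taken from Spark's code.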



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
