I am running a Spark Streaming application on a cluster composed of three
nodes, each with one worker and three executors (so 9 executors in total).
I am using Spark standalone mode.

The application is launched with spark-submit using --deploy-mode client.
The submit command is run from one of the nodes, let's call it node 1.
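
For reference, the command looks roughly like this (class, jar, host names
and memory are placeholders, and the core settings are just one way the
3-executors-per-worker layout can be obtained, assuming each worker offers
at least 3 free cores):

    spark-submit \
      --master spark://node1:7077 \
      --deploy-mode client \
      --class com.example.MyStreamingApp \
      --executor-cores 1 \
      --total-executor-cores 9 \
      --executor-memory 2g \
      my-streaming-app.jar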

As a fault-tolerance test I am stopping the worker on node 2 with the
command sudo service spark-worker stop.

In the logs I can see that the Master keeps trying to launch executors on
the worker that is shutting down (I can see thousands of attempts, all with
status FAILED, within a few seconds), and then the whole application is
terminated by Spark.

I tried to find more information about how Spark handles worker failures,
but I was not able to find a useful answer.
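
The closest thing I found is the spark.deploy.maxExecutorRetries setting,
which, if I read the standalone documentation correctly, limits the number
of back-to-back executor failures the Master tolerates before removing the
application. Something like the following might delay the removal (the
value 100 is arbitrary), but I am not sure it addresses the real problem:

    spark-submit \
      --master spark://node1:7077 \
      --deploy-mode client \
      --conf spark.deploy.maxExecutorRetries=100 \
      --class com.example.MyStreamingApp \
      my-streaming-app.jar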

In the Spark source code I can see that the worker kills its drivers (and
executors) when it is stopped: see the onStop method here:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
This might explain why the whole application is eventually stopped.

Is this the expected behavior when a worker is explicitly stopped?

Is this treated as a worker failure, or should it be considered differently
(given that I am explicitly stopping the worker here)?

Would the behavior be the same if the worker process were killed rather
than explicitly stopped?
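
By killed I mean forcefully terminating the worker process instead of a
clean stop, for example something like:

    sudo kill -9 $(pgrep -f org.apache.spark.deploy.worker.Worker)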

Thank you 
Davide


