Re: Driver Memory taken up by BlockManager
Hello, I am facing a similar issue; have you found a solution for it? Cheers, Davide -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
Re: Programmatically get status of job (WAITING/RUNNING)
In this case, the only way to check the status is via REST calls to the Spark JSON API, accessible at http://<master-host>:<web-ui-port>/json/
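As a sketch of how you might consume that endpoint: the standalone Master's /json/ response is a JSON document with fields such as "status" and lists of active/completed applications. The sample payload and field names below are illustrative (taken from a typical 2.x-era response, not from this thread) and may vary by Spark version; in a real check you would first fetch the document over HTTP, e.g. with java.net.http.HttpClient.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MasterJsonStatus {
    // Illustrative sample of what http://<master-host>:<web-ui-port>/json/
    // may return; the real payload also contains workers, activeapps, etc.
    static final String SAMPLE =
        "{ \"url\": \"spark://master:7077\", \"aliveworkers\": 3, \"status\": \"ALIVE\" }";

    // Extract the value of a top-level string field with a regex.
    // Fine for a quick health check; use a real JSON parser for anything more.
    static String stringField(String json, String field) {
        Matcher m = Pattern.compile("\"" + field + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(stringField(SAMPLE, "status"));
    }
}
```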
Re: Programmatically get status of job (WAITING/RUNNING)
Hello Behroz,

you can use a SparkAppHandle.Listener to get updates from the launcher about the underlying process (see also org.apache.spark.scheduler.SparkListener for scheduler-level events: https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/scheduler/SparkListener.html ).

First, create your own SparkAppListener class that implements it:

    private static class SparkAppListener implements SparkAppHandle.Listener, Runnable {
        SparkAppListener() {}

        @Override
        public void stateChanged(SparkAppHandle handle) {
            String sparkAppId = handle.getAppId();
            SparkAppHandle.State appState = handle.getState();
            log.info("Spark job with app id: " + sparkAppId + ", State changed to: " + appState);
        }

        @Override
        public void infoChanged(SparkAppHandle handle) {}

        @Override
        public void run() {}
    }

Then you can run it in a thread via Executors.newCachedThreadPool (or with a simple new Thread()):

    private final static ExecutorService listenerService = Executors.newCachedThreadPool();

    SparkAppListener appListener = new SparkAppListener();
    listenerService.execute(appListener);

    SparkLauncher launcher = new SparkLauncher()
        .setAppName(appName)
        .setSparkHome(sparkHome)
        .setAppResource(appResource)
        .setMainClass(mainClass)
        .setMaster(master);

    SparkAppHandle appHandle = launcher.startApplication(appListener);

At this point, every time the state changes, SparkAppListener.stateChanged will be executed.

Hope it helps, Davide
[Spark Streaming] - Stopped worker throws FileNotFoundException
I am running a Spark Streaming application on a cluster composed of three nodes, each with one worker and three executors (so a total of 9 executors). I am using Spark standalone mode (version 2.1.1). The application is run with a spark-submit command with the options "--deploy-mode client" and "--conf spark.streaming.stopGracefullyOnShutdown=true". The submit command is run from one of the nodes, let's call it node 1.

As a fault tolerance test, I am stopping the worker on node 2 by calling the script "stop-slave.sh". In the executor logs on node 2 I can see several errors related to a FileNotFoundException during a shuffle operation:

I can see 4 errors of this kind on the same task in each of the 3 executors on node 2. In the driver logs I can see:

This is taking down the application, as expected: the executor reached spark.task.maxFailures on a single task and the application is then stopped.

I ran different tests and all of them but one ended with the app stopped. My idea is that the behaviour can vary depending on the precise step in the stream processing at which I ask the worker to stop. In any case, all the other tests failed with the same error described above. Increasing the parameter spark.task.maxFailures to 8 did not help either, with the TaskSetManager signalling the task failed 8 times instead of 4.

What if the worker is killed? I also ran a different test: I killed the worker and the 3 executor processes on node 2 with the command "kill -9". In this case, the streaming app adapted to the remaining resources and kept working. In the driver log we can see the driver noticing the missing executors:

Then, we notice a long series of the following errors:

These errors appear in the log until the killed worker is started again (as said before, these errors do not cause the application to stop).
Conclusion: stopping a worker with the dedicated command has an unexpected behaviour. The app should be able to cope with the missing worker, adapting to the remaining resources and keeping working (as it does in the kill case). What are your observations on this issue? Thank you, Davide
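For reference, a spark-submit invocation matching the options described above might look like the following. The master URL, application class, and jar name are placeholders, not taken from the original post:

```shell
spark-submit \
  --master spark://node1:7077 \
  --deploy-mode client \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --conf spark.task.maxFailures=8 \
  --class com.example.MyStreamingApp \
  my-streaming-app.jar
```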
Re: Spark standalone API...
Hello, you might get the information you are looking for from this hidden API: http://<master-host>:<web-ui-port>/json/ Hope it helps, Davide
[Spark Streaming] Application is stopped after stopping a worker
I am running a Spark Streaming application on a cluster composed of three nodes, each with one worker and three executors (so a total of 9 executors). I am using Spark standalone mode. The application is run with a spark-submit command with the option --deploy-mode client. The submit command is run from one of the nodes, let's call it node 1.

As a fault tolerance test, I am stopping the worker on node 2 with the command sudo service spark-worker stop. In the logs I can see that the Master keeps trying to run executors on the shutting-down worker (I can see thousands of attempts, all with status FAILED, within a few seconds), and then the whole application is terminated by Spark.

I tried to get more information about how Spark handles worker failures but I was not able to find any useful answer. In the Spark source code I can see that the worker asks for a driver kill when we stop the worker: see the method onStop here: https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala This might explain why the whole application is eventually stopped.

Is this the expected behavior in case of a worker being explicitly stopped? Is this a case of worker failure, or does it have to be considered differently (I am explicitly shutting down the node here)? Would the behavior be the same if the worker process was killed (and not explicitly stopped)?

Thank you, Davide -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Application-is-stopped-after-stopping-a-worker-tp29111.html Sent from the Apache Spark User List mailing list archive at Nabble.com.