Re: Driver Memory taken up by BlockManager

2018-12-14 Thread Davide.Mandrini
Hello,

I am facing a similar issue. Have you found a solution for it?

Cheers,
Davide



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-08 Thread Davide.Mandrini
In this case, the only way to check the status is via REST calls to the Spark
JSON API, accessible at http://<master-host>:<web-ui-port>/json/
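
For example, a minimal Java sketch along these lines fetches that endpoint (the
master host below is a placeholder for your own; 8080 is the default master web
UI port):
-
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class ClusterStatusCheck {
    public static void main(String[] args) throws Exception {
        // Placeholder master host; replace with your own.
        URL url = new URL("http://my-master-host:8080/json/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");

        // Read the whole response body.
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line);
            }
        }

        // The response is a JSON document describing the cluster; its
        // "activeapps" entries carry a "state" field (e.g. WAITING, RUNNING)
        // that you can extract with the JSON library of your choice.
        System.out.println(body);
    }
}
-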



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Programmatically get status of job (WAITING/RUNNING)

2017-11-07 Thread Davide.Mandrini
Hello Behroz,

you can use a SparkAppHandle.Listener to get updates from the underlying process (cf.
https://spark.apache.org/docs/2.2.0/api/java/org/apache/spark/launcher/SparkAppHandle.Listener.html
)

First, you need to create your own SparkAppListener class that implements it:
-
// Assumed imports:
// import org.apache.spark.launcher.SparkAppHandle;
// import org.slf4j.Logger;
// import org.slf4j.LoggerFactory;

private static class SparkAppListener implements SparkAppHandle.Listener, Runnable {

    private static final Logger log = LoggerFactory.getLogger(SparkAppListener.class);

    SparkAppListener() {}

    @Override
    public void stateChanged(SparkAppHandle handle) {
        String sparkAppId = handle.getAppId();
        SparkAppHandle.State appState = handle.getState();
        log.info("Spark job with app id: " + sparkAppId
                + ", state changed to: " + appState);
    }

    @Override
    public void infoChanged(SparkAppHandle handle) {}

    @Override
    public void run() {}
}
-



Then you can run it on a thread via Executors.newCachedThreadPool() (or with a
simple new Thread()):
-
// Assumed imports:
// import java.util.concurrent.ExecutorService;
// import java.util.concurrent.Executors;
// import org.apache.spark.launcher.SparkAppHandle;
// import org.apache.spark.launcher.SparkLauncher;

private static final ExecutorService listenerService =
        Executors.newCachedThreadPool();

SparkAppListener appListener = new SparkAppListener();
listenerService.execute(appListener);

// appName, sparkHome, appResource, mainClass and master are the usual
// launcher parameters, defined elsewhere in your code.
SparkLauncher launcher = new SparkLauncher()
        .setAppName(appName)
        .setSparkHome(sparkHome)
        .setAppResource(appResource)
        .setMainClass(mainClass)
        .setMaster(master);
        // ... further configuration as needed

SparkAppHandle appHandle = launcher.startApplication(appListener);
-

From this point on, every time the application state changes, the
SparkAppListener.stateChanged method will be invoked.
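
Alternatively, if you prefer polling to callbacks, a minimal sketch reusing the
appHandle above can simply wait for a terminal state:
-
// SparkAppHandle.State.isFinal() returns true once the application has
// reached a terminal state such as FINISHED, FAILED or KILLED.
while (!appHandle.getState().isFinal()) {
    System.out.println("Current state: " + appHandle.getState());
    Thread.sleep(1000); // poll once per second; handle InterruptedException as needed
}
System.out.println("Final state: " + appHandle.getState());
-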

Hope it helps,
Davide



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[Spark Streaming] - Stopped worker throws FileNotFoundException

2017-09-10 Thread Davide.Mandrini
I am running a Spark Streaming application on a cluster composed of three
nodes, each one with a worker and three executors (so a total of 9 executors).
I am using Spark standalone mode (version 2.1.1).

The application is run with a spark-submit command with the options
"--deploy-mode client" and "--conf
spark.streaming.stopGracefullyOnShutdown=true". The submit command is run
from one of the nodes; let's call it node 1.
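
For reference, a sketch of the submit command (the master URL, main class and
jar path are placeholders, not the real ones):
-
spark-submit \
  --master spark://node1:7077 \
  --deploy-mode client \
  --conf spark.streaming.stopGracefullyOnShutdown=true \
  --class com.example.StreamingJob \
  /path/to/streaming-app.jar
-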

As a fault tolerance test I am stopping the worker on node 2 by calling the
script "stop-slave.sh".

In executor logs on node 2 I can see several errors related to a
FileNotFoundException during a shuffle operation:



I can see 4 errors of this kind on the same task in each of the 3 executors
on node 2.

In driver logs I can see:



This takes down the application, as expected: the executor reached
spark.task.maxFailures on a single task, and the application was then stopped.

I ran several tests, and all but one ended with the app being stopped. My
guess is that the behaviour varies depending on the exact step of the
streaming process at which I ask the worker to stop. In any case, all the
other tests failed with the same error described above.

Increasing the parameter spark.task.maxFailures to 8 did not help either: the
TaskSetManager simply signalled that the task had failed 8 times instead of 4.

What if the worker is killed?


I also ran a different test: I killed the worker and the 3 executor processes
on node 2 with the command "kill -9". In this case, the streaming app adapted
to the remaining resources and kept working.

In the driver log we can see the driver noticing the missing executors:



Then we notice a long series of the following errors:



These errors appear in the log until the killed worker is started again (as
said before, these errors do not cause the application to stop).

Conclusion


Stopping a worker with the dedicated command has an unexpected behaviour: the
app should be able to cope with the missing worker, adapt to the remaining
resources and keep working (as it does when the worker is killed).

What are your observations on this issue?

Thank you, Davide



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark standalone API...

2017-09-10 Thread Davide.Mandrini
Hello, 

you might get the information you are looking for from this hidden API: 

http://<master-host>:<web-ui-port>/json/

Hope it helps, 
Davide



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[Spark Streaming] - Stopped worker throws FileNotFoundException

2017-09-09 Thread Davide.Mandrini
I am running a Spark Streaming application on a cluster composed of three
nodes, each one with a worker and three executors (so a total of 9 executors).
I am using Spark standalone mode (version 2.1.1).

The application is run with a spark-submit command with the options
"--deploy-mode client" and "--conf
spark.streaming.stopGracefullyOnShutdown=true". The submit command is run
from one of the nodes; let's call it node 1.

As a fault tolerance test I am stopping the worker on node 2 by calling the
script "stop-slave.sh".

In executor logs on node 2 I can see several errors related to a
FileNotFoundException during a shuffle operation:



I can see 4 errors of this kind on the same task in each of the 3 executors
on node 2.

In driver logs I can see:



This takes down the application, as expected: the executor reached
"spark.task.maxFailures" on a single task, and the application was then
stopped.

I ran several tests, and all but one ended with the app being stopped. My
guess is that the behaviour varies depending on the exact step of the
streaming process at which I ask the worker to stop. In any case, all the
other tests failed with the same error described above.

Increasing the parameter "spark.task.maxFailures" to 8 did not help either:
the TaskSetManager simply signalled that the task had failed 8 times instead
of 4.

What if the worker is killed?


I also ran a different test: I killed the worker and the 3 executor processes
on node 2 with the command kill -9. In this case, the streaming app adapted
to the remaining resources and kept working.

In the driver log we can see the driver noticing the missing executors:



Then we notice a long series of the following errors:



These errors appear in the log until the killed worker is started again.

Conclusion


Stopping a worker with the dedicated command has an unexpected behaviour: the
app should be able to cope with the missing worker, adapt to the remaining
resources and keep working (as it does when the worker is killed).

What are your observations on this issue?

Thank you, Davide



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Spark standalone API...

2017-09-09 Thread Davide.Mandrini
Hello, 

you might get the information you are looking for from this hidden API: 

http://:/json/ 

Hope it helps, 
Davide



--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



[Spark Streaming] Application is stopped after stopping a worker

2017-08-28 Thread Davide.Mandrini
I am running a Spark Streaming application on a cluster composed of three
nodes, each one with a worker and three executors (so a total of 9 executors).
I am using Spark standalone mode.

The application is run with a spark-submit command with the option
--deploy-mode client. The submit command is run from one of the nodes; let's
call it node 1.

As a fault tolerance test I am stopping the worker on node 2 with the
command sudo service spark-worker stop.

In the logs I can see that the Master keeps trying to launch executors on the
shutting-down worker (I can see thousands of attempts, all with status FAILED,
over a few seconds), and then the whole application is terminated by Spark.

I tried to find more information about how Spark handles worker failures, but
I was not able to find any useful answer.

In the Spark source code I can see that the worker asks to kill its drivers
when we stop it (see the onStop method here:
https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala).
This might explain why the whole application is eventually stopped.

Is this the expected behaviour when a worker is explicitly stopped?

Is this a case of worker failure, or should it be considered differently
(since I am explicitly shutting down the node here)?

Would the behaviour be the same if the worker process were killed (rather
than explicitly stopped)?

Thank you 
Davide



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Application-is-stopped-after-stopping-a-worker-tp29111.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org