Mesos killing Spark Driver

2014-11-27 Thread Gerard Maas
Hi,

We are currently running our Spark + Spark Streaming jobs on Mesos,
submitting our jobs through Marathon.
We see with some regularity that the Spark Streaming driver gets killed by
Mesos and then restarted on some other node by Marathon.

I've no clue why Mesos is killing the driver and looking at both the Mesos
and Spark logs didn't make me any wiser.

In the Spark Streaming driver logs, I find this entry of Mesos "signing
off" my driver:

Shutting down
Sending SIGTERM to process tree at pid 17845
Killing the following process trees:
[
-+- 17845 sh -c sh ./run-mesos.sh application-ts.conf
 \-+- 17846 sh ./run-mesos.sh application-ts.conf
   \--- 17847 java -cp core-compute-job.jar -Dconfig.file=application-ts.conf com.compute.job.FooJob 31326
]
Command terminated with signal Terminated (pid: 17845)


Has anybody seen something similar? Any hints on where to start digging?

-kr, Gerard.


Re: Mesos killing Spark Driver

2014-11-28 Thread Gerard Maas
[Ping]
Any hints?

On Thu, Nov 27, 2014 at 3:38 PM, Gerard Maas wrote:

> Hi,
>
> We are currently running our Spark + Spark Streaming jobs on Mesos,
> submitting our jobs through Marathon.
> We see with some regularity that the Spark Streaming driver gets killed by
> Mesos and then restarted on some other node by Marathon.
>
> I've no clue why Mesos is killing the driver and looking at both the Mesos
> and Spark logs didn't make me any wiser.
>
> In the Spark Streaming driver logs, I find this entry of Mesos "signing
> off" my driver:
>
> Shutting down
> Sending SIGTERM to process tree at pid 17845
> Killing the following process trees:
> [
> -+- 17845 sh -c sh ./run-mesos.sh application-ts.conf
>  \-+- 17846 sh ./run-mesos.sh application-ts.conf
>    \--- 17847 java -cp core-compute-job.jar -Dconfig.file=application-ts.conf com.compute.job.FooJob 31326
> ]
> Command terminated with signal Terminated (pid: 17845)
>
>
> Has anybody seen something similar? Any hints on where to start digging?
>
> -kr, Gerard.
>
>