Re: Spark driver hangs on start of job

2015-07-02 Sjoerd Mulder
Hi Richard, I have applied the following fix to our 1.4.0 version and it seems to resolve the zombies :) https://github.com/apache/spark/pull/7077/files Sjoerd 2015-06-26 20:08 GMT+02:00 Richard Marscher rmarsc...@localytics.com: Hi, we are on 1.3.1 right now so in case there are

Re: Spark driver hangs on start of job

2015-07-02 Richard Marscher
Ah, I see. Glad that simple patch works for your problem. That seems to be a different underlying problem from the one we have been experiencing. In our case, the executors are failing properly; it's just that none of the new ones ever escapes the exact same issue. So we start a death

Re: Spark driver hangs on start of job

2015-06-26 Richard Marscher
We've seen this issue in production as well. We also aren't sure what causes it, but we have just recently shaded some of the Spark code in TaskSchedulerImpl so that, in this situation, an exception effectively bubbles up from Spark instead of the driver becoming a zombie. If you are interested I can go into more

Re: Spark driver hangs on start of job

2015-06-26 Richard Marscher
Hi, we are on 1.3.1 right now, so in case there are differences in the Spark files, I'll walk through the logic of what we did and post a couple of gists at the end. We haven't committed to forking Spark for our own deployments yet, so right now we shadow some Spark classes in our application code

Spark driver hangs on start of job

2015-06-26 Sjoerd Mulder
Hi, I have a really annoying issue that I cannot replicate consistently; still, it happens roughly once every 100 submissions (it's a job that runs every 3 minutes). I have already reported an issue for this: https://issues.apache.org/jira/browse/SPARK-8592 Here are the thread dumps of the driver and the