Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Akhil Das
Is that all you have in the executor logs? I suspect some of those jobs are having a hard time managing the memory.

Thanks
Best Regards

On Sun, Nov 1, 2015 at 9:38 PM, Romi Kuntsman wrote:
> [adding dev list since it's probably a bug, but I'm not sure how to
> reproduce so I
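[Editor's note, not part of the thread: if executor memory pressure is the suspect, one low-risk first step is to size executor memory explicitly so that a real OutOfMemoryError surfaces clearly in the executor logs. The sketch below is illustrative only; the master URL, memory value, and object name are placeholders, not details from the original cluster.]

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch: pin executor memory explicitly so that, if memory really
    // is the problem, the executors fail with a visible OutOfMemoryError.
    object MemoryCheckApp {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("memory-check")
          .setMaster("spark://master-host:7077")   // hypothetical standalone master URL
          .set("spark.executor.memory", "4g")      // raise if executors are memory-starved
        val sc = new SparkContext(conf)
        try {
          // trivial job just to exercise the executors
          println(sc.parallelize(1 to 1000000).count())
        } finally {
          sc.stop()
        }
      }
    }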

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Romi Kuntsman
If they have a problem managing memory, wouldn't there be an OOM? Why does AppClient throw an NPE?

*Romi Kuntsman*, *Big Data Engineer*
http://www.totango.com

On Mon, Nov 9, 2015 at 4:59 PM, Akhil Das wrote:
> Is that all you have in the executor logs? I

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Akhil Das
Did you find anything regarding the OOM in the executor logs?

Thanks
Best Regards

On Mon, Nov 9, 2015 at 8:44 PM, Romi Kuntsman wrote:
> If they have a problem managing memory, wouldn't there be an OOM?
> Why does AppClient throw an NPE?
>
> *Romi Kuntsman*, *Big Data

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Romi Kuntsman
I didn't see anything about an OOM. This sometimes happens before the application has done anything at all, and it hits several applications at the same time, so I guess it's a communication failure. The problem is that the error shown doesn't represent the actual problem (which may be a network
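[Editor's note, not part of the thread: if the root cause really is a transient network failure between the driver and the standalone master, one knob to experiment with is the general RPC timeout. The property spark.network.timeout does exist, but whether it governs the AppClient registration path in Spark 1.4.0 is an assumption that would need to be checked against the AppClient code; the app name and master URL below are placeholders.]

    import org.apache.spark.SparkConf

    // Speculative sketch, not a confirmed fix: loosen the general network
    // timeout in case the "All masters are unresponsive" failures come from
    // transient delays rather than a real master outage.
    val conf = new SparkConf()
      .setAppName("my-app")                      // placeholder app name
      .setMaster("spark://master-host:7077")     // hypothetical standalone master URL
      .set("spark.network.timeout", "300s")      // raised from the 120s default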

Re: Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-09 Thread Tim Preece

Some spark apps fail with "All masters are unresponsive", while others pass normally

2015-11-01 Thread Romi Kuntsman
[adding the dev list since it's probably a bug, but I'm not sure how to reproduce it so that I can open a bug report about it]

Hi,

I have a standalone Spark 1.4.0 cluster with hundreds of applications running every day. From time to time, the applications crash with the following error (see below), but at the same
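[Editor's note, not part of the original report: since the thread's complaint is that the surfaced error (an NPE from AppClient) hides the real cause, a hypothetical mitigation on the driver side is to log the full exception chain when SparkContext creation fails, to tell a genuine master outage apart from a transient registration failure. The function name, app name, and structure below are illustrative only.]

    import org.apache.spark.{SparkConf, SparkContext, SparkException}

    // Hypothetical wrapper: record every cause in the chain when the driver
    // fails to start, so the real failure mode ends up in the application logs.
    def startContext(masterUrl: String): SparkContext = {
      val conf = new SparkConf().setAppName("daily-job").setMaster(masterUrl)
      try {
        new SparkContext(conf)
      } catch {
        case e: SparkException =>
          // e.g. "All masters are unresponsive! Giving up."
          var cause: Throwable = e
          while (cause != null) {
            System.err.println(s"${cause.getClass.getName}: ${cause.getMessage}")
            cause = cause.getCause
          }
          throw e
      }
    }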