Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Caused by: java.io.IOException: Failed to connect to ip-10-12-46-235.us-west-2.compute.internal/10.12.46.235:55681
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at

Re: BlockManager crashing applications

2016-05-08 Thread Brandon White
I'm not quite sure how this is a memory problem. There are no OOM exceptions, and the job only breaks when actions are run in parallel, submitted to the scheduler by different threads. The issue is that the doGetRemote function does not retry when it is denied access to a cache block. On May 8,
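As an illustration of the retry behavior described above, here is a minimal sketch of a caller-side retry wrapper. It is not Spark's doGetRemote and does not change how the block manager fetches blocks. The names RetrySketch, retry, maxAttempts, and delayMs are hypothetical and exist only for this example.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object RetrySketch {
  // Hypothetical helper, not part of Spark: runs `op` up to `maxAttempts`
  // times, sleeping `delayMs` milliseconds after each failed attempt.
  // Returns the first success, or the last failure if every attempt fails.
  @tailrec
  def retry[T](maxAttempts: Int, delayMs: Long)(op: () => T): Try[T] = {
    require(maxAttempts >= 1, "maxAttempts must be at least 1")
    Try(op()) match {
      case success @ Success(_) => success
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(delayMs)
        retry(maxAttempts - 1, delayMs)(op)
      case failure => failure
    }
  }
}
```

Under these assumptions, each action submitted from a separate thread could be wrapped as RetrySketch.retry(3, 100L)(() => df.filter(expression).count()), so a transient fetch failure reruns the whole action instead of failing it on the first attempt. Whether that is enough to work around the failure reported in this thread is untested.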

Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Brandon, how much memory are you giving to your executors? Did you check whether there were dead executors in your application logs? Most likely you need more memory for the executors. Ashish On Sun, May 8, 2016 at 1:01 PM, Brandon White wrote: > Hello all, > > I am

BlockManager crashing applications

2016-05-08 Thread Brandon White
Hello all, I am running a Spark application which schedules multiple Spark jobs. Something like:

    val df = sqlContext.read.parquet("/path/to/file")
    filterExpressions.par.foreach { expression =>
      df.filter(expression).count()
    }

When the block manager fails to fetch a block, it throws an