Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Caused by: java.io.IOException: Failed to connect to
ip-10-12-46-235.us-west-2.compute.internal/10.12.46.235:55681
  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
  at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
  at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
  at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
  at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
  at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)


The message above indicates that there used to be an executor at that
address, and by the time another executor tried to read from it, it no
longer existed. You can confirm whether this is the case by looking at
the Spark application UI - you may find dead executors there.
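
If the executors are only dying transiently (for example, long GC pauses
rather than hard OOM kills), it can also help to give the fetch layer more
room to retry before giving up. A rough sketch using Spark's standard
shuffle/network retry settings - the values are only illustrative and the
app name is a placeholder, so tune everything for your cluster:

import org.apache.spark.{SparkConf, SparkContext}

// Illustrative values only - tune for your cluster.
val conf = new SparkConf()
  .setAppName("parallel-filter-counts")          // placeholder name
  .set("spark.shuffle.io.maxRetries", "10")      // default is 3
  .set("spark.shuffle.io.retryWait", "15s")      // default is 5s
  .set("spark.network.timeout", "300s")          // default is 120s
val sc = new SparkContext(conf)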

On Sun, May 8, 2016 at 6:02 PM, Brandon White 
wrote:

> I'm not quite sure how this is a memory problem. There are no OOM
> exceptions and the job only breaks when actions are run in parallel,
> submitted to the scheduler by different threads.
>
> The issue is that the doGetRemote function does not retry when it is
> denied access to a cache block.
> On May 8, 2016 5:55 PM, "Ashish Dubey"  wrote:
>
> Brandon,
>
> How much memory are you giving your executors? Did you check whether
> there were dead executors in your application logs? Most likely you need
> to give the executors more memory.
>
> Ashish
>
> On Sun, May 8, 2016 at 1:01 PM, Brandon White 
> wrote:
>
>> Hello all,
>>
>> I am running a Spark application which schedules multiple Spark jobs.
>> Something like:
>>
>> val df  = sqlContext.read.parquet("/path/to/file")
>>
>> filterExpressions.par.foreach { expression =>
>>   df.filter(expression).count()
>> }
>>
>> When the block manager fails to fetch a block, it throws an exception
>> which eventually kills the application: http://pastebin.com/2ggwv68P
>>
>> This code works when I run it on one thread with:
>>
>> filterExpressions.foreach { expression =>
>>   df.filter(expression).count()
>> }
>>
>> But I really need the jobs to run in parallel. Is there any way
>> around this? It seems like a bug in the BlockManager's doGetRemote function.
>> I have tried the HTTP Block Manager as well.
>>
>
>


Re: BlockManager crashing applications

2016-05-08 Thread Brandon White
I'm not quite sure how this is a memory problem. There are no OOM
exceptions and the job only breaks when actions are run in parallel,
submitted to the scheduler by different threads.

The issue is that the doGetRemote function does not retry when it is denied
access to a cache block.
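
Until that retry happens inside Spark itself, the best workaround I have is
retrying the whole action at the application level. A rough sketch - the
withRetries helper below is just something I made up, not anything from
Spark:

// Hypothetical helper: retry an action a few times before letting the
// failure propagate.
def withRetries[T](attempts: Int)(action: => T): T = {
  try {
    action
  } catch {
    case _: Exception if attempts > 1 =>
      Thread.sleep(5000) // crude back-off before the next attempt
      withRetries(attempts - 1)(action)
  }
}

filterExpressions.par.foreach { expression =>
  withRetries(3) { df.filter(expression).count() }
}
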
On May 8, 2016 5:55 PM, "Ashish Dubey"  wrote:

Brandon,

How much memory are you giving your executors? Did you check whether there
were dead executors in your application logs? Most likely you need to give
the executors more memory.

Ashish

On Sun, May 8, 2016 at 1:01 PM, Brandon White 
wrote:

> Hello all,
>
> I am running a Spark application which schedules multiple Spark jobs.
> Something like:
>
> val df  = sqlContext.read.parquet("/path/to/file")
>
> filterExpressions.par.foreach { expression =>
>   df.filter(expression).count()
> }
>
> When the block manager fails to fetch a block, it throws an exception
> which eventually kills the application: http://pastebin.com/2ggwv68P
>
> This code works when I run it on one thread with:
>
> filterExpressions.foreach { expression =>
>   df.filter(expression).count()
> }
>
> But I really need the jobs to run in parallel. Is there any way
> around this? It seems like a bug in the BlockManager's doGetRemote function.
> I have tried the HTTP Block Manager as well.
>


Re: BlockManager crashing applications

2016-05-08 Thread Ashish Dubey
Brandon,

How much memory are you giving your executors? Did you check whether there
were dead executors in your application logs? Most likely you need to give
the executors more memory.
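
For reference, executor memory can be raised with spark-submit's
--executor-memory flag or directly on the SparkConf. A small sketch - the
8g value is only an example, not a recommendation:

import org.apache.spark.SparkConf

// Example value only - size this to what your worker nodes can hold.
val conf = new SparkConf()
  .set("spark.executor.memory", "8g")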

Ashish

On Sun, May 8, 2016 at 1:01 PM, Brandon White 
wrote:

> Hello all,
>
> I am running a Spark application which schedules multiple Spark jobs.
> Something like:
>
> val df  = sqlContext.read.parquet("/path/to/file")
>
> filterExpressions.par.foreach { expression =>
>   df.filter(expression).count()
> }
>
> When the block manager fails to fetch a block, it throws an exception
> which eventually kills the application: http://pastebin.com/2ggwv68P
>
> This code works when I run it on one thread with:
>
> filterExpressions.foreach { expression =>
>   df.filter(expression).count()
> }
>
> But I really need the jobs to run in parallel. Is there any way
> around this? It seems like a bug in the BlockManager's doGetRemote function.
> I have tried the HTTP Block Manager as well.
>


BlockManager crashing applications

2016-05-08 Thread Brandon White
Hello all,

I am running a Spark application which schedules multiple Spark jobs.
Something like:

val df  = sqlContext.read.parquet("/path/to/file")

filterExpressions.par.foreach { expression =>
  df.filter(expression).count()
}

When the block manager fails to fetch a block, it throws an exception which
eventually kills the application: http://pastebin.com/2ggwv68P

This code works when I run it on one thread with:

filterExpressions.foreach { expression =>
  df.filter(expression).count()
}

But I really need the jobs to run in parallel. Is there any way around
this? It seems like a bug in the BlockManager's doGetRemote function.
I have tried the HTTP Block Manager as well.
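
In case it helps with the diagnosis: a variant I have been experimenting
with replaces .par with an explicit, bounded thread pool, so only a few
jobs hit the scheduler at once. Just a sketch - the pool size of 4 is
arbitrary:

import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration.Duration

// Bound how many Spark jobs run concurrently instead of relying on the
// default parallel-collection pool. The pool size is an example only.
val pool = Executors.newFixedThreadPool(4)
implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(pool)

val jobs = filterExpressions.map { expression =>
  Future { df.filter(expression).count() }
}

// Wait for every job to finish, then release the pool.
jobs.foreach(f => Await.result(f, Duration.Inf))
pool.shutdown()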