Re: OOM Error

2019-09-07 Thread Ankit Khettry
Sure folks, will try later today!

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra,  wrote:

> Ankit
>
> Can you try reducing the number of cores or increasing the memory? With the
> configuration below, each core is getting only ~3.5 GB. Otherwise, your data
> is skewed such that one of the cores is getting too much data for a particular key.
>
> spark.executor.cores 6 spark.executor.memory 36g
>
> On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh  wrote:
>
>> It says you have 3811 tasks in earlier stages and you're going down to
>> 2001 partitions; that would make it more memory intensive. I'm guessing the
>> default Spark shuffle partition setting of 200 was in effect earlier, so that
>> would have failed. Go for a higher number, maybe even higher than 3811. What
>> was your shuffle write from stage 7 and shuffle read from stage 8?
>>
>> On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, 
>> wrote:
>>
>>> Still unable to overcome the error. Attaching some screenshots for
>>> reference.
>>> Following are the configs used:
>>> spark.yarn.max.executor.failures 1000
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.sql.shuffle.partitions 2001
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>>>
 You can try that. Also consider processing each partition separately if your
 data is heavily skewed when you partition it.

 On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
 wrote:

> Thanks Chris
>
> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
> 2001. Also, I was wondering: would it help if I repartitioned the data by
> the fields I am using in the group by and window operations?
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>
>> Hi Ankit,
>>
>> Without looking at the Spark UI and the stages/DAG, I'm guessing
>> you're running on the default number of Spark shuffle partitions.
>>
>> If you're seeing a lot of shuffle spill, you likely have to increase
>> the number of shuffle partitions to accommodate the huge shuffle size.
>>
>> I hope that helps
>> Chris
>>
>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>> wrote:
>>
>>> Nope, it's a batch job.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <
>>> 028upasana...@gmail.com> wrote:
>>>
 Is it a streaming job?

 On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
 wrote:

> I have a Spark job that consists of a large number of Window
> operations and hence involves large shuffles. I have roughly 900 GiBs of
> data, although I am using a large enough cluster (10 * m5.4xlarge
> instances). I am using the following configurations for the job, although I
> have tried various other combinations without any success.
>
> spark.yarn.driver.memoryOverhead 6g
> spark.storage.memoryFraction 0.1
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> I keep running into the following OOM error:
>
> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
> 16384 bytes of memory, got 0
> at
> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
> at
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>
> I see there are a large number of JIRAs in place for similar
> issues and a great many of them are even marked resolved.
> Can someone guide me as to how to approach this problem? I am
> using Databricks Spark 2.4.1.
>
> Best Regards
> Ankit Khettry
>



Re: OOM Error

2019-09-07 Thread Sunil Kalra
Ankit

Can you try reducing the number of cores or increasing the memory? With the
configuration below, each core is getting only ~3.5 GB. Otherwise, your data
is skewed such that one of the cores is getting too much data for a particular key.

spark.executor.cores 6 spark.executor.memory 36g
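
Roughly, with Spark's default unified-memory settings (spark.memory.fraction 0.6
and ~300 MB reserved), that works out to about:

  (36 GB - 0.3 GB) * 0.6 ≈ 21.4 GB of execution/storage memory per executor
  21.4 GB / 6 cores ≈ 3.5 GB per concurrently running task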

On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh  wrote:

> It says you have 3811 tasks in earlier stages and you're going down to
> 2001 partitions; that would make it more memory intensive. I'm guessing the
> default Spark shuffle partition setting of 200 was in effect earlier, so that
> would have failed. Go for a higher number, maybe even higher than 3811. What
> was your shuffle write from stage 7 and shuffle read from stage 8?
>
> On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, 
> wrote:
>
>> Still unable to overcome the error. Attaching some screenshots for
>> reference.
>> Following are the configs used:
>> spark.yarn.max.executor.failures 1000
>> spark.yarn.driver.memoryOverhead 6g
>> spark.executor.cores 6
>> spark.executor.memory 36g
>> spark.sql.shuffle.partitions 2001
>> spark.memory.offHeap.size 8g
>> spark.memory.offHeap.enabled true
>> spark.executor.instances 10
>> spark.driver.memory 14g
>> spark.yarn.executor.memoryOverhead 10g
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>>
>>> You can try that. Also consider processing each partition separately if
>>> your data is heavily skewed when you partition it.
>>>
>>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
>>> wrote:
>>>
 Thanks Chris

 Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
 2001. Also, I was wondering: would it help if I repartitioned the data by
 the fields I am using in the group by and window operations?

 Best Regards
 Ankit Khettry

 On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:

> Hi Ankit,
>
> Without looking at the Spark UI and the stages/DAG, I'm guessing
> you're running on the default number of Spark shuffle partitions.
>
> If you're seeing a lot of shuffle spill, you likely have to increase
> the number of shuffle partitions to accommodate the huge shuffle size.
>
> I hope that helps
> Chris
>
> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
> wrote:
>
>> Nope, it's a batch job.
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>> wrote:
>>
>>> Is it a streaming job?
>>>
>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>> wrote:
>>>
 I have a Spark job that consists of a large number of Window
 operations and hence involves large shuffles. I have roughly 900 GiBs of
 data, although I am using a large enough cluster (10 * m5.4xlarge
 instances). I am using the following configurations for the job, although I
 have tried various other combinations without any success.

 spark.yarn.driver.memoryOverhead 6g
 spark.storage.memoryFraction 0.1
 spark.executor.cores 6
 spark.executor.memory 36g
 spark.memory.offHeap.size 8g
 spark.memory.offHeap.enabled true
 spark.executor.instances 10
 spark.driver.memory 14g
 spark.yarn.executor.memoryOverhead 10g

 I keep running into the following OOM error:

 org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
 16384 bytes of memory, got 0
 at
 org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
 at
 org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
 at
 org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
 at
 org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

 I see there are a large number of JIRAs in place for similar issues
 and a great many of them are even marked resolved.
 Can someone guide me as to how to approach this problem? I am using
 Databricks Spark 2.4.1.

 Best Regards
 Ankit Khettry

>>>


Re: OOM Error

2019-09-07 Thread Chris Teoh
It says you have 3811 tasks in earlier stages and you're going down to 2001
partitions; that would make it more memory intensive. I'm guessing the default
Spark shuffle partition setting of 200 was in effect earlier, so that would have
failed. Go for a higher number, maybe even higher than 3811. What was your
shuffle write from stage 7 and shuffle read from stage 8?
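
As a rough rule of thumb, keeping each shuffle partition down to a couple of
hundred MB tends to help. Purely for illustration, if the stage 7 shuffle write
were anywhere near the ~900 GiB input size:

  900 GiB ≈ 921,600 MiB
  921,600 MiB / 128 MiB per partition ≈ 7,200 partitions

so even 3811 could still be on the low side.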

On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry,  wrote:

> Still unable to overcome the error. Attaching some screenshots for
> reference.
> Following are the configs used:
> spark.yarn.max.executor.failures 1000
> spark.yarn.driver.memoryOverhead 6g
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.sql.shuffle.partitions 2001
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> Best Regards
> Ankit Khettry
>
> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>
>> You can try that. Also consider processing each partition separately if your
>> data is heavily skewed when you partition it.
>>
>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
>> wrote:
>>
>>> Thanks Chris
>>>
>>> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
>>> 2001. Also, I was wondering: would it help if I repartitioned the data by
>>> the fields I am using in the group by and window operations?
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>>>
 Hi Ankit,

 Without looking at the Spark UI and the stages/DAG, I'm guessing you're
 running on the default number of Spark shuffle partitions.

 If you're seeing a lot of shuffle spill, you likely have to increase
 the number of shuffle partitions to accommodate the huge shuffle size.

 I hope that helps
 Chris

 On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
 wrote:

> Nope, it's a batch job.
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
> wrote:
>
>> Is it a streaming job?
>>
>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>> wrote:
>>
>>> I have a Spark job that consists of a large number of Window
>>> operations and hence involves large shuffles. I have roughly 900 GiBs of
>>> data, although I am using a large enough cluster (10 * m5.4xlarge
>>> instances). I am using the following configurations for the job, although I
>>> have tried various other combinations without any success.
>>>
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.storage.memoryFraction 0.1
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> I keep running into the following OOM error:
>>>
>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>> 16384 bytes of memory, got 0
>>> at
>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>> at
>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>> at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>> at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>
>>> I see there are a large number of JIRAs in place for similar issues
>>> and a great many of them are even marked resolved.
>>> Can someone guide me as to how to approach this problem? I am using
>>> Databricks Spark 2.4.1.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>


Re: OOM Error

2019-09-07 Thread Chris Teoh
You can try that. Also consider processing each partition separately if your
data is heavily skewed when you partition it.
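
One way to sketch that (df, part_col and process are placeholders for your own
DataFrame, partitioning column and per-slice logic):

import org.apache.spark.sql.functions.col

// Handle each partition value on its own, so a single heavily skewed key
// doesn't dominate one huge shuffle.
val partValues = df.select("part_col").distinct().collect().map(_.getString(0))
partValues.foreach { v =>
  val slice = df.filter(col("part_col") === v)
  process(slice) // e.g. the window/groupBy plus the write for this slice
}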

On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry,  wrote:

> Thanks Chris
>
> Going to try it soon, maybe by setting spark.sql.shuffle.partitions to
> 2001. Also, I was wondering: would it help if I repartitioned the data by
> the fields I am using in the group by and window operations?
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>
>> Hi Ankit,
>>
>> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
>> running on the default number of Spark shuffle partitions.
>>
>> If you're seeing a lot of shuffle spill, you likely have to increase the
>> number of shuffle partitions to accommodate the huge shuffle size.
>>
>> I hope that helps
>> Chris
>>
>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>> wrote:
>>
>>> Nope, it's a batch job.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>>> wrote:
>>>
 Is it a streaming job?

 On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
 wrote:

> I have a Spark job that consists of a large number of Window
> operations and hence involves large shuffles. I have roughly 900 GiBs of
> data, although I am using a large enough cluster (10 * m5.4xlarge
> instances). I am using the following configurations for the job, although I
> have tried various other combinations without any success.
>
> spark.yarn.driver.memoryOverhead 6g
> spark.storage.memoryFraction 0.1
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> I keep running into the following OOM error:
>
> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
> bytes of memory, got 0
> at
> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
> at
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>
> I see there are a large number of JIRAs in place for similar issues
> and a great many of them are even marked resolved.
> Can someone guide me as to how to approach this problem? I am using
> Databricks Spark 2.4.1.
>
> Best Regards
> Ankit Khettry
>



Re: OOM Error

2019-09-07 Thread Ankit Khettry
Thanks Chris

Going to try it soon, maybe by setting spark.sql.shuffle.partitions to 2001.
Also, I was wondering: would it help if I repartitioned the data by the
fields I am using in the group by and window operations?
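
Something like the following is what I have in mind (df is the input DataFrame;
key1, key2 and event_time are placeholders for my actual grouping and ordering
columns):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Pre-shuffle by the grouping keys; a window partitioned by the same keys
// can then usually reuse this partitioning instead of shuffling again.
val repartitioned = df.repartition(2001, col("key1"), col("key2"))
val w = Window.partitionBy("key1", "key2").orderBy("event_time")
val result = repartitioned.withColumn("rn", row_number().over(w))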

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:

> Hi Ankit,
>
> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
> running on the default number of Spark shuffle partitions.
>
> If you're seeing a lot of shuffle spill, you likely have to increase the
> number of shuffle partitions to accommodate the huge shuffle size.
>
> I hope that helps
> Chris
>
> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
> wrote:
>
>> Nope, it's a batch job.
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>> wrote:
>>
>>> Is it a streaming job?
>>>
>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>> wrote:
>>>
 I have a Spark job that consists of a large number of Window operations
 and hence involves large shuffles. I have roughly 900 GiBs of data,
 although I am using a large enough cluster (10 * m5.4xlarge instances). I
 am using the following configurations for the job, although I have tried
 various other combinations without any success.

 spark.yarn.driver.memoryOverhead 6g
 spark.storage.memoryFraction 0.1
 spark.executor.cores 6
 spark.executor.memory 36g
 spark.memory.offHeap.size 8g
 spark.memory.offHeap.enabled true
 spark.executor.instances 10
 spark.driver.memory 14g
 spark.yarn.executor.memoryOverhead 10g

 I keep running into the following OOM error:

 org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
 bytes of memory, got 0
 at
 org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
 at
 org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
 at
 org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
 at
 org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

 I see there are a large number of JIRAs in place for similar issues and
 a great many of them are even marked resolved.
 Can someone guide me as to how to approach this problem? I am using
 Databricks Spark 2.4.1.

 Best Regards
 Ankit Khettry

>>>


Re: OOM Error

2019-09-07 Thread Chris Teoh
Hi Ankit,

Without looking at the Spark UI and the stages/DAG, I'm guessing you're
running on the default number of Spark shuffle partitions.

If you're seeing a lot of shuffle spill, you likely have to increase the
number of shuffle partitions to accommodate the huge shuffle size.
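
For example (the value is only illustrative; tune it to your shuffle size):

// Raise the shuffle partition count from the default of 200 before the
// shuffle-heavy stages run.
spark.conf.set("spark.sql.shuffle.partitions", "2000")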

I hope that helps
Chris

On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry,  wrote:

> Nope, it's a batch job.
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
> wrote:
>
>> Is it a streaming job?
>>
>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>> wrote:
>>
>>> I have a Spark job that consists of a large number of Window operations
>>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>>> am using the following configurations for the job, although I have tried
>>> various other combinations without any success.
>>>
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.storage.memoryFraction 0.1
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> I keep running into the following OOM error:
>>>
>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>> bytes of memory, got 0
>>> at
>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>> at
>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>> at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>> at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>
>>> I see there are a large number of JIRAs in place for similar issues and
>>> a great many of them are even marked resolved.
>>> Can someone guide me as to how to approach this problem? I am using
>>> Databricks Spark 2.4.1.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>


Re: OOM Error

2019-09-07 Thread Ankit Khettry
Nope, it's a batch job.

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
wrote:

> Is it a streaming job?
>
> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
> wrote:
>
>> I have a Spark job that consists of a large number of Window operations
>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>> am using the following configurations for the job, although I have tried
>> various other combinations without any success.
>>
>> spark.yarn.driver.memoryOverhead 6g
>> spark.storage.memoryFraction 0.1
>> spark.executor.cores 6
>> spark.executor.memory 36g
>> spark.memory.offHeap.size 8g
>> spark.memory.offHeap.enabled true
>> spark.executor.instances 10
>> spark.driver.memory 14g
>> spark.yarn.executor.memoryOverhead 10g
>>
>> I keep running into the following OOM error:
>>
>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>> bytes of memory, got 0
>> at
>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>> at
>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>> at
>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>> at
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>
>> I see there are a large number of JIRAs in place for similar issues and a
>> great many of them are even marked resolved.
>> Can someone guide me as to how to approach this problem? I am using
>> Databricks Spark 2.4.1.
>>
>> Best Regards
>> Ankit Khettry
>>
>


Re: OOM Error

2019-09-06 Thread Upasana Sharma
Is it a streaming job?

On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry  wrote:

> I have a Spark job that consists of a large number of Window operations
> and hence involves large shuffles. I have roughly 900 GiBs of data,
> although I am using a large enough cluster (10 * m5.4xlarge instances). I
> am using the following configurations for the job, although I have tried
> various other combinations without any success.
>
> spark.yarn.driver.memoryOverhead 6g
> spark.storage.memoryFraction 0.1
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> I keep running into the following OOM error:
>
> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
> bytes of memory, got 0
> at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
> at
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>
> I see there are a large number of JIRAs in place for similar issues and a
> great many of them are even marked resolved.
> Can someone guide me as to how to approach this problem? I am using
> Databricks Spark 2.4.1.
>
> Best Regards
> Ankit Khettry
>


Re: OOM error with GMMs on 4GB dataset

2015-05-06 Thread Xiangrui Meng
Did you set `--driver-memory` with spark-submit? -Xiangrui
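
For example (class and jar names are illustrative):

spark-submit --driver-memory 6g --class com.example.TrainGMM your-job.jar ...

In client mode, spark.driver.memory set from inside the application has no
effect, because the driver JVM has already started; the command-line flag (or
spark-defaults.conf) is the reliable place to set it.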

On Mon, May 4, 2015 at 5:16 PM, Vinay Muttineni vmuttin...@ebay.com wrote:
 Hi, I am training a GMM with 10 Gaussians on a 4 GB dataset (720,000 * 760).
 The Spark (1.3.1) job is allocated 120 executors with 6GB each, and the
 driver also has 6GB.
 Spark Config Params:

 .set(spark.hadoop.validateOutputSpecs, false)
 .set(spark.dynamicAllocation.enabled, false)
 .set(spark.driver.maxResultSize, 4g)
 .set(spark.default.parallelism, 300)
 .set(spark.serializer, org.apache.spark.serializer.KryoSerializer)
 .set(spark.kryoserializer.buffer.mb, 500)
 .set(spark.akka.frameSize, 256)
 .set(spark.akka.timeout, 300)

 However, at the aggregate step (Line 168)
 val sums = breezeData.aggregate(ExpectationSum.zero(k, d))(compute.value, _
 += _)

 I get an OOM error and the application hangs indefinitely. Is this an issue, or
 am I missing something?
 java.lang.OutOfMemoryError: Java heap space
 at akka.util.CompactByteString$.apply(ByteString.scala:410)
 at akka.util.ByteString$.apply(ByteString.scala:22)
 at
 akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
 at
 akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
 at
 akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
 at
 akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:180)
 at
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at
 org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
 at
 org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
 at
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
 at
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 15/05/04 16:23:38 ERROR util.Utils: Uncaught exception in thread
 task-result-getter-2
 java.lang.OutOfMemoryError: Java heap space
 Exception in thread task-result-getter-2 java.lang.OutOfMemoryError: Java
 heap space
 15/05/04 16:23:45 INFO scheduler.TaskSetManager: Finished task 1070.0 in
 stage 6.0 (TID 8276) in 382069 ms on [] (160/3600)
 15/05/04 16:23:54 WARN channel.DefaultChannelPipeline: An exception was
 thrown by a user handler while handling an exception event ([id: 0xc57da871,
 ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
 15/05/04 16:23:55 WARN channel.DefaultChannelPipeline: An exception was
 thrown by a user handler while handling an exception event ([id: 0x3c3dbb0c,
 ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
 15/05/04 16:24:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from
 thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
 ActorSystem [sparkDriver]



 Thanks!
 Vinay




Re: OOM error

2015-02-17 Thread Harshvardhan Chauhan
Thanks for the pointer. It led me to
http://spark.apache.org/docs/1.2.0/tuning.html; increasing the parallelism
resolved the issue.
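
For reference, the knobs involved look roughly like this (a sketch; the values
are illustrative and someRdd stands for whatever feeds the expensive step):

import org.apache.spark.{SparkConf, SparkContext}

// More, smaller tasks for the wide operations instead of a few huge ones.
val conf = new SparkConf().set("spark.default.parallelism", "500")
val sc = new SparkContext(conf)

// Or repartition a specific RDD right before the expensive step:
val widened = someRdd.repartition(500)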



On Mon, Feb 16, 2015 at 11:57 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Increase your executor memory. Also, you can play around with increasing
 the number of partitions/parallelism, etc.

 Thanks
 Best Regards

 On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com
 wrote:

 Hi All,


 I need some help with Out Of Memory errors in my application. I am using
 Spark 1.1.0 and my application uses the Java API. I am running my app on
 EC2 on 25 m3.xlarge (4 Cores, 15GB Memory) instances. The app only fails
 sometimes. Lots of mapToPair tasks are failing. My app is configured to run
 120 executors and executor memory is 2G.

 These are the various errors I see in my logs.

 15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 
 dropped from memory (free 257277829)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x6e0138a3, 
 /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, 
 /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0xd4211985, 
 /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 

Re: OOM error

2015-02-16 Thread Akhil Das
Increase your executor memory. Also, you can play around with increasing the
number of partitions/parallelism, etc.
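
For example (a sketch; the values are only starting points to experiment with):

import org.apache.spark.SparkConf

// Give each executor a bigger heap than the current 2G, and raise the
// default parallelism so each task holds a smaller working set.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.default.parallelism", "400")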

Thanks
Best Regards

On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com
wrote:

 Hi All,


 I need some help with Out Of Memory errors in my application. I am using
 Spark 1.1.0 and my application uses the Java API. I am running my app on
 EC2 on 25 m3.xlarge (4 Cores, 15GB Memory) instances. The app only fails
 sometimes. Lots of mapToPair tasks are failing. My app is configured to run
 120 executors and executor memory is 2G.

 These are the various errors I see in my logs.

 15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 
 dropped from memory (free 257277829)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x6e0138a3, 
 /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
   at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
   at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, 
 /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
   at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
   at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0xd4211985, 
 /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
   at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
   at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
   at