Re: Will it lead to OOM error?

2022-06-22 Thread Sid
Thanks all for your answers. Much appreciated.

On Thu, Jun 23, 2022 at 6:07 AM Yong Walt  wrote:

> We have many cases like this. It won't cause OOM.
>
> Thanks
>
> On Wed, Jun 22, 2022 at 8:28 PM Sid  wrote:
>
>> I have a 150TB CSV file.
>>
>> I have a total of 100 TB RAM and 100TB disk. So If I do something like
>> this
>>
>> spark.read.option("header","true").csv(filepath).show(false)
>>
>> Will it lead to an OOM error since it doesn't have enough memory? or it
>> will spill data onto the disk and process it?
>>
>> Thanks,
>> Sid
>>
>


Re: Will it lead to OOM error?

2022-06-22 Thread Yong Walt
We have many cases like this. It won't cause OOM.

Thanks

On Wed, Jun 22, 2022 at 8:28 PM Sid  wrote:

> I have a 150TB CSV file.
>
> I have a total of 100 TB RAM and 100TB disk. So If I do something like this
>
> spark.read.option("header","true").csv(filepath).show(false)
>
> Will it lead to an OOM error since it doesn't have enough memory? or it
> will spill data onto the disk and process it?
>
> Thanks,
> Sid
>


Re: Will it lead to OOM error?

2022-06-22 Thread Enrico Minack
Yes, a single file compressed with a non-splittable compression (e.g. 
gzip) would have to be read by a single executor. That takes forever.


You should consider recompressing the file with a splittable compression 
first. You will not want to read that file more than once, so you should 
uncompress it only once (in order to recompress).


Enrico
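For illustration, a minimal sketch of that one-time re-encode (the paths, partition count and codec below are assumptions, not from this thread): the gzipped input is still read by a single task, but the rewrite produces many splittable files that later jobs can read in parallel.

val df = spark.read
  .option("header", "true")
  .csv("/data/huge.csv.gz")        // non-splittable: a single task reads it

df.repartition(1000)               // spread the rows across many output files
  .write
  .option("header", "true")
  .option("compression", "bzip2")  // bzip2 is splittable; writing Parquet instead also works
  .csv("/data/huge-splittable")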


On 22.06.22 at 20:17, Sid wrote:

Hi Enrico,

Thanks for the insights.

Could you please help me understand, with one example, which kinds of 
compressed files wouldn't be split into partitions, putting the load 
on a single partition and possibly leading to an OOM error?


Thanks,
Sid

On Wed, Jun 22, 2022 at 6:40 PM Enrico Minack  
wrote:


The RAM and disk memory consumption depends on what you do with the
data after reading it.

Your particular action will read 20 lines from the first partition
and show them. So it will not use any RAM or disk, no matter how
large the CSV is.

If you do a count instead of show, it will iterate over each
partition and return a count per partition, so no RAM is needed
here either.

If you do some real processing of the data, the required RAM
and disk again depend on the shuffles involved and the intermediate
results that need to be stored in RAM or on disk.

Enrico


On 22.06.22 at 14:54, Deepak Sharma wrote:

It will spill to disk if everything can’t be loaded in memory .


On Wed, 22 Jun 2022 at 5:58 PM, Sid  wrote:

I have a 150TB CSV file.

I have a total of 100 TB RAM and 100TB disk. So If I do
something like this

spark.read.option("header","true").csv(filepath).show(false)

    Will it lead to an OOM error since it doesn't have enough
memory? or it will spill data onto the disk and process it?

Thanks,
Sid

-- 
Thanks

Deepak
www.bigdatabig.com <http://www.bigdatabig.com>
www.keosha.net <http://www.keosha.net>





Re: Will it lead to OOM error?

2022-06-22 Thread Sid
Hi Enrico,

Thanks for the insights.

Could you please help me understand, with one example, which kinds of compressed
files wouldn't be split into partitions, putting the load on a
single partition and possibly leading to an OOM error?

Thanks,
Sid

On Wed, Jun 22, 2022 at 6:40 PM Enrico Minack 
wrote:

> The RAM and disk memory consumption depends on what you do with the data
> after reading it.
>
> Your particular action will read 20 lines from the first partition and
> show them. So it will not use any RAM or disk, no matter how large the CSV
> is.
>
> If you do a count instead of show, it will iterate over each partition
> and return a count per partition, so no RAM is needed here either.
>
> If you do some real processing of the data, the required RAM and disk
> again depend on the shuffles involved and the intermediate results that need to
> be stored in RAM or on disk.
>
> Enrico
>
>
> On 22.06.22 at 14:54, Deepak Sharma wrote:
>
> It will spill to disk if everything can’t be loaded in memory .
>
>
> On Wed, 22 Jun 2022 at 5:58 PM, Sid  wrote:
>
>> I have a 150TB CSV file.
>>
>> I have a total of 100 TB RAM and 100TB disk. So If I do something like
>> this
>>
>> spark.read.option("header","true").csv(filepath).show(false)
>>
>> Will it lead to an OOM error since it doesn't have enough memory? or it
>> will spill data onto the disk and process it?
>>
>> Thanks,
>> Sid
>>
> --
> Thanks
> Deepak
> www.bigdatabig.com
> www.keosha.net
>
>
>


Re: Will it lead to OOM error?

2022-06-22 Thread Enrico Minack
The RAM and disk memory consumption depends on what you do with the data 
after reading it.


Your particular action will read 20 lines from the first partition and 
show them. So it will not use any RAM or disk, no matter how large the 
CSV is.


If you do a count instead of show, it will iterate over each 
partition and return a count per partition, so no RAM is needed here either.


If you do some real processing of the data, the required RAM and disk 
again depend on the shuffles involved and the intermediate results that need to 
be stored in RAM or on disk.


Enrico
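As a minimal sketch of that difference (the path below is a placeholder, and spark is the SparkSession): the read itself is lazy, show() only materializes a handful of rows, and count() scans every partition but each task returns only a count.

val df = spark.read.option("header", "true").csv("/path/to/huge.csv")  // lazy: nothing is read yet

df.show(false)   // pulls ~20 rows from the first partition(s), then stops
df.count()       // scans all partitions, but only per-partition counts come back to the driver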


On 22.06.22 at 14:54, Deepak Sharma wrote:

It will spill to disk if everything can’t be loaded in memory .


On Wed, 22 Jun 2022 at 5:58 PM, Sid  wrote:

I have a 150TB CSV file.

I have a total of 100 TB RAM and 100TB disk. So If I do something
like this

spark.read.option("header","true").csv(filepath).show(false)

    Will it lead to an OOM error since it doesn't have enough memory?
or it will spill data onto the disk and process it?

Thanks,
Sid

--
Thanks
Deepak
www.bigdatabig.com <http://www.bigdatabig.com>
www.keosha.net <http://www.keosha.net>




Re: Will it lead to OOM error?

2022-06-22 Thread Deepak Sharma
It will spill to disk if everything can’t be loaded in memory.


On Wed, 22 Jun 2022 at 5:58 PM, Sid  wrote:

> I have a 150TB CSV file.
>
> I have a total of 100 TB RAM and 100TB disk. So If I do something like this
>
> spark.read.option("header","true").csv(filepath).show(false)
>
> Will it lead to an OOM error since it doesn't have enough memory? or it
> will spill data onto the disk and process it?
>
> Thanks,
> Sid
>
-- 
Thanks
Deepak
www.bigdatabig.com
www.keosha.net


Will it lead to OOM error?

2022-06-22 Thread Sid
I have a 150TB CSV file.

I have a total of 100 TB RAM and 100 TB disk. So if I do something like this

spark.read.option("header","true").csv(filepath).show(false)

Will it lead to an OOM error since it doesn't have enough memory, or will it
spill data onto the disk and process it?

Thanks,
Sid


Re: OOM Error

2019-09-07 Thread Ankit Khettry
Sure folks, will try later today!

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra,  wrote:

> Ankit
>
> Can you try reducing the number of cores or increasing the memory? With the
> configuration below, each core is getting ~3.5 GB. Otherwise, your data is
> skewed, so that one of the cores is getting too much data for a particular key.
>
> spark.executor.cores 6 spark.executor.memory 36g
>
> On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh  wrote:
>
>> It says you have 3811 tasks in earlier stages and you're going down to
>> 2001 partitions, that would make it more memory intensive. I'm guessing the
>> default spark shuffle partition was 200 so that would have failed. Go for
>> higher number, maybe even higher than 3811. What was your shuffle write
>> from stage 7 and shuffle read from stage 8?
>>
>> On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, 
>> wrote:
>>
>>> Still unable to overcome the error. Attaching some screenshots for
>>> reference.
>>> Following are the configs used:
>>> spark.yarn.max.executor.failures 1000 spark.yarn.driver.memoryOverhead
>>> 6g spark.executor.cores 6 spark.executor.memory 36g
>>> spark.sql.shuffle.partitions 2001 spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true spark.executor.instances 10
>>> spark.driver.memory 14g spark.yarn.executor.memoryOverhead 10g
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>>>
>>>> You can try, consider processing each partition separately if your data
>>>> is heavily skewed when you partition it.
>>>>
>>>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
>>>> wrote:
>>>>
>>>>> Thanks Chris
>>>>>
>>>>> Going to try it soon by setting maybe spark.sql.shuffle.partitions to
>>>>> 2001. Also, I was wondering if it would help if I repartition the data by
>>>>> the fields I am using in group by and window operations?
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
>>>>>
>>>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>>>>>
>>>>>> Hi Ankit,
>>>>>>
>>>>>> Without looking at the Spark UI and the stages/DAG, I'm guessing
>>>>>> you're running on default number of Spark shuffle partitions.
>>>>>>
>>>>>> If you're seeing a lot of shuffle spill, you likely have to increase
>>>>>> the number of shuffle partitions to accommodate the huge shuffle size.
>>>>>>
>>>>>> I hope that helps
>>>>>> Chris
>>>>>>
>>>>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>>>>>> wrote:
>>>>>>
>>>>>>> Nope, it's a batch job.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Ankit Khettry
>>>>>>>
>>>>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <
>>>>>>> 028upasana...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Is it a streaming job?
>>>>>>>>
>>>>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> I have a Spark job that consists of a large number of Window
>>>>>>>>> operations and hence involves large shuffles. I have roughly 900 GiBs 
>>>>>>>>> of
>>>>>>>>> data, although I am using a large enough cluster (10 * m5.4xlarge
>>>>>>>>> instances). I am using the following configurations for the job, 
>>>>>>>>> although I
>>>>>>>>> have tried various other combinations without any success.
>>>>>>>>>
>>>>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>>>>> spark.storage.memoryFraction 0.1
>>>>>>>>> spark.executor.cores 6
>>>>>>>>> spark.executor.memory 36g
>>>>>>>>> spark.memory.offHeap.size 8g
>>>>>>>>> spark.memory.offHeap.enabled true
>>>>>>>>> spark.executor.instances 10
>>>>>>>>> spark.driver.memory 14g
>>>>>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>>>>>
>>>>>>>>> I keep running into the following OOM error:
>>>>>>>>>
>>>>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>>>>>> 16384 bytes of memory, got 0
>>>>>>>>> at
>>>>>>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>>>> at
>>>>>>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>>>> at
>>>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>>>>>>>>> at
>>>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>>>>>>>>
>>>>>>>>> I see there are a large number of JIRAs in place for similar
>>>>>>>>> issues and a great many of them are even marked resolved.
>>>>>>>>> Can someone guide me as to how to approach this problem? I am
>>>>>>>>> using Databricks Spark 2.4.1.
>>>>>>>>>
>>>>>>>>> Best Regards
>>>>>>>>> Ankit Khettry
>>>>>>>>>
>>>>>>>>


Re: OOM Error

2019-09-07 Thread Sunil Kalra
Ankit

Can you try reducing the number of cores or increasing the memory? With the
configuration below, each core is getting ~3.5 GB. Otherwise, your data
is skewed, so that one of the cores is getting too much data for a particular key.

spark.executor.cores 6 spark.executor.memory 36g
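For reference, a rough way to arrive at that ~3.5 GB figure, assuming Spark's default unified memory settings (spark.memory.fraction = 0.6, roughly 300 MB reserved) and one task per core:

  usable execution/storage memory ≈ (36 GB - 300 MB) * 0.6 ≈ 21.4 GB
  memory per concurrent task      ≈ 21.4 GB / 6 cores      ≈ 3.5 GB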

On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh  wrote:

> It says you have 3811 tasks in earlier stages and you're going down to
> 2001 partitions, that would make it more memory intensive. I'm guessing the
> default spark shuffle partition was 200 so that would have failed. Go for
> higher number, maybe even higher than 3811. What was your shuffle write
> from stage 7 and shuffle read from stage 8?
>
> On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, 
> wrote:
>
>> Still unable to overcome the error. Attaching some screenshots for
>> reference.
>> Following are the configs used:
>> spark.yarn.max.executor.failures 1000 spark.yarn.driver.memoryOverhead 6g
>> spark.executor.cores 6 spark.executor.memory 36g
>> spark.sql.shuffle.partitions 2001 spark.memory.offHeap.size 8g
>> spark.memory.offHeap.enabled true spark.executor.instances 10
>> spark.driver.memory 14g spark.yarn.executor.memoryOverhead 10g
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>>
>>> You can try, consider processing each partition separately if your data
>>> is heavily skewed when you partition it.
>>>
>>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
>>> wrote:
>>>
>>>> Thanks Chris
>>>>
>>>> Going to try it soon by setting maybe spark.sql.shuffle.partitions to
>>>> 2001. Also, I was wondering if it would help if I repartition the data by
>>>> the fields I am using in group by and window operations?
>>>>
>>>> Best Regards
>>>> Ankit Khettry
>>>>
>>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>>>>
>>>>> Hi Ankit,
>>>>>
>>>>> Without looking at the Spark UI and the stages/DAG, I'm guessing
>>>>> you're running on default number of Spark shuffle partitions.
>>>>>
>>>>> If you're seeing a lot of shuffle spill, you likely have to increase
>>>>> the number of shuffle partitions to accommodate the huge shuffle size.
>>>>>
>>>>> I hope that helps
>>>>> Chris
>>>>>
>>>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>>>>> wrote:
>>>>>
>>>>>> Nope, it's a batch job.
>>>>>>
>>>>>> Best Regards
>>>>>> Ankit Khettry
>>>>>>
>>>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Is it a streaming job?
>>>>>>>
>>>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> I have a Spark job that consists of a large number of Window
>>>>>>>> operations and hence involves large shuffles. I have roughly 900 GiBs 
>>>>>>>> of
>>>>>>>> data, although I am using a large enough cluster (10 * m5.4xlarge
>>>>>>>> instances). I am using the following configurations for the job, 
>>>>>>>> although I
>>>>>>>> have tried various other combinations without any success.
>>>>>>>>
>>>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>>>> spark.storage.memoryFraction 0.1
>>>>>>>> spark.executor.cores 6
>>>>>>>> spark.executor.memory 36g
>>>>>>>> spark.memory.offHeap.size 8g
>>>>>>>> spark.memory.offHeap.enabled true
>>>>>>>> spark.executor.instances 10
>>>>>>>> spark.driver.memory 14g
>>>>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>>>>
>>>>>>>> I keep running into the following OOM error:
>>>>>>>>
>>>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>>>>> 16384 bytes of memory, got 0
>>>>>>>> at
>>>>>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>>> at
>>>>>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>>> at
>>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>>>>>>>> at
>>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>>>>>>>
>>>>>>>> I see there are a large number of JIRAs in place for similar issues
>>>>>>>> and a great many of them are even marked resolved.
>>>>>>>> Can someone guide me as to how to approach this problem? I am using
>>>>>>>> Databricks Spark 2.4.1.
>>>>>>>>
>>>>>>>> Best Regards
>>>>>>>> Ankit Khettry
>>>>>>>>
>>>>>>>


Re: OOM Error

2019-09-07 Thread Chris Teoh
It says you have 3811 tasks in earlier stages and you're going down to 2001
partitions; that would make it more memory intensive. I'm guessing the
default Spark shuffle partition count was 200, so that would have failed. Go for
a higher number, maybe even higher than 3811. What was your shuffle write
from stage 7 and shuffle read from stage 8?
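A minimal sketch of bumping that setting before the shuffle-heavy stage runs (4000 is just an illustrative value, not a recommendation from this thread; spark is the SparkSession):

spark.conf.set("spark.sql.shuffle.partitions", 4000)
// or equivalently at submit time: --conf spark.sql.shuffle.partitions=4000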

On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry,  wrote:

> Still unable to overcome the error. Attaching some screenshots for
> reference.
> Following are the configs used:
> spark.yarn.max.executor.failures 1000 spark.yarn.driver.memoryOverhead 6g
> spark.executor.cores 6 spark.executor.memory 36g
> spark.sql.shuffle.partitions 2001 spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true spark.executor.instances 10
> spark.driver.memory 14g spark.yarn.executor.memoryOverhead 10g
>
> Best Regards
> Ankit Khettry
>
> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh  wrote:
>
>> You can try, consider processing each partition separately if your data
>> is heavily skewed when you partition it.
>>
>> On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, 
>> wrote:
>>
>>> Thanks Chris
>>>
>>> Going to try it soon by setting maybe spark.sql.shuffle.partitions to
>>> 2001. Also, I was wondering if it would help if I repartition the data by
>>> the fields I am using in group by and window operations?
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>>>
>>>> Hi Ankit,
>>>>
>>>> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
>>>> running on default number of Spark shuffle partitions.
>>>>
>>>> If you're seeing a lot of shuffle spill, you likely have to increase
>>>> the number of shuffle partitions to accommodate the huge shuffle size.
>>>>
>>>> I hope that helps
>>>> Chris
>>>>
>>>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>>>> wrote:
>>>>
>>>>> Nope, it's a batch job.
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
>>>>>
>>>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Is it a streaming job?
>>>>>>
>>>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>>>>> wrote:
>>>>>>
>>>>>>> I have a Spark job that consists of a large number of Window
>>>>>>> operations and hence involves large shuffles. I have roughly 900 GiBs of
>>>>>>> data, although I am using a large enough cluster (10 * m5.4xlarge
>>>>>>> instances). I am using the following configurations for the job, 
>>>>>>> although I
>>>>>>> have tried various other combinations without any success.
>>>>>>>
>>>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>>>> spark.storage.memoryFraction 0.1
>>>>>>> spark.executor.cores 6
>>>>>>> spark.executor.memory 36g
>>>>>>> spark.memory.offHeap.size 8g
>>>>>>> spark.memory.offHeap.enabled true
>>>>>>> spark.executor.instances 10
>>>>>>> spark.driver.memory 14g
>>>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>>>
>>>>>>> I keep running into the following OOM error:
>>>>>>>
>>>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire
>>>>>>> 16384 bytes of memory, got 0
>>>>>>> at
>>>>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>>>> at
>>>>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>>>> at
>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>>>>>>> at
>>>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>>>>>>
>>>>>>> I see there are a large number of JIRAs in place for similar issues
>>>>>>> and a great many of them are even marked resolved.
>>>>>>> Can someone guide me as to how to approach this problem? I am using
>>>>>>> Databricks Spark 2.4.1.
>>>>>>>
>>>>>>> Best Regards
>>>>>>> Ankit Khettry
>>>>>>>
>>>>>>


Re: OOM Error

2019-09-07 Thread Chris Teoh
You can try. Also, consider processing each partition separately if your data is
heavily skewed when you partition it.

On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry,  wrote:

> Thanks Chris
>
> Going to try it soon by setting maybe spark.sql.shuffle.partitions to
> 2001. Also, I was wondering if it would help if I repartition the data by
> the fields I am using in group by and window operations?
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:
>
>> Hi Ankit,
>>
>> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
>> running on default number of Spark shuffle partitions.
>>
>> If you're seeing a lot of shuffle spill, you likely have to increase the
>> number of shuffle partitions to accommodate the huge shuffle size.
>>
>> I hope that helps
>> Chris
>>
>> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
>> wrote:
>>
>>> Nope, it's a batch job.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>>> wrote:
>>>
>>>> Is it a streaming job?
>>>>
>>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>>> wrote:
>>>>
>>>>> I have a Spark job that consists of a large number of Window
>>>>> operations and hence involves large shuffles. I have roughly 900 GiBs of
>>>>> data, although I am using a large enough cluster (10 * m5.4xlarge
>>>>> instances). I am using the following configurations for the job, although 
>>>>> I
>>>>> have tried various other combinations without any success.
>>>>>
>>>>> spark.yarn.driver.memoryOverhead 6g
>>>>> spark.storage.memoryFraction 0.1
>>>>> spark.executor.cores 6
>>>>> spark.executor.memory 36g
>>>>> spark.memory.offHeap.size 8g
>>>>> spark.memory.offHeap.enabled true
>>>>> spark.executor.instances 10
>>>>> spark.driver.memory 14g
>>>>> spark.yarn.executor.memoryOverhead 10g
>>>>>
>>>>> I keep running into the following OOM error:
>>>>>
>>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>>>> bytes of memory, got 0
>>>>> at
>>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>> at
>>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>> at
>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>>>>> at
>>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>>>>
>>>>> I see there are a large number of JIRAs in place for similar issues
>>>>> and a great many of them are even marked resolved.
>>>>> Can someone guide me as to how to approach this problem? I am using
>>>>> Databricks Spark 2.4.1.
>>>>>
>>>>> Best Regards
>>>>> Ankit Khettry
>>>>>
>>>>


Re: OOM Error

2019-09-07 Thread Ankit Khettry
Thanks Chris

Going to try it soon, maybe by setting spark.sql.shuffle.partitions to 2001.
Also, I was wondering if it would help to repartition the data by the
fields I am using in the group by and window operations?

Best Regards
Ankit Khettry
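For what it's worth, a minimal sketch of that idea, with hypothetical column names and df standing in for the input DataFrame: repartitioning on the same keys the groupBy/window uses co-locates each key's rows ahead of the expensive stage, at the cost of one extra shuffle.

import org.apache.spark.sql.functions.col

val repartitioned = df.repartition(2001, col("accountId"), col("eventDate"))  // hypothetical keys
val aggregated    = repartitioned.groupBy(col("accountId"), col("eventDate")).count()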

On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh,  wrote:

> Hi Ankit,
>
> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
> running on default number of Spark shuffle partitions.
>
> If you're seeing a lot of shuffle spill, you likely have to increase the
> number of shuffle partitions to accommodate the huge shuffle size.
>
> I hope that helps
> Chris
>
> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, 
> wrote:
>
>> Nope, it's a batch job.
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
>> wrote:
>>
>>> Is it a streaming job?
>>>
>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>>> wrote:
>>>
>>>> I have a Spark job that consists of a large number of Window operations
>>>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>>>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>>>> am using the following configurations for the job, although I have tried
>>>> various other combinations without any success.
>>>>
>>>> spark.yarn.driver.memoryOverhead 6g
>>>> spark.storage.memoryFraction 0.1
>>>> spark.executor.cores 6
>>>> spark.executor.memory 36g
>>>> spark.memory.offHeap.size 8g
>>>> spark.memory.offHeap.enabled true
>>>> spark.executor.instances 10
>>>> spark.driver.memory 14g
>>>> spark.yarn.executor.memoryOverhead 10g
>>>>
>>>> I keep running into the following OOM error:
>>>>
>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>>> bytes of memory, got 0
>>>> at
>>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>> at
>>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>> at
>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>>>> at
>>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>>>
>>>> I see there are a large number of JIRAs in place for similar issues and
>>>> a great many of them are even marked resolved.
>>>> Can someone guide me as to how to approach this problem? I am using
>>>> Databricks Spark 2.4.1.
>>>>
>>>> Best Regards
>>>> Ankit Khettry
>>>>
>>>


Re: OOM Error

2019-09-07 Thread Chris Teoh
Hi Ankit,

Without looking at the Spark UI and the stages/DAG, I'm guessing you're
running on the default number of Spark shuffle partitions.

If you're seeing a lot of shuffle spill, you likely have to increase the
number of shuffle partitions to accommodate the huge shuffle size.

I hope that helps
Chris

On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry,  wrote:

> Nope, it's a batch job.
>
> Best Regards
> Ankit Khettry
>
> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
> wrote:
>
>> Is it a streaming job?
>>
>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
>> wrote:
>>
>>> I have a Spark job that consists of a large number of Window operations
>>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>>> am using the following configurations for the job, although I have tried
>>> various other combinations without any success.
>>>
>>> spark.yarn.driver.memoryOverhead 6g
>>> spark.storage.memoryFraction 0.1
>>> spark.executor.cores 6
>>> spark.executor.memory 36g
>>> spark.memory.offHeap.size 8g
>>> spark.memory.offHeap.enabled true
>>> spark.executor.instances 10
>>> spark.driver.memory 14g
>>> spark.yarn.executor.memoryOverhead 10g
>>>
>>> I keep running into the following OOM error:
>>>
>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>> bytes of memory, got 0
>>> at
>>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>> at
>>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>> at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>>> at
>>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>>
>>> I see there are a large number of JIRAs in place for similar issues and
>>> a great many of them are even marked resolved.
>>> Can someone guide me as to how to approach this problem? I am using
>>> Databricks Spark 2.4.1.
>>>
>>> Best Regards
>>> Ankit Khettry
>>>
>>


Re: OOM Error

2019-09-07 Thread Ankit Khettry
Nope, it's a batch job.

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com>
wrote:

> Is it a streaming job?
>
> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry 
> wrote:
>
>> I have a Spark job that consists of a large number of Window operations
>> and hence involves large shuffles. I have roughly 900 GiBs of data,
>> although I am using a large enough cluster (10 * m5.4xlarge instances). I
>> am using the following configurations for the job, although I have tried
>> various other combinations without any success.
>>
>> spark.yarn.driver.memoryOverhead 6g
>> spark.storage.memoryFraction 0.1
>> spark.executor.cores 6
>> spark.executor.memory 36g
>> spark.memory.offHeap.size 8g
>> spark.memory.offHeap.enabled true
>> spark.executor.instances 10
>> spark.driver.memory 14g
>> spark.yarn.executor.memoryOverhead 10g
>>
>> I keep running into the following OOM error:
>>
>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>> bytes of memory, got 0
>> at
>> org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>> at
>> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>> at
>> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
>> at
>> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>>
>> I see there are a large number of JIRAs in place for similar issues and a
>> great many of them are even marked resolved.
>> Can someone guide me as to how to approach this problem? I am using
>> Databricks Spark 2.4.1.
>>
>> Best Regards
>> Ankit Khettry
>>
>


Re: OOM Error

2019-09-06 Thread Upasana Sharma
Is it a streaming job?

On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry  wrote:

> I have a Spark job that consists of a large number of Window operations
> and hence involves large shuffles. I have roughly 900 GiBs of data,
> although I am using a large enough cluster (10 * m5.4xlarge instances). I
> am using the following configurations for the job, although I have tried
> various other combinations without any success.
>
> spark.yarn.driver.memoryOverhead 6g
> spark.storage.memoryFraction 0.1
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> I keep running into the following OOM error:
>
> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
> bytes of memory, got 0
> at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
> at
> org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.(UnsafeInMemorySorter.java:128)
> at
> org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.(UnsafeExternalSorter.java:163)
>
> I see there are a large number of JIRAs in place for similar issues and a
> great many of them are even marked resolved.
> Can someone guide me as to how to approach this problem? I am using
> Databricks Spark 2.4.1.
>
> Best Regards
> Ankit Khettry
>


OOM Error

2019-09-06 Thread Ankit Khettry
I have a Spark job that consists of a large number of Window operations and
hence involves large shuffles. I have roughly 900 GiBs of data, although I
am using a large enough cluster (10 * m5.4xlarge instances). I am using the
following configurations for the job, although I have tried various other
combinations without any success.

spark.yarn.driver.memoryOverhead 6g
spark.storage.memoryFraction 0.1
spark.executor.cores 6
spark.executor.memory 36g
spark.memory.offHeap.size 8g
spark.memory.offHeap.enabled true
spark.executor.instances 10
spark.driver.memory 14g
spark.yarn.executor.memoryOverhead 10g

I keep running into the following OOM error:

org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
at
org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
at
org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
at
org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

I see there are a large number of JIRAs in place for similar issues and a
great many of them are even marked resolved.
Can someone guide me as to how to approach this problem? I am using
Databricks Spark 2.4.1.

Best Regards
Ankit Khettry


Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-24 Thread Arun Luthra
Also for the record, turning on kryo was not able to help.

On Tue, Aug 23, 2016 at 12:58 PM, Arun Luthra <arun.lut...@gmail.com> wrote:

> Splitting up the Maps to separate objects did not help.
>
> However, I was able to work around the problem by reimplementing it with
> RDD joins.
>
> On Aug 18, 2016 5:16 PM, "Arun Luthra" <arun.lut...@gmail.com> wrote:
>
>> This might be caused by a few large Map objects that Spark is trying to
>> serialize. These are not broadcast variables or anything, they're just
>> regular objects.
>>
>> Would it help if I further indexed these maps into a two-level Map i.e.
>> Map[String, Map[String, Int]] ? Or would this still count against me?
>>
>> What if I manually split them up into numerous Map variables?
>>
>> On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com>
>> wrote:
>>
>>> I got this OOM error in Spark local mode. The error seems to have been
>>> at the start of a stage (all of the stages on the UI showed as complete,
>>> there were more stages to do but had not showed up on the UI yet).
>>>
>>> There appears to be ~100G of free memory at the time of the error.
>>>
>>> Spark 2.0.0
>>> 200G driver memory
>>> local[30]
>>> 8 /mntX/tmp directories for spark.local.dir
>>> "spark.sql.shuffle.partitions", "500"
>>> "spark.driver.maxResultSize","500"
>>> "spark.default.parallelism", "1000"
>>>
>>> The line number for the error is at an RDD map operation where there are
>>> some potentially large Map objects that are going to be accessed by each
>>> record. Does it matter if they are broadcast variables or not? I imagine
>>> not because its in local mode they should be available in memory to every
>>> executor/core.
>>>
>>> Possibly related:
>>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cl
>>> osureCleaner-or-java-serializer-OOM-when-trying-to-grow-td24796.html
>>>
>>> Exception in thread "main" java.lang.OutOfMemoryError
>>> at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputSt
>>> ream.java:123)
>>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>>> Stream.java:93)
>>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>>> at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Objec
>>> tOutputStream.java:1877)
>>> at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDat
>>> aMode(ObjectOutputStream.java:1786)
>>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
>>> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>>> at org.apache.spark.serializer.JavaSerializationStream.writeObj
>>> ect(JavaSerializer.scala:43)
>>> at org.apache.spark.serializer.JavaSerializerInstance.serialize
>>> (JavaSerializer.scala:100)
>>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(Clo
>>> sureCleaner.scala:295)
>>> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$
>>> ClosureCleaner$$clean(ClosureCleaner.scala:288)
>>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
>>> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
>>> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
>>> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
>>> onScope.scala:151)
>>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
>>> onScope.scala:112)
>>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>>> at org.apache.spark.rdd.RDD.map(RDD.scala:365)
>>> at abc.Abc$.main(abc.scala:395)
>>> at abc.Abc.main(abc.scala)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>>> ssorImpl.java:62)
>>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>>> thodAccessorImpl.java:43)
>>> at java.lang.reflect.Method.invoke(Method.java:498)
>>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>>> $SparkSubmit$$runMain(SparkSubmit.scala:729)
>>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>>> .scala:185)
>>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>>
>>>
>>


Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-23 Thread Arun Luthra
Splitting up the Maps to separate objects did not help.

However, I was able to work around the problem by reimplementing it with
RDD joins.
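A minimal sketch of the two shapes being contrasted here, with made-up names and data (sc is an existing SparkContext): broadcasting ships the map once per executor instead of serializing it into every task closure, while converting it to an RDD lets a join replace the in-memory lookup entirely, as described above.

case class Event(payload: String)
val bigMap  = Map("a" -> 1, "b" -> 2)                               // stand-in for the large Map
val records = sc.parallelize(Seq("a" -> Event("x"), "b" -> Event("y")))

// Option 1: broadcast the map instead of capturing it in the closure
val bc = sc.broadcast(bigMap)
val viaBroadcast = records.map { case (k, e) => (k, e, bc.value.getOrElse(k, 0)) }

// Option 2: turn the map into an RDD and join, the workaround used here
val lookupRdd = sc.parallelize(bigMap.toSeq)                        // RDD[(String, Int)]
val viaJoin   = records.join(lookupRdd)                             // RDD[(String, (Event, Int))]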

On Aug 18, 2016 5:16 PM, "Arun Luthra" <arun.lut...@gmail.com> wrote:

> This might be caused by a few large Map objects that Spark is trying to
> serialize. These are not broadcast variables or anything, they're just
> regular objects.
>
> Would it help if I further indexed these maps into a two-level Map i.e.
> Map[String, Map[String, Int]] ? Or would this still count against me?
>
> What if I manually split them up into numerous Map variables?
>
> On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com>
> wrote:
>
>> I got this OOM error in Spark local mode. The error seems to have been at
>> the start of a stage (all of the stages on the UI showed as complete, there
>> were more stages to do but had not showed up on the UI yet).
>>
>> There appears to be ~100G of free memory at the time of the error.
>>
>> Spark 2.0.0
>> 200G driver memory
>> local[30]
>> 8 /mntX/tmp directories for spark.local.dir
>> "spark.sql.shuffle.partitions", "500"
>> "spark.driver.maxResultSize","500"
>> "spark.default.parallelism", "1000"
>>
>> The line number for the error is at an RDD map operation where there are
>> some potentially large Map objects that are going to be accessed by each
>> record. Does it matter if they are broadcast variables or not? I imagine
>> not because its in local mode they should be available in memory to every
>> executor/core.
>>
>> Possibly related:
>> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Cl
>> osureCleaner-or-java-serializer-OOM-when-trying-to-grow-td24796.html
>>
>> Exception in thread "main" java.lang.OutOfMemoryError
>> at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputSt
>> ream.java:123)
>> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
>> at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutput
>> Stream.java:93)
>> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
>> at java.io.ObjectOutputStream$BlockDataOutputStream.drain(Objec
>> tOutputStream.java:1877)
>> at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDat
>> aMode(ObjectOutputStream.java:1786)
>> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
>> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
>> at org.apache.spark.serializer.JavaSerializationStream.writeObj
>> ect(JavaSerializer.scala:43)
>> at org.apache.spark.serializer.JavaSerializerInstance.serialize
>> (JavaSerializer.scala:100)
>> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(Clo
>> sureCleaner.scala:295)
>> at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$
>> ClosureCleaner$$clean(ClosureCleaner.scala:288)
>> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
>> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
>> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
>> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
>> onScope.scala:151)
>> at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperati
>> onScope.scala:112)
>> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
>> at org.apache.spark.rdd.RDD.map(RDD.scala:365)
>> at abc.Abc$.main(abc.scala:395)
>> at abc.Abc.main(abc.scala)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcce
>> ssorImpl.java:62)
>> at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMe
>> thodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:498)
>> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy
>> $SparkSubmit$$runMain(SparkSubmit.scala:729)
>> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit
>> .scala:185)
>> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
>> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
>> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>>
>>
>


Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-18 Thread Arun Luthra
This might be caused by a few large Map objects that Spark is trying to
serialize. These are not broadcast variables or anything, they're just
regular objects.

Would it help if I further indexed these maps into a two-level Map i.e.
Map[String, Map[String, Int]] ? Or would this still count against me?

What if I manually split them up into numerous Map variables?

On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com> wrote:

> I got this OOM error in Spark local mode. The error seems to have been at
> the start of a stage (all of the stages on the UI showed as complete, there
> were more stages to do but had not showed up on the UI yet).
>
> There appears to be ~100G of free memory at the time of the error.
>
> Spark 2.0.0
> 200G driver memory
> local[30]
> 8 /mntX/tmp directories for spark.local.dir
> "spark.sql.shuffle.partitions", "500"
> "spark.driver.maxResultSize","500"
> "spark.default.parallelism", "1000"
>
> The line number for the error is at an RDD map operation where there are
> some potentially large Map objects that are going to be accessed by each
> record. Does it matter if they are broadcast variables or not? I imagine
> not because its in local mode they should be available in memory to every
> executor/core.
>
> Possibly related:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-
> ClosureCleaner-or-java-serializer-OOM-when-trying-to-grow-td24796.html
>
> Exception in thread "main" java.lang.OutOfMemoryError
> at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:
> 123)
> at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
> at java.io.ByteArrayOutputStream.ensureCapacity(
> ByteArrayOutputStream.java:93)
> at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
> at java.io.ObjectOutputStream$BlockDataOutputStream.drain(
> ObjectOutputStream.java:1877)
> at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(
> ObjectOutputStream.java:1786)
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
> at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
> at org.apache.spark.serializer.JavaSerializationStream.
> writeObject(JavaSerializer.scala:43)
> at org.apache.spark.serializer.JavaSerializerInstance.
> serialize(JavaSerializer.scala:100)
> at org.apache.spark.util.ClosureCleaner$.ensureSerializable(
> ClosureCleaner.scala:295)
> at org.apache.spark.util.ClosureCleaner$.org$apache$
> spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
> at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
> at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
> at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:151)
> at org.apache.spark.rdd.RDDOperationScope$.withScope(
> RDDOperationScope.scala:112)
> at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
> at org.apache.spark.rdd.RDD.map(RDD.scala:365)
> at abc.Abc$.main(abc.scala:395)
> at abc.Abc.main(abc.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at sun.reflect.NativeMethodAccessorImpl.invoke(
> NativeMethodAccessorImpl.java:62)
> at sun.reflect.DelegatingMethodAccessorImpl.invoke(
> DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$
> deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
> at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
> at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
> at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
> at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
>
>


Spark 2.0.0 OOM error at beginning of RDD map on AWS

2016-08-15 Thread Arun Luthra
I got this OOM error in Spark local mode. The error seems to have been at
the start of a stage (all of the stages on the UI showed as complete; there
were more stages to do, but they had not shown up on the UI yet).

There appears to be ~100G of free memory at the time of the error.

Spark 2.0.0
200G driver memory
local[30]
8 /mntX/tmp directories for spark.local.dir
"spark.sql.shuffle.partitions", "500"
"spark.driver.maxResultSize","500"
"spark.default.parallelism", "1000"

The line number for the error is at an RDD map operation where there are
some potentially large Map objects that are going to be accessed by each
record. Does it matter if they are broadcast variables or not? I imagine
not, because it's in local mode, so they should be available in memory to every
executor/core.

Possibly related:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-ClosureCleaner-or-java-serializer-OOM-when-trying-to-grow-td24796.html

Exception in thread "main" java.lang.OutOfMemoryError
at
java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at
java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at
java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
at
java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at
org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at
org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at
org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
at
org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at
org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.map(RDD.scala:365)
at abc.Abc$.main(abc.scala:395)
at abc.Abc.main(abc.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at
org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)


OOM error in Spark worker

2015-10-01 Thread varun sharma
My workers are going OOM over time. I am running a streaming job in spark
1.4.0.
Here is the heap dump of workers.







*16,802 instances of "org.apache.spark.deploy.worker.ExecutorRunner",
loaded by "sun.misc.Launcher$AppClassLoader @ 0xdff94088" occupy
488,249,688 (95.80%) bytes. These instances are referenced from one
instance of "java.lang.Object[]", loaded by "" Keywords org.apache.spark.deploy.worker.ExecutorRunner
java.lang.Object[] sun.misc.Launcher$AppClassLoader
@ 0xdff94088 *
is this because of this bug:
http://apache-spark-developers-list.1001551.n3.nabble.com/Worker-memory-leaks-td13341.html
https://issues.apache.org/jira/browse/SPARK-9202

Also, I am getting the error below continuously if one of the workers/executors
dies on any node in my Spark cluster.
Even if I restart the worker, the error doesn't go away. I have to force-kill my
streaming job and restart it to fix the issue. Is it some bug?
I am using Spark 1.4.0.

MY_IP in the logs is the IP of the worker node that failed.
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194218 - Ask
timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]}
akka.pattern.AskTimeoutException: Ask timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194217 - Ask
timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]}
akka.pattern.AskTimeoutException: Ask timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
15/09/03 11:29:11 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 16723
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194216 - Ask timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]}

It is easily reproducible if I manually stop a worker on one of my nodes.

15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 329
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 333
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 334

It doesn't go away even if I start the worker again.

Follow-up question: if my streaming job has consumed some events from a Kafka
topic and they are pending to be scheduled because of a delay in processing,
will force-killing the streaming job lose that data which is not yet scheduled?


-- 
*VARUN SHARMA*
*Flipkart*
*Bangalore*


OOM error in Spark worker

2015-09-29 Thread varun sharma
My workers are going OOM over time. I am running a streaming job in spark
1.4.0.
Here is the heap dump of workers.

16,802 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded
by "sun.misc.Launcher$AppClassLoader @ 0xdff94088" occupy 488,249,688
(95.80%) bytes. These instances are referenced from one instance of
"java.lang.Object[]", loaded by ""

Keywords
org.apache.spark.deploy.worker.ExecutorRunner
java.lang.Object[]
sun.misc.Launcher$AppClassLoader @ 0xdff94088

I am getting the error below continuously if one of the workers/executors dies on
any node in my Spark cluster.
Even if I restart the worker, the error doesn't go away. I have to force-kill my
streaming job and restart it to fix the issue. Is it some bug?
I am using Spark 1.4.0.

MY_IP in the logs is the IP of the worker node that failed.

15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194218 -
Ask timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]}
akka.pattern.AskTimeoutException: Ask timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]
at
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at
scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at
akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194217 - Ask
timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]}
akka.pattern.AskTimeoutException: Ask timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]
at
akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at
scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at
scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at
akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at
akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at
akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at 
akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
15/09/03 11:29:11 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 16723
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194216 - Ask
timed out on
[Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]]
after [12 ms]}
It is easily reproducible if I manually stop a worker on one of my nodes.
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 329
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 333
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove
non-existent executor 334
It doesn't go away even if I start the worker again.

Follow-up question: if my streaming job has consumed some events from a Kafka
topic and they are pending to be scheduled because of a delay in processing, will
force-killing the streaming job lose that data which is not yet scheduled?

Please help ASAP.



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/OOM-error-in-Spark-worker-tp24856.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.




Re: OOM error with GMMs on 4GB dataset

2015-05-06 Thread Xiangrui Meng
Did you set `--driver-memory` with spark-submit? -Xiangrui
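A hedged example of what that looks like at submit time, with placeholder class and jar names, since driver memory generally has to be set before the driver JVM starts rather than in the SparkConf of a running application:

spark-submit --driver-memory 6g --executor-memory 6g --class com.example.TrainGMM gmm-job.jar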

On Mon, May 4, 2015 at 5:16 PM, Vinay Muttineni vmuttin...@ebay.com wrote:
 Hi, I am training a GMM with 10 gaussians on a 4 GB dataset(720,000 * 760).
 The spark (1.3.1) job is allocated 120 executors with 6GB each and the
 driver also has 6GB.
 Spark Config Params:

 .set(spark.hadoop.validateOutputSpecs,
 false).set(spark.dynamicAllocation.enabled,
 false).set(spark.driver.maxResultSize,
 4g).set(spark.default.parallelism, 300).set(spark.serializer,
 org.apache.spark.serializer.KryoSerializer).set(spark.kryoserializer.buffer.mb,
 500).set(spark.akka.frameSize, 256).set(spark.akka.timeout, 300)

 However, at the aggregate step (Line 168)
 val sums = breezeData.aggregate(ExpectationSum.zero(k, d))(compute.value, _
 += _)

 I get OOM error and the application hangs indefinitely. Is this an issue or
 am I missing something?
 java.lang.OutOfMemoryError: Java heap space
 at akka.util.CompactByteString$.apply(ByteString.scala:410)
 at akka.util.ByteString$.apply(ByteString.scala:22)
 at
 akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
 at
 akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
 at
 akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
 at
 akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:180)
 at
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
 at
 org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
 at
 org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
 at
 org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
 at
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
 at
 org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
 at
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
 at
 org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
 at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
 at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
 at java.lang.Thread.run(Thread.java:745)

 15/05/04 16:23:38 ERROR util.Utils: Uncaught exception in thread
 task-result-getter-2
 java.lang.OutOfMemoryError: Java heap space
 Exception in thread task-result-getter-2 java.lang.OutOfMemoryError: Java
 heap space
 15/05/04 16:23:45 INFO scheduler.TaskSetManager: Finished task 1070.0 in
 stage 6.0 (TID 8276) in 382069 ms on [] (160/3600)
 15/05/04 16:23:54 WARN channel.DefaultChannelPipeline: An exception was
 thrown by a user handler while handling an exception event ([id: 0xc57da871,
 ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
 15/05/04 16:23:55 WARN channel.DefaultChannelPipeline: An exception was
 thrown by a user handler while handling an exception event ([id: 0x3c3dbb0c,
 ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
 15/05/04 16:24:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from
 thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
 ActorSystem [sparkDriver]



 Thanks!
 Vinay




OOM error with GMMs on 4GB dataset

2015-05-04 Thread Vinay Muttineni
Hi, I am training a GMM with 10 Gaussians on a 4 GB dataset (720,000 * 760).
The Spark (1.3.1) job is allocated 120 executors with 6 GB each, and the
driver also has 6 GB.
Spark Config Params:

.set("spark.hadoop.validateOutputSpecs", "false")
.set("spark.dynamicAllocation.enabled", "false")
.set("spark.driver.maxResultSize", "4g")
.set("spark.default.parallelism", "300")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryoserializer.buffer.mb", "500")
.set("spark.akka.frameSize", "256")
.set("spark.akka.timeout", "300")

However, at the aggregate step (Line 168)
val sums = breezeData.aggregate(ExpectationSum.zero(k, d))(compute.value, _
+= _)

I get an OOM error and the application hangs indefinitely. Is this an issue,
or am I missing something?
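As a hedged illustration of why this step stresses the driver (generic code
under illustrative names, not the MLlib internals): aggregate() ships one
partial result per partition back to the driver and merges them all there,
whereas treeAggregate() pre-combines partials on the executors so the driver
only merges a few already-combined results.

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("aggregate-sketch"))
val data = sc.parallelize(1 to 1000000, 300)

// A mutable Array[Long] stands in for a large per-partition partial result.
def seqOp(acc: Array[Long], x: Int): Array[Long] = { acc(x % 10) += x; acc }
def combOp(a: Array[Long], b: Array[Long]): Array[Long] = {
  var i = 0; while (i < a.length) { a(i) += b(i); i += 1 }; a
}

// aggregate: all 300 partial arrays are merged inside the driver JVM.
val onDriver = data.aggregate(new Array[Long](10))(seqOp, combOp)
// treeAggregate: partials are pre-combined on executors in a depth-2 tree first.
val onExecutors = data.treeAggregate(new Array[Long](10))(seqOp, combOp, 2)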
java.lang.OutOfMemoryError: Java heap space
at akka.util.CompactByteString$.apply(ByteString.scala:410)
at akka.util.ByteString$.apply(ByteString.scala:22)
at
akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
at
akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
at
akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
at
akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:180)
at
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at
org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at
org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at
org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at
org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at
org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at
org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

15/05/04 16:23:38 ERROR util.Utils: Uncaught exception in thread
task-result-getter-2
java.lang.OutOfMemoryError: Java heap space
Exception in thread task-result-getter-2 java.lang.OutOfMemoryError: Java
heap space
15/05/04 16:23:45 INFO scheduler.TaskSetManager: Finished task 1070.0 in
stage 6.0 (TID 8276) in 382069 ms on [] (160/3600)
15/05/04 16:23:54 WARN channel.DefaultChannelPipeline: An exception was
thrown by a user handler while handling an exception event ([id:
0xc57da871, ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/05/04 16:23:55 WARN channel.DefaultChannelPipeline: An exception was
thrown by a user handler while handling an exception event ([id:
0x3c3dbb0c, ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
15/05/04 16:24:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from
thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down
ActorSystem [sparkDriver]



Thanks!
Vinay


Re: OOM error

2015-02-17 Thread Harshvardhan Chauhan
Thanks for the pointer. It led me to
http://spark.apache.org/docs/1.2.0/tuning.html; increasing parallelism
resolved the issue.
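For reference, a hedged sketch of what "increasing parallelism" can look like
in practice; the values and paths below are illustrative, not the settings used
here. More partitions mean smaller tasks, so each task needs less heap at once.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._     // pair-RDD implicits on Spark 1.1/1.2

val conf = new SparkConf()
  .setAppName("parallelism-sketch")
  .set("spark.default.parallelism", "480") // default partition count for shuffles
val sc = new SparkContext(conf)

val pairs  = sc.textFile("hdfs:///data/input").map(line => (line.take(8), 1L))
val counts = pairs.reduceByKey(_ + _, 480) // per-operation override of shuffle partitions
val spread = counts.repartition(480)       // or explicitly split a coarse RDD into more partitions
spread.count()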



On Mon, Feb 16, 2015 at 11:57 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 Increase your executor memory. Also, you can play around with increasing the
 number of partitions/parallelism, etc.

 Thanks
 Best Regards

 On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com
 wrote:

 Hi All,


 I need some help with Out Of Memory errors in my application. I am using
 Spark 1.1.0 and my application is using the Java API. I am running my app on
 25 EC2 m3.xlarge instances (4 cores, 15 GB memory each). The app only fails
 sometimes. Lots of mapToPair tasks are failing. My app is configured to run
 120 executors, and executor memory is 2G.

 These are the various errors I see in my logs.

 15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 
 dropped from memory (free 257277829)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x6e0138a3, 
 /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, 
 /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0xd4211985, 
 /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at 
 

OOM error

2015-02-16 Thread Harshvardhan Chauhan
Hi All,


I need some help with Out Of Memory errors in my application. I am using
Spark 1.1.0 and my application is using the Java API. I am running my app on
25 EC2 m3.xlarge instances (4 cores, 15 GB memory each). The app only fails
sometimes. Lots of mapToPair tasks are failing. My app is configured to run
120 executors, and executor memory is 2G.

These are the various errors I see in my logs.

15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size
4680 dropped from memory (free 257277829)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception
was thrown by a user handler while handling an exception event ([id:
0x6e0138a3, /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION:
java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at 
org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
at 
org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
at 
org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception
was thrown by a user handler while handling an exception event ([id:
0x2d0c1db1, /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION:
java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at 
org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
at 
org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
at 
org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception
was thrown by a user handler while handling an exception event ([id:
0xd4211985, /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION:
java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at 
org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
at 
org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
at 
org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
at 

Re: OOM error

2015-02-16 Thread Akhil Das
Increase your executor memory. Also, you can play around with increasing the
number of partitions/parallelism, etc.
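A hedged, minimal version of this advice in code; the values are placeholders,
and spark.executor.memory only takes effect if it is set before the
SparkContext is created (or passed as --executor-memory to spark-submit):

import org.apache.spark.SparkConf

// Placeholder values: more heap per executor, and more (smaller) partitions by default.
val conf = new SparkConf()
  .set("spark.executor.memory", "4g")
  .set("spark.default.parallelism", "600")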

Thanks
Best Regards

On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com
wrote:

 Hi All,


 I need some help with Out Of Memory errors in my application. I am using
 Spark 1.1.0 and my application is using the Java API. I am running my app on
 25 EC2 m3.xlarge instances (4 cores, 15 GB memory each). The app only fails
 sometimes. Lots of mapToPair tasks are failing. My app is configured to run
 120 executors, and executor memory is 2G.

 These are the various errors I see in my logs.

 15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 
 dropped from memory (free 257277829)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x6e0138a3, 
 /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
   at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
   at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, 
 /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
   at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
   at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
   at 
 org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
   at 
 org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
   at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
   at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
   at java.lang.Thread.run(Thread.java:745)
 15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was 
 thrown by a user handler while handling an exception event ([id: 0xd4211985, 
 /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION: 
 java.lang.OutOfMemoryError: Java heap space)
 java.lang.OutOfMemoryError: Java heap space
   at java.nio.HeapByteBuffer.init(HeapByteBuffer.java:57)
   at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
   at 
 org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
   at 
 org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
   at