Re: Spark 1.5.2 memory error

2016-02-03 Thread Nirav Patel
Hi Jerry,

I agree that code and framework go hand in hand, and I am all for tuning the
heck out of the system as well. Spark offers tremendous flexibility in that
regard. We have a real-time application that serves data in milliseconds,
backed by Spark RDDs. It took a lot of testing and tuning effort before we got
there, and we love it. But when you can't find a solution for a long time, even
with the help of experts, it gets to you. I am still working toward a solution
for my job as well, and I think I am on to something with reducing the number
of cores per executor.
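
For what it's worth, this is roughly what I am experimenting with (a sketch
only; the values are illustrative, not my production settings):

  import org.apache.spark.SparkConf

  // Fewer cores per executor means fewer concurrent tasks sharing the same
  // heap, so each task gets a bigger share of the executor memory.
  val conf = new SparkConf()
    .set("spark.executor.memory", "16g")
    .set("spark.executor.cores", "3")   // was 6; halving it roughly doubles memory per task
    .set("spark.task.cpus", "1")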

Regarding adapting code to a 'bad' framework: that requires a lot of rework,
and a framework should state its limitations up front in its documentation.
That would help developers decide whether the framework is the right one for
the job at hand.

Thanks

On Wed, Feb 3, 2016 at 2:39 PM, Ted Yu  wrote:

> There is also (deprecated) spark.storage.unrollFraction to consider
>
> On Wed, Feb 3, 2016 at 2:21 PM, Nirav Patel  wrote:
>
>> What I meant is executor.cores and task.cpus can dictate how many
>> parallel tasks will run on given executor.
>>
>> Let's take this example setting.
>>
>> spark.executor.memory = 16GB
>> spark.executor.cores = 6
>> spark.task.cpus = 1
>>
>> SO here I think spark will assign 6 tasks to One executor each using 1
>> core and 16/6=2.6GB.
>>
>> ANd out of those 2.6 gb some goes to shuffle and some goes to storage.
>>
>> spark.shuffle.memoryFraction = 0.4
>> spark.storage.memoryFraction = 0.6
>>
>> Again my speculation from some past articles I read.
>>
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Feb 3, 2016 at 2:09 PM, Rishabh Wadhawan 
>> wrote:
>>
>>> As of what I know, Cores won’t give you more portion of executor memory,
>>> because its just cpu cores that you are using per executor. Reducing the
>>> number of cores however would result in lack of parallel processing power.
>>> The executor memory that we specify with spark.executor.memory would be the
>>> max memory that your executor might have. But the memory that you get is
>>> less then that. I don’t clearly remember but i think its either memory/2 or
>>> memory/4. But I may be wrong as I have been out of spark for months.
>>>
>>> On Feb 3, 2016, at 2:58 PM, Nirav Patel  wrote:
>>>
>>> About OP.
>>>
>>> How many cores you assign per executor? May be reducing that number will
>>> give more portion of executor memory to each task being executed on that
>>> executor. Others please comment if that make sense.
>>>
>>>
>>>
>>> On Wed, Feb 3, 2016 at 1:52 PM, Nirav Patel 
>>> wrote:
>>>
>>>> I know it;s a strong word but when I have a case open for that with
>>>> MapR and Databricks for a month and their only solution to change to
>>>> DataFrame it frustrate you. I know DataFrame/Sql catalyst has internal
>>>> optimizations but it requires lot of code change. I think there's something
>>>> fundamentally wrong (or different from hadoop) in framework that is not
>>>> allowing it to do robust memory management. I know my job is memory hogger,
>>>> it does a groupBy and perform combinatorics in reducer side; uses
>>>> additional datastructures at task levels. May be spark is running multiple
>>>> heavier tasks on same executor and collectively they cause OOM. But
>>>> suggesting DataFrame is NOT a Solution for me (and most others who already
>>>> invested time with RDD and loves the type safety it provides). Not even
>>>> sure if changing to DataFrame will for sure solve the issue.
>>>>
>>>> On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller >>> > wrote:
>>>>
>>>>> Nirav,
>>>>>
>>>>> Sorry to hear about your experience with Spark; however, sucks is a
>>>>> very strong word. Many organizations are processing a lot more than 150GB
>>>>> of data  with Spark.
>>>>>
>>>>>
>>>>>
>>>>> Mohammed
>>>>>
>>>>> Author: Big Data Analytics with Spark
>>>>> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>>>>>
>>>>>
>>>>>
>>>>> *From:* Nirav Patel [mailto:npa...@xactlycorp.com]
>>>>> *Sent:* Wednesday, February 3, 2016 11:31 AM
>>>>> *To:* Stefan Panayotov
>>>>> *Cc:* Jim Green; Ted Yu; Jakob Oders

Re: Spark 1.5.2 memory error

2016-02-03 Thread Ted Yu
There is also (deprecated) spark.storage.unrollFraction to consider

On Wed, Feb 3, 2016 at 2:21 PM, Nirav Patel  wrote:

> What I meant is executor.cores and task.cpus can dictate how many parallel
> tasks will run on given executor.
>
> Let's take this example setting.
>
> spark.executor.memory = 16GB
> spark.executor.cores = 6
> spark.task.cpus = 1
>
> SO here I think spark will assign 6 tasks to One executor each using 1
> core and 16/6=2.6GB.
>
> ANd out of those 2.6 gb some goes to shuffle and some goes to storage.
>
> spark.shuffle.memoryFraction = 0.4
> spark.storage.memoryFraction = 0.6
>
> Again my speculation from some past articles I read.
>
>
>
>
>
>
>
>
> On Wed, Feb 3, 2016 at 2:09 PM, Rishabh Wadhawan 
> wrote:
>
>> As of what I know, Cores won’t give you more portion of executor memory,
>> because its just cpu cores that you are using per executor. Reducing the
>> number of cores however would result in lack of parallel processing power.
>> The executor memory that we specify with spark.executor.memory would be the
>> max memory that your executor might have. But the memory that you get is
>> less then that. I don’t clearly remember but i think its either memory/2 or
>> memory/4. But I may be wrong as I have been out of spark for months.
>>
>> On Feb 3, 2016, at 2:58 PM, Nirav Patel  wrote:
>>
>> About OP.
>>
>> How many cores you assign per executor? May be reducing that number will
>> give more portion of executor memory to each task being executed on that
>> executor. Others please comment if that make sense.
>>
>>
>>
>> On Wed, Feb 3, 2016 at 1:52 PM, Nirav Patel 
>> wrote:
>>
>>> I know it;s a strong word but when I have a case open for that with MapR
>>> and Databricks for a month and their only solution to change to DataFrame
>>> it frustrate you. I know DataFrame/Sql catalyst has internal optimizations
>>> but it requires lot of code change. I think there's something fundamentally
>>> wrong (or different from hadoop) in framework that is not allowing it to do
>>> robust memory management. I know my job is memory hogger, it does a groupBy
>>> and perform combinatorics in reducer side; uses additional datastructures
>>> at task levels. May be spark is running multiple heavier tasks on same
>>> executor and collectively they cause OOM. But suggesting DataFrame is NOT a
>>> Solution for me (and most others who already invested time with RDD and
>>> loves the type safety it provides). Not even sure if changing to DataFrame
>>> will for sure solve the issue.
>>>
>>> On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller 
>>> wrote:
>>>
>>>> Nirav,
>>>>
>>>> Sorry to hear about your experience with Spark; however, sucks is a
>>>> very strong word. Many organizations are processing a lot more than 150GB
>>>> of data  with Spark.
>>>>
>>>>
>>>>
>>>> Mohammed
>>>>
>>>> Author: Big Data Analytics with Spark
>>>> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>>>>
>>>>
>>>>
>>>> *From:* Nirav Patel [mailto:npa...@xactlycorp.com]
>>>> *Sent:* Wednesday, February 3, 2016 11:31 AM
>>>> *To:* Stefan Panayotov
>>>> *Cc:* Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org
>>>>
>>>> *Subject:* Re: Spark 1.5.2 memory error
>>>>
>>>>
>>>>
>>>> Hi Stefan,
>>>>
>>>>
>>>>
>>>> Welcome to the OOM - heap space club. I have been struggling with
>>>> similar errors (OOM and yarn executor being killed) and failing job or
>>>> sending it in retry loops. I bet the same job will run perfectly fine with
>>>> less resource on Hadoop MapReduce program. I have tested it for my program
>>>> and it does work.
>>>>
>>>>
>>>>
>>>> Bottomline from my experience. Spark sucks with memory management when
>>>> job is processing large (not huge) amount of data. It's failing for me with
>>>> 16gb executors, 10 executors, 6 threads each. And data its processing is
>>>> only 150GB! It's 1 billion rows for me. Same job works perfectly fine with
>>>> 1 million rows.
>>>>
>>>>
>>>>
>>>> Hope that saves you some trouble.
>>>>
>>>>
>>>>
>>>> Nirav

Re: Spark 1.5.2 memory error

2016-02-03 Thread Jerry Lam
Hi guys,

I was processing 300GB of data with a lot of joins today. I have a combination
of RDD -> DataFrame -> RDD due to legacy code, and I had memory issues at the
beginning. After fine-tuning the configurations that many have already
suggested above, it now runs with zero failed tasks. I think it is fair to say
that any memory-intensive application will face similar memory issues, and it
is not very fair to say Spark sucks just because it has them. Memory issues
come in many forms: 1. bad framework, 2. bad code, 3. bad framework and bad
code. I usually blame bad code first, then the bad framework. If it truly fails
because of a bad framework (Mesos + Spark + fine-grained mode = disaster), then
make the code changes to adapt to that framework.

I have never seen code that magically runs to 100% completion on close to a
terabyte of data without some serious engineering effort. A framework can only
help so much; you are still responsible for making conscious decisions about
how much memory and data you are working with. For instance, a key-value pair
whose value is 100GB when you allocate 1GB per executor is going to blow up no
matter how many times you execute it.
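
To make that concrete, here is a contrived sketch (made-up data, nobody's
actual job) of why one oversized key blows up under groupByKey while a per-key
aggregate survives:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("skew-sketch").setMaster("local[2]"))
  // Nearly every record shares one key, so its value list is enormous.
  val skewed = sc.parallelize(1 to 10000000).map(i => ("hot-key", i.toLong))
  // groupByKey must buffer the entire value list for "hot-key" inside one task.
  val grouped = skewed.groupByKey()
  // reduceByKey keeps only one running total per key, so memory stays flat.
  val summed = skewed.reduceByKey(_ + _)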

Memory per core is what I fine-tune most: making sure each task/core has
enough memory to run to completion. Sometimes you really don't know how much
data you keep in memory until you profile your application (calculating some
statistics helps).
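
For example, one cheap statistic is records per key on a small sample, to spot
skew before the real run (again just a sketch; the name and type of `pairs` are
hypothetical stand-ins for the input of the heavy stage):

  import org.apache.spark.rdd.RDD

  def heaviestKeys(pairs: RDD[(String, Long)]): Unit = {
    // Sample ~0.1% of the data and count records per key.
    val sampled = pairs.sample(withReplacement = false, fraction = 0.001)
    val perKey  = sampled.mapValues(_ => 1L).reduceByKey(_ + _)
    perKey.map(_.swap).top(10).foreach(println)  // heaviest keys in the sample
  }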

Best Regards,

Jerry



On Wed, Feb 3, 2016 at 4:58 PM, Nirav Patel  wrote:

> About OP.
>
> How many cores you assign per executor? May be reducing that number will
> give more portion of executor memory to each task being executed on that
> executor. Others please comment if that make sense.
>
>
>
> On Wed, Feb 3, 2016 at 1:52 PM, Nirav Patel  wrote:
>
>> I know it;s a strong word but when I have a case open for that with MapR
>> and Databricks for a month and their only solution to change to DataFrame
>> it frustrate you. I know DataFrame/Sql catalyst has internal optimizations
>> but it requires lot of code change. I think there's something fundamentally
>> wrong (or different from hadoop) in framework that is not allowing it to do
>> robust memory management. I know my job is memory hogger, it does a groupBy
>> and perform combinatorics in reducer side; uses additional datastructures
>> at task levels. May be spark is running multiple heavier tasks on same
>> executor and collectively they cause OOM. But suggesting DataFrame is NOT a
>> Solution for me (and most others who already invested time with RDD and
>> loves the type safety it provides). Not even sure if changing to DataFrame
>> will for sure solve the issue.
>>
>> On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller 
>> wrote:
>>
>>> Nirav,
>>>
>>> Sorry to hear about your experience with Spark; however, sucks is a very
>>> strong word. Many organizations are processing a lot more than 150GB of
>>> data  with Spark.
>>>
>>>
>>>
>>> Mohammed
>>>
>>> Author: Big Data Analytics with Spark
>>> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>>>
>>>
>>>
>>> *From:* Nirav Patel [mailto:npa...@xactlycorp.com]
>>> *Sent:* Wednesday, February 3, 2016 11:31 AM
>>> *To:* Stefan Panayotov
>>> *Cc:* Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org
>>>
>>> *Subject:* Re: Spark 1.5.2 memory error
>>>
>>>
>>>
>>> Hi Stefan,
>>>
>>>
>>>
>>> Welcome to the OOM - heap space club. I have been struggling with
>>> similar errors (OOM and yarn executor being killed) and failing job or
>>> sending it in retry loops. I bet the same job will run perfectly fine with
>>> less resource on Hadoop MapReduce program. I have tested it for my program
>>> and it does work.
>>>
>>>
>>>
>>> Bottomline from my experience. Spark sucks with memory management when
>>> job is processing large (not huge) amount of data. It's failing for me with
>>> 16gb executors, 10 executors, 6 threads each. And data its processing is
>>> only 150GB! It's 1 billion rows for me. Same job works perfectly fine with
>>> 1 million rows.
>>>
>>>
>>>
>>> Hope that saves you some trouble.
>>>
>>>
>>>
>>> Nirav
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 3, 2016 at 11:00 AM, Stefan Panayotov 
>>> wrote:
>>>
>>> I drastically increased the memory:
>>>
>>>

Re: Spark 1.5.2 memory error

2016-02-03 Thread Nirav Patel
What I meant is that executor.cores and task.cpus dictate how many parallel
tasks will run on a given executor.

Let's take this example setting:

spark.executor.memory = 16GB
spark.executor.cores = 6
spark.task.cpus = 1

So here I think Spark will assign 6 concurrent tasks to one executor, each
using 1 core and about 2.7GB (16/6).

And out of those ~2.7GB, some goes to shuffle and some goes to storage:

spark.shuffle.memoryFraction = 0.4
spark.storage.memoryFraction = 0.6

Again, this is my speculation from some past articles I read.
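
Back-of-the-envelope math for that layout, assuming the pre-1.6 legacy memory
manager and the fractions above (the safety factor is from memory and may be
off, so treat the result as a rough estimate only):

  val executorMemGB   = 16.0
  val concurrentTasks = 6     // spark.executor.cores / spark.task.cpus
  val shuffleFraction = 0.4   // spark.shuffle.memoryFraction as set above
  val shuffleSafety   = 0.8   // spark.shuffle.safetyFraction default, if I recall correctly
  val perTaskShuffleGB = executorMemGB * shuffleFraction * shuffleSafety / concurrentTasks
  println(f"~$perTaskShuffleGB%.2f GB of shuffle memory per concurrent task")  // ~0.85 GB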








On Wed, Feb 3, 2016 at 2:09 PM, Rishabh Wadhawan 
wrote:

> As of what I know, Cores won’t give you more portion of executor memory,
> because its just cpu cores that you are using per executor. Reducing the
> number of cores however would result in lack of parallel processing power.
> The executor memory that we specify with spark.executor.memory would be the
> max memory that your executor might have. But the memory that you get is
> less then that. I don’t clearly remember but i think its either memory/2 or
> memory/4. But I may be wrong as I have been out of spark for months.
>
> On Feb 3, 2016, at 2:58 PM, Nirav Patel  wrote:
>
> About OP.
>
> How many cores you assign per executor? May be reducing that number will
> give more portion of executor memory to each task being executed on that
> executor. Others please comment if that make sense.
>
>
>
> On Wed, Feb 3, 2016 at 1:52 PM, Nirav Patel  wrote:
>
>> I know it;s a strong word but when I have a case open for that with MapR
>> and Databricks for a month and their only solution to change to DataFrame
>> it frustrate you. I know DataFrame/Sql catalyst has internal optimizations
>> but it requires lot of code change. I think there's something fundamentally
>> wrong (or different from hadoop) in framework that is not allowing it to do
>> robust memory management. I know my job is memory hogger, it does a groupBy
>> and perform combinatorics in reducer side; uses additional datastructures
>> at task levels. May be spark is running multiple heavier tasks on same
>> executor and collectively they cause OOM. But suggesting DataFrame is NOT a
>> Solution for me (and most others who already invested time with RDD and
>> loves the type safety it provides). Not even sure if changing to DataFrame
>> will for sure solve the issue.
>>
>> On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller 
>> wrote:
>>
>>> Nirav,
>>>
>>> Sorry to hear about your experience with Spark; however, sucks is a very
>>> strong word. Many organizations are processing a lot more than 150GB of
>>> data  with Spark.
>>>
>>>
>>>
>>> Mohammed
>>>
>>> Author: Big Data Analytics with Spark
>>> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>>>
>>>
>>>
>>> *From:* Nirav Patel [mailto:npa...@xactlycorp.com]
>>> *Sent:* Wednesday, February 3, 2016 11:31 AM
>>> *To:* Stefan Panayotov
>>> *Cc:* Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org
>>>
>>> *Subject:* Re: Spark 1.5.2 memory error
>>>
>>>
>>>
>>> Hi Stefan,
>>>
>>>
>>>
>>> Welcome to the OOM - heap space club. I have been struggling with
>>> similar errors (OOM and yarn executor being killed) and failing job or
>>> sending it in retry loops. I bet the same job will run perfectly fine with
>>> less resource on Hadoop MapReduce program. I have tested it for my program
>>> and it does work.
>>>
>>>
>>>
>>> Bottomline from my experience. Spark sucks with memory management when
>>> job is processing large (not huge) amount of data. It's failing for me with
>>> 16gb executors, 10 executors, 6 threads each. And data its processing is
>>> only 150GB! It's 1 billion rows for me. Same job works perfectly fine with
>>> 1 million rows.
>>>
>>>
>>>
>>> Hope that saves you some trouble.
>>>
>>>
>>>
>>> Nirav
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Feb 3, 2016 at 11:00 AM, Stefan Panayotov 
>>> wrote:
>>>
>>> I drastically increased the memory:
>>>
>>> spark.executor.memory = 50g
>>> spark.driver.memory = 8g
>>> spark.driver.maxResultSize = 8g
>>> spark.yarn.executor.memoryOverhead = 768
>>>
>>> I still see executors killed, but this time the memory does not seem to
>>> be the issue.
>>> The error on the Jupyter notebook is:

Re: Spark 1.5.2 memory error

2016-02-03 Thread Rishabh Wadhawan
As far as I know, cores won't give you a bigger portion of executor memory,
because they are just the CPU cores you use per executor. Reducing the number
of cores, however, would cost you parallel processing power. The executor
memory that we specify with spark.executor.memory is the maximum memory your
executor might have, but the memory you actually get is less than that. I don't
clearly remember, but I think it's either memory/2 or memory/4. I may be wrong,
though, as I have been out of Spark for months.
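
If I remember right, in 1.5 the usable cache portion works out roughly like
this (the exact fractions are from memory, so take them as an assumption):

  // Approximate usable storage (cache) memory under Spark 1.5 defaults.
  val executorHeapMB  = 16 * 1024.0
  val storageFraction = 0.6   // spark.storage.memoryFraction default
  val safetyFraction  = 0.9   // spark.storage.safetyFraction default, as I recall
  val usableStorageMB = executorHeapMB * storageFraction * safetyFraction  // ~8.6 GB of a 16 GB heap
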
> On Feb 3, 2016, at 2:58 PM, Nirav Patel  wrote:
> 
> About OP.
> 
> How many cores you assign per executor? May be reducing that number will give 
> more portion of executor memory to each task being executed on that executor. 
> Others please comment if that make sense.
> 
> 
> 
> On Wed, Feb 3, 2016 at 1:52 PM, Nirav Patel  <mailto:npa...@xactlycorp.com>> wrote:
> I know it;s a strong word but when I have a case open for that with MapR and 
> Databricks for a month and their only solution to change to DataFrame it 
> frustrate you. I know DataFrame/Sql catalyst has internal optimizations but 
> it requires lot of code change. I think there's something fundamentally wrong 
> (or different from hadoop) in framework that is not allowing it to do robust 
> memory management. I know my job is memory hogger, it does a groupBy and 
> perform combinatorics in reducer side; uses additional datastructures at task 
> levels. May be spark is running multiple heavier tasks on same executor and 
> collectively they cause OOM. But suggesting DataFrame is NOT a Solution for 
> me (and most others who already invested time with RDD and loves the type 
> safety it provides). Not even sure if changing to DataFrame will for sure 
> solve the issue. 
> 
> On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller  <mailto:moham...@glassbeam.com>> wrote:
> Nirav,
> 
> Sorry to hear about your experience with Spark; however, sucks is a very 
> strong word. Many organizations are processing a lot more than 150GB of data  
> with Spark.
> 
>  
> 
> Mohammed
> 
> Author: Big Data Analytics with Spark 
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>  
> 
> From: Nirav Patel [mailto:npa...@xactlycorp.com 
> <mailto:npa...@xactlycorp.com>] 
> Sent: Wednesday, February 3, 2016 11:31 AM
> To: Stefan Panayotov
> Cc: Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org 
> <mailto:user@spark.apache.org>
> 
> Subject: Re: Spark 1.5.2 memory error
> 
>  
> 
> Hi Stefan,
> 
>  
> 
> Welcome to the OOM - heap space club. I have been struggling with similar 
> errors (OOM and yarn executor being killed) and failing job or sending it in 
> retry loops. I bet the same job will run perfectly fine with less resource on 
> Hadoop MapReduce program. I have tested it for my program and it does work.
> 
>  
> 
> Bottomline from my experience. Spark sucks with memory management when job is 
> processing large (not huge) amount of data. It's failing for me with 16gb 
> executors, 10 executors, 6 threads each. And data its processing is only 
> 150GB! It's 1 billion rows for me. Same job works perfectly fine with 1 
> million rows. 
> 
>  
> 
> Hope that saves you some trouble.
> 
>  
> 
> Nirav
> 
>  
> 
>  
> 
>  
> 
> On Wed, Feb 3, 2016 at 11:00 AM, Stefan Panayotov  <mailto:spanayo...@msn.com>> wrote:
> 
> I drastically increased the memory:
>  
> spark.executor.memory = 50g
> spark.driver.memory = 8g
> spark.driver.maxResultSize = 8g
> spark.yarn.executor.memoryOverhead = 768
>  
> I still see executors killed, but this time the memory does not seem to be 
> the issue.
> The error on the Jupyter notebook is:
>  
> 
> 
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
> : org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: java.io.IOException: Failed to connect 
> to /10.0.0.9:48755 <http://10.0.0.9:48755/>
>  
> From nodemanagers log corresponding to worker 10.0.0.9 <http://10.0.0.9/>:
>  
> 
> 2016-02-03 17:31:44,917 INFO  yarn.YarnShuffleService 
> (YarnShuffleService.java:initializeApplication(129)) - Initializing 
> application application_1454509557526_0014
> 
>  
> 
> 2016-02-03 17:31:44,918 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(1131)) - Container 
> container_1454509557526_0014_01_93 transitioned from LOCALIZING to 
> LOCALIZED
> 
>  
> 
> 2016-02-03 17:31:44,947 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(1131)) - Contain

Re: Spark 1.5.2 memory error

2016-02-03 Thread Nirav Patel
About the OP:

How many cores do you assign per executor? Maybe reducing that number will give
each task running on that executor a larger portion of the executor memory.
Others, please comment if that makes sense.
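
Rough arithmetic behind that suggestion (ignoring overheads and memory
fractions, so only illustrative):

  // Per-task share of the executor heap at different core counts.
  val executorMemGB = 16.0
  println(executorMemGB / 6)  // ~2.7 GB per task with 6 cores per executor
  println(executorMemGB / 3)  // ~5.3 GB per task with 3 cores per executor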



On Wed, Feb 3, 2016 at 1:52 PM, Nirav Patel  wrote:

> I know it;s a strong word but when I have a case open for that with MapR
> and Databricks for a month and their only solution to change to DataFrame
> it frustrate you. I know DataFrame/Sql catalyst has internal optimizations
> but it requires lot of code change. I think there's something fundamentally
> wrong (or different from hadoop) in framework that is not allowing it to do
> robust memory management. I know my job is memory hogger, it does a groupBy
> and perform combinatorics in reducer side; uses additional datastructures
> at task levels. May be spark is running multiple heavier tasks on same
> executor and collectively they cause OOM. But suggesting DataFrame is NOT a
> Solution for me (and most others who already invested time with RDD and
> loves the type safety it provides). Not even sure if changing to DataFrame
> will for sure solve the issue.
>
> On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller 
> wrote:
>
>> Nirav,
>>
>> Sorry to hear about your experience with Spark; however, sucks is a very
>> strong word. Many organizations are processing a lot more than 150GB of
>> data  with Spark.
>>
>>
>>
>> Mohammed
>>
>> Author: Big Data Analytics with Spark
>> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>>
>>
>>
>> *From:* Nirav Patel [mailto:npa...@xactlycorp.com]
>> *Sent:* Wednesday, February 3, 2016 11:31 AM
>> *To:* Stefan Panayotov
>> *Cc:* Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org
>>
>> *Subject:* Re: Spark 1.5.2 memory error
>>
>>
>>
>> Hi Stefan,
>>
>>
>>
>> Welcome to the OOM - heap space club. I have been struggling with similar
>> errors (OOM and yarn executor being killed) and failing job or sending it
>> in retry loops. I bet the same job will run perfectly fine with less
>> resource on Hadoop MapReduce program. I have tested it for my program and
>> it does work.
>>
>>
>>
>> Bottomline from my experience. Spark sucks with memory management when
>> job is processing large (not huge) amount of data. It's failing for me with
>> 16gb executors, 10 executors, 6 threads each. And data its processing is
>> only 150GB! It's 1 billion rows for me. Same job works perfectly fine with
>> 1 million rows.
>>
>>
>>
>> Hope that saves you some trouble.
>>
>>
>>
>> Nirav
>>
>>
>>
>>
>>
>>
>>
>> On Wed, Feb 3, 2016 at 11:00 AM, Stefan Panayotov 
>> wrote:
>>
>> I drastically increased the memory:
>>
>> spark.executor.memory = 50g
>> spark.driver.memory = 8g
>> spark.driver.maxResultSize = 8g
>> spark.yarn.executor.memoryOverhead = 768
>>
>> I still see executors killed, but this time the memory does not seem to
>> be the issue.
>> The error on the Jupyter notebook is:
>>
>>
>> Py4JJavaError: An error occurred while calling 
>> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>>
>> : org.apache.spark.SparkException: Job aborted due to stage failure: 
>> Exception while getting task result: java.io.IOException: Failed to connect 
>> to /10.0.0.9:48755
>>
>>
>> From nodemanagers log corresponding to worker 10.0.0.9:
>>
>>
>> 2016-02-03 17:31:44,917 INFO  yarn.YarnShuffleService
>> (YarnShuffleService.java:initializeApplication(129)) - Initializing
>> application application_1454509557526_0014
>>
>>
>>
>> 2016-02-03 17:31:44,918 INFO  container.ContainerImpl
>> (ContainerImpl.java:handle(1131)) - Container
>> container_1454509557526_0014_01_93 transitioned from LOCALIZING to
>> LOCALIZED
>>
>>
>>
>> 2016-02-03 17:31:44,947 INFO  container.ContainerImpl
>> (ContainerImpl.java:handle(1131)) - Container
>> container_1454509557526_0014_01_93 transitioned from LOCALIZED to
>> RUNNING
>>
>>
>>
>> 2016-02-03 17:31:44,951 INFO  nodemanager.DefaultContainerExecutor
>> (DefaultContainerExecutor.java:buildCommandExecutor(267)) -
>> launchContainer: [bash,
>> /mnt/resource/hadoop/yarn/local/usercache/root/appcache/application_1454509557526_0014/container_1454509557526_0014_01_93/default_container_executor.sh]
>>
>>
>>
>> 2016-02-03 17:31:45,686 INFO  monitor.ContainersMo

Re: Spark 1.5.2 memory error

2016-02-03 Thread Nirav Patel
I know it's a strong word, but when you have had a case open for this with
MapR and Databricks for a month and their only solution is to switch to
DataFrame, it frustrates you. I know the DataFrame/SQL Catalyst layer has
internal optimizations, but adopting it requires a lot of code change. I think
there's something fundamentally wrong (or at least different from Hadoop) in
the framework that keeps it from doing robust memory management. I know my job
is a memory hog: it does a groupBy, performs combinatorics on the reducer side,
and uses additional data structures at the task level. Maybe Spark is running
multiple heavy tasks on the same executor and collectively they cause the OOM.
But suggesting DataFrame is NOT a solution for me (or for most others who have
already invested time in RDDs and love the type safety they provide). I'm not
even sure that changing to DataFrame will solve the issue.
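
One RDD-level change I am still weighing (sketched below against made-up data,
not my actual job) is folding the per-key work incrementally so the full value
list never has to sit in memory at once:

  import org.apache.spark.{SparkConf, SparkContext}

  val sc = new SparkContext(new SparkConf().setAppName("aggregate-sketch").setMaster("local[2]"))
  val pairs = sc.parallelize(Seq(("a", 1L), ("a", 2L), ("b", 3L)))  // stand-in data

  // groupByKey buffers every value for a key; aggregateByKey folds values into one
  // accumulator per key, so far less has to be resident in a task at any moment.
  val aggregated = pairs.aggregateByKey(0L)(
    (acc, v) => acc + v,  // fold a value into the per-partition accumulator
    (a, b) => a + b)      // merge accumulators across partitions
  aggregated.collect().foreach(println)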

On Wed, Feb 3, 2016 at 1:33 PM, Mohammed Guller 
wrote:

> Nirav,
>
> Sorry to hear about your experience with Spark; however, sucks is a very
> strong word. Many organizations are processing a lot more than 150GB of
> data  with Spark.
>
>
>
> Mohammed
>
> Author: Big Data Analytics with Spark
> <http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>
>
>
>
> *From:* Nirav Patel [mailto:npa...@xactlycorp.com]
> *Sent:* Wednesday, February 3, 2016 11:31 AM
> *To:* Stefan Panayotov
> *Cc:* Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org
>
> *Subject:* Re: Spark 1.5.2 memory error
>
>
>
> Hi Stefan,
>
>
>
> Welcome to the OOM - heap space club. I have been struggling with similar
> errors (OOM and yarn executor being killed) and failing job or sending it
> in retry loops. I bet the same job will run perfectly fine with less
> resource on Hadoop MapReduce program. I have tested it for my program and
> it does work.
>
>
>
> Bottomline from my experience. Spark sucks with memory management when job
> is processing large (not huge) amount of data. It's failing for me with
> 16gb executors, 10 executors, 6 threads each. And data its processing is
> only 150GB! It's 1 billion rows for me. Same job works perfectly fine with
> 1 million rows.
>
>
>
> Hope that saves you some trouble.
>
>
>
> Nirav
>
>
>
>
>
>
>
> On Wed, Feb 3, 2016 at 11:00 AM, Stefan Panayotov 
> wrote:
>
> I drastically increased the memory:
>
> spark.executor.memory = 50g
> spark.driver.memory = 8g
> spark.driver.maxResultSize = 8g
> spark.yarn.executor.memoryOverhead = 768
>
> I still see executors killed, but this time the memory does not seem to be
> the issue.
> The error on the Jupyter notebook is:
>
>
> Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.collectAndServe.
>
> : org.apache.spark.SparkException: Job aborted due to stage failure: 
> Exception while getting task result: java.io.IOException: Failed to connect 
> to /10.0.0.9:48755
>
>
> From nodemanagers log corresponding to worker 10.0.0.9:
>
>
> 2016-02-03 17:31:44,917 INFO  yarn.YarnShuffleService
> (YarnShuffleService.java:initializeApplication(129)) - Initializing
> application application_1454509557526_0014
>
>
>
> 2016-02-03 17:31:44,918 INFO  container.ContainerImpl
> (ContainerImpl.java:handle(1131)) - Container
> container_1454509557526_0014_01_93 transitioned from LOCALIZING to
> LOCALIZED
>
>
>
> 2016-02-03 17:31:44,947 INFO  container.ContainerImpl
> (ContainerImpl.java:handle(1131)) - Container
> container_1454509557526_0014_01_93 transitioned from LOCALIZED to
> RUNNING
>
>
>
> 2016-02-03 17:31:44,951 INFO  nodemanager.DefaultContainerExecutor
> (DefaultContainerExecutor.java:buildCommandExecutor(267)) -
> launchContainer: [bash,
> /mnt/resource/hadoop/yarn/local/usercache/root/appcache/application_1454509557526_0014/container_1454509557526_0014_01_93/default_container_executor.sh]
>
>
>
> 2016-02-03 17:31:45,686 INFO  monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(371)) - Starting resource-monitoring for
> container_1454509557526_0014_01_93
>
>
>
> 2016-02-03 17:31:45,686 INFO  monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(385)) - Stopping resource-monitoring for
> container_1454509557526_0014_01_11
>
>
>
>
>
>
>
> Then I can see the memory usage increasing from 230.6 MB to 12.6 GB, which
> is far below 50g, and the suddenly getting killed!?!
>
>
>
>
>
>
>
> 2016-02-03 17:33:17,350 INFO  monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 30962
> for container-id container_1454509557526_0014_01_93: 12.6 GB of 51 GB
> physical memor

RE: Spark 1.5.2 memory error

2016-02-03 Thread Mohammed Guller
Nirav,
Sorry to hear about your experience with Spark; however, sucks is a very strong 
word. Many organizations are processing a lot more than 150GB of data  with 
Spark.

Mohammed
Author: Big Data Analytics with 
Spark<http://www.amazon.com/Big-Data-Analytics-Spark-Practitioners/dp/1484209656/>

From: Nirav Patel [mailto:npa...@xactlycorp.com]
Sent: Wednesday, February 3, 2016 11:31 AM
To: Stefan Panayotov
Cc: Jim Green; Ted Yu; Jakob Odersky; user@spark.apache.org
Subject: Re: Spark 1.5.2 memory error

Hi Stefan,

Welcome to the OOM - heap space club. I have been struggling with similar 
errors (OOM and yarn executor being killed) and failing job or sending it in 
retry loops. I bet the same job will run perfectly fine with less resource on 
Hadoop MapReduce program. I have tested it for my program and it does work.

Bottomline from my experience. Spark sucks with memory management when job is 
processing large (not huge) amount of data. It's failing for me with 16gb 
executors, 10 executors, 6 threads each. And data its processing is only 150GB! 
It's 1 billion rows for me. Same job works perfectly fine with 1 million rows.

Hope that saves you some trouble.

Nirav



On Wed, Feb 3, 2016 at 11:00 AM, Stefan Panayotov 
mailto:spanayo...@msn.com>> wrote:
I drastically increased the memory:

spark.executor.memory = 50g
spark.driver.memory = 8g
spark.driver.maxResultSize = 8g
spark.yarn.executor.memoryOverhead = 768

I still see executors killed, but this time the memory does not seem to be the 
issue.
The error on the Jupyter notebook is:



Py4JJavaError: An error occurred while calling 
z:org.apache.spark.api.python.PythonRDD.collectAndServe.

: org.apache.spark.SparkException: Job aborted due to stage failure: Exception 
while getting task result: java.io.IOException: Failed to connect to 
/10.0.0.9:48755<http://10.0.0.9:48755>

From nodemanagers log corresponding to worker 10.0.0.9<http://10.0.0.9>:

2016-02-03 17:31:44,917 INFO  yarn.YarnShuffleService 
(YarnShuffleService.java:initializeApplication(129)) - Initializing application 
application_1454509557526_0014

2016-02-03 17:31:44,918 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1131)) - Container 
container_1454509557526_0014_01_93 transitioned from LOCALIZING to LOCALIZED

2016-02-03 17:31:44,947 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1131)) - Container 
container_1454509557526_0014_01_93 transitioned from LOCALIZED to RUNNING

2016-02-03 17:31:44,951 INFO  nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:buildCommandExecutor(267)) - launchContainer: 
[bash, 
/mnt/resource/hadoop/yarn/local/usercache/root/appcache/application_1454509557526_0014/container_1454509557526_0014_01_93/default_container_executor.sh]

2016-02-03 17:31:45,686 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(371)) - Starting resource-monitoring for 
container_1454509557526_0014_01_93

2016-02-03 17:31:45,686 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(385)) - Stopping resource-monitoring for 
container_1454509557526_0014_01_11



Then I can see the memory usage increasing from 230.6 MB to 12.6 GB, which is 
far below 50g, and the suddenly getting killed!?!



2016-02-03 17:33:17,350 INFO  monitor.ContainersMonitorImpl 
(ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 30962 for 
container-id container_1454509557526_0014_01_93: 12.6 GB of 51 GB physical 
memory used; 52.8 GB of 107.1 GB virtual memory used

2016-02-03 17:33:17,613 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1131)) - Container 
container_1454509557526_0014_01_93 transitioned from RUNNING to KILLING

2016-02-03 17:33:17,613 INFO  launcher.ContainerLaunch 
(ContainerLaunch.java:cleanupContainer(370)) - Cleaning up container 
container_1454509557526_0014_01_93

2016-02-03 17:33:17,629 WARN  nodemanager.DefaultContainerExecutor 
(DefaultContainerExecutor.java:launchContainer(223)) - Exit code from container 
container_1454509557526_0014_01_93 is : 143

2016-02-03 17:33:17,667 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1131)) - Container 
container_1454509557526_0014_01_93 transitioned from KILLING to 
CONTAINER_CLEANEDUP_AFTER_KILL

2016-02-03 17:33:17,669 INFO  nodemanager.NMAuditLogger 
(NMAuditLogger.java:logSuccess(89)) - USER=root   OPERATION=Container 
Finished - KilledTARGET=ContainerImpl RESULT=SUCCESS   
APPID=application_1454509557526_0014 
CONTAINERID=container_1454509557526_0014_01_93

2016-02-03 17:33:17,670 INFO  container.ContainerImpl 
(ContainerImpl.java:handle(1131)) - Container 
container_1454509557526_0014_01_93 transitioned from 
CONTAINER_CLEANEDUP_AFTER_KILL to DONE

2016-02-03 17:33:17,670 INFO  application.ApplicationImpl 
(ApplicationImpl.java:transition(347)) - Removing 
container_1454509557526_0014_01_93 from application 
application_14

Re: Spark 1.5.2 memory error

2016-02-03 Thread Rishabh Wadhawan
iner 
> Finished - KilledTARGET=ContainerImpl RESULT=SUCCESS   
> APPID=application_1454509557526_0014 
> CONTAINERID=container_1454509557526_0014_01_93
> 
> 2016-02-03 17:33:17,670 INFO  container.ContainerImpl 
> (ContainerImpl.java:handle(1131)) - Container 
> container_1454509557526_0014_01_93 transitioned from 
> CONTAINER_CLEANEDUP_AFTER_KILL to DONE
> 
> 2016-02-03 17:33:17,670 INFO  application.ApplicationImpl 
> (ApplicationImpl.java:transition(347)) - Removing 
> container_1454509557526_0014_01_93 from application 
> application_1454509557526_0014
> 
> 2016-02-03 17:33:17,671 INFO  logaggregation.AppLogAggregatorImpl 
> (AppLogAggregatorImpl.java:startContainerLogAggregation(546)) - Considering 
> container container_1454509557526_0014_01_93 for log-aggregation
> 
> 2016-02-03 17:33:17,671 INFO  containermanager.AuxServices 
> (AuxServices.java:handle(196)) - Got event CONTAINER_STOP for appId 
> application_1454509557526_0014
> 
> 2016-02-03 17:33:17,671 INFO  yarn.YarnShuffleService 
> (YarnShuffleService.java:stopContainer(161)) - Stopping container 
> container_1454509557526_0014_01_93
> 
> 2016-02-03 17:33:20,351 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(385)) - Stopping resource-monitoring for 
> container_1454509557526_0014_01_93
> 
> 2016-02-03 17:33:20,383 INFO  monitor.ContainersMonitorImpl 
> (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 28727 for 
> container-id container_1454509557526_0012_01_01: 319.8 MB of 1.5 GB 
> physical memory used; 1.7 GB of 3.1 GB virtual memory used
> 2016-02-03 17:33:22,627 INFO  nodemanager.NodeStatusUpdaterImpl 
> (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(529)) 
> - Removed completed containers from NM context: 
> [container_1454509557526_0014_01_93]
>  
> I'll appreciate any suggestions.
> 
> Thanks,
> 
> Stefan Panayotov, PhD 
> Home: 610-355-0919  
> Cell: 610-517-5586  
> email: spanayo...@msn.com <mailto:spanayo...@msn.com> 
> spanayo...@outlook.com <mailto:spanayo...@outlook.com> 
> spanayo...@comcast.net <mailto:spanayo...@comcast.net>
> 
>  
> Date: Tue, 2 Feb 2016 15:40:10 -0800
> Subject: Re: Spark 1.5.2 memory error
> From: openkbi...@gmail.com <mailto:openkbi...@gmail.com>
> To: spanayo...@msn.com <mailto:spanayo...@msn.com>
> CC: yuzhih...@gmail.com <mailto:yuzhih...@gmail.com>; ja...@odersky.com 
> <mailto:ja...@odersky.com>; user@spark.apache.org 
> <mailto:user@spark.apache.org>
> 
> 
> Look at part#3 in below blog:
> http://www.openkb.info/2015/06/resource-allocation-configurations-for.html
>  <http://www.openkb.info/2015/06/resource-allocation-configurations-for.html>
> 
> You may want to increase the executor memory, not just the 
> spark.yarn.executor.memoryOverhead.
> 
> On Tue, Feb 2, 2016 at 2:14 PM, Stefan Panayotov  <mailto:spanayo...@msn.com>> wrote:
> For the memoryOvethead I have the default of 10% of 16g, and Spark version is 
> 1.5.2.
> 
>  
> 
> Stefan Panayotov, PhD
> Sent from Outlook Mail for Windows 10 phone
> 
>  
> 
> 
> From: Ted Yu <mailto:yuzhih...@gmail.com>
> Sent: Tuesday, February 2, 2016 4:52 PM
> To: Jakob Odersky <mailto:ja...@odersky.com>
> Cc: Stefan Panayotov <mailto:spanayo...@msn.com>; user@spark.apache.org 
> <mailto:user@spark.apache.org>
> Subject: Re: Spark 1.5.2 memory error
> 
>  
> 
> What value do you use for spark.yarn.executor.memoryOverhead ?
> 
>  
> 
> Please see https://spark.apache.org/docs/latest/running-on-yarn.html 
> <https://spark.apache.org/docs/latest/running-on-yarn.html> for description 
> of the parameter.
> 
>  
> 
> Which Spark release are you using ?
> 
>  
> 
> Cheers
> 
>  
> 
> On Tue, Feb 2, 2016 at 1:38 PM, Jakob Odersky  <mailto:ja...@odersky.com>> wrote:
> 
> Can you share some code that produces the error? It is probably not
> due to spark but rather the way data is handled in the user code.
> Does your code call any reduceByKey actions? These are often a source
> for OOM errors.
> 
> 
> On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov  <mailto:spanayo...@msn.com>> wrote:
> > Hi Guys,
> >
> > I need help with Spark memory errors when executing ML pipelines.
> > The error that I see is:
> >
> >
> > 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 32.0 in
> > stage 32.0 (TID 3298)
> >
> >
> > 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 12.0 in
> > stage 32.0 (TID 3278)
> >
> >

Re: Spark 1.5.2 memory error

2016-02-03 Thread Nirav Patel
14_01_93 for log-aggregation
>
> 2016-02-03 17:33:17,671 INFO  containermanager.AuxServices
> (AuxServices.java:handle(196)) - Got event CONTAINER_STOP for appId
> application_1454509557526_0014
>
> 2016-02-03 17:33:17,671 INFO  yarn.YarnShuffleService
> (YarnShuffleService.java:stopContainer(161)) - Stopping container
> container_1454509557526_0014_01_93
>
> 2016-02-03 17:33:20,351 INFO  monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(385)) - Stopping resource-monitoring for
> container_1454509557526_0014_01_93
>
> 2016-02-03 17:33:20,383 INFO  monitor.ContainersMonitorImpl
> (ContainersMonitorImpl.java:run(458)) - Memory usage of ProcessTree 28727
> for container-id container_1454509557526_0012_01_01: 319.8 MB of 1.5 GB
> physical memory used; 1.7 GB of 3.1 GB virtual memory used
> 2016-02-03 17:33:22,627 INFO  nodemanager.NodeStatusUpdaterImpl
> (NodeStatusUpdaterImpl.java:removeOrTrackCompletedContainersFromContext(529))
> - Removed completed containers from NM context:
> [container_1454509557526_0014_01_93]
>
> I'll appreciate any suggestions.
>
> Thanks,
>
>
> *Stefan Panayotov, PhD **Home*: 610-355-0919
> *Cell*: 610-517-5586
> *email*: spanayo...@msn.com
> spanayo...@outlook.com
> spanayo...@comcast.net
>
>
> --
> Date: Tue, 2 Feb 2016 15:40:10 -0800
> Subject: Re: Spark 1.5.2 memory error
> From: openkbi...@gmail.com
> To: spanayo...@msn.com
> CC: yuzhih...@gmail.com; ja...@odersky.com; user@spark.apache.org
>
>
> Look at part#3 in below blog:
> http://www.openkb.info/2015/06/resource-allocation-configurations-for.html
>
> You may want to increase the executor memory, not just the
> spark.yarn.executor.memoryOverhead.
>
> On Tue, Feb 2, 2016 at 2:14 PM, Stefan Panayotov 
> wrote:
>
> For the memoryOvethead I have the default of 10% of 16g, and Spark version
> is 1.5.2.
>
>
>
> Stefan Panayotov, PhD
> Sent from Outlook Mail for Windows 10 phone
>
>
>
>
> *From: *Ted Yu 
> *Sent: *Tuesday, February 2, 2016 4:52 PM
> *To: *Jakob Odersky 
> *Cc: *Stefan Panayotov ; user@spark.apache.org
> *Subject: *Re: Spark 1.5.2 memory error
>
>
>
> What value do you use for spark.yarn.executor.memoryOverhead ?
>
>
>
> Please see https://spark.apache.org/docs/latest/running-on-yarn.html for
> description of the parameter.
>
>
>
> Which Spark release are you using ?
>
>
>
> Cheers
>
>
>
> On Tue, Feb 2, 2016 at 1:38 PM, Jakob Odersky  wrote:
>
> Can you share some code that produces the error? It is probably not
> due to spark but rather the way data is handled in the user code.
> Does your code call any reduceByKey actions? These are often a source
> for OOM errors.
>
>
> On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov 
> wrote:
> > Hi Guys,
> >
> > I need help with Spark memory errors when executing ML pipelines.
> > The error that I see is:
> >
> >
> > 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 32.0 in
> > stage 32.0 (TID 3298)
> >
> >
> > 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 12.0 in
> > stage 32.0 (TID 3278)
> >
> >
> > 16/02/02 20:34:39 INFO MemoryStore: ensureFreeSpace(2004728720) called
> with
> > curMem=296303415, maxMem=8890959790
> >
> >
> > 16/02/02 20:34:39 INFO MemoryStore: Block taskresult_3298 stored as
> bytes in
> > memory (estimated size 1911.9 MB, free 6.1 GB)
> >
> >
> > 16/02/02 20:34:39 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15:
> > SIGTERM
> >
> >
> > 16/02/02 20:34:39 ERROR Executor: Exception in task 12.0 in stage 32.0
> (TID
> > 3278)
> >
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> >
> >at java.util.Arrays.copyOf(Arrays.java:2271)
> >
> >
> >at
> > java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
> >
> >
> >at
> >
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
> >
> >
> >at
> > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
> >
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >
> >at java.lang.Thread.run(Thread.java:745)
> >
> >
> > 16/02/02 20:34:39 INFO DiskBlockManager: Shutdown hook called

RE: Spark 1.5.2 memory error

2016-02-03 Thread Stefan Panayotov
@msn.com 
spanayo...@outlook.com 
spanayo...@comcast.net

 
Date: Tue, 2 Feb 2016 15:40:10 -0800
Subject: Re: Spark 1.5.2 memory error
From: openkbi...@gmail.com
To: spanayo...@msn.com
CC: yuzhih...@gmail.com; ja...@odersky.com; user@spark.apache.org

Look at part#3 in below 
blog:http://www.openkb.info/2015/06/resource-allocation-configurations-for.html

You may want to increase the executor memory, not just the 
spark.yarn.executor.memoryOverhead.
On Tue, Feb 2, 2016 at 2:14 PM, Stefan Panayotov  wrote:
For the memoryOvethead I have the default of 10% of 16g, and Spark version is 
1.5.2. Stefan Panayotov, PhD
Sent from Outlook Mail for Windows 10 phone 
From: Ted Yu
Sent: Tuesday, February 2, 2016 4:52 PM
To: Jakob Odersky
Cc: Stefan Panayotov; user@spark.apache.org
Subject: Re: Spark 1.5.2 memory error What value do you use for 
spark.yarn.executor.memoryOverhead ? Please see 
https://spark.apache.org/docs/latest/running-on-yarn.html for description of 
the parameter. Which Spark release are you using ? Cheers On Tue, Feb 2, 2016 
at 1:38 PM, Jakob Odersky  wrote:Can you share some code 
that produces the error? It is probably not
due to spark but rather the way data is handled in the user code.
Does your code call any reduceByKey actions? These are often a source
for OOM errors.
On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov  wrote:
> Hi Guys,
>
> I need help with Spark memory errors when executing ML pipelines.
> The error that I see is:
>
>
> 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 32.0 in
> stage 32.0 (TID 3298)
>
>
> 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 12.0 in
> stage 32.0 (TID 3278)
>
>
> 16/02/02 20:34:39 INFO MemoryStore: ensureFreeSpace(2004728720) called with
> curMem=296303415, maxMem=8890959790
>
>
> 16/02/02 20:34:39 INFO MemoryStore: Block taskresult_3298 stored as bytes in
> memory (estimated size 1911.9 MB, free 6.1 GB)
>
>
> 16/02/02 20:34:39 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15:
> SIGTERM
>
>
> 16/02/02 20:34:39 ERROR Executor: Exception in task 12.0 in stage 32.0 (TID
> 3278)
>
>
> java.lang.OutOfMemoryError: Java heap space
>
>
>at java.util.Arrays.copyOf(Arrays.java:2271)
>
>
>at
> java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
>
>
>at
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
>
>
>at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
>
>
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>
>at java.lang.Thread.run(Thread.java:745)
>
>
> 16/02/02 20:34:39 INFO DiskBlockManager: Shutdown hook called
>
>
> 16/02/02 20:34:39 INFO Executor: Finished task 32.0 in stage 32.0 (TID
> 3298). 2004728720 bytes result sent via BlockManager)
>
>
> 16/02/02 20:34:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in
> thread Thread[Executor task launch worker-8,5,main]
>
>
> java.lang.OutOfMemoryError: Java heap space
>
>
>at java.util.Arrays.copyOf(Arrays.java:2271)
>
>
>at
> java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
>
>
>at
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
>
>
>at
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
>
>
>at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>
>at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>
>
>at java.lang.Thread.run(Thread.java:745)
>
>
> 16/02/02 20:34:39 INFO ShutdownHookManager: Shutdown hook called
>
>
> 16/02/02 20:34:39 INFO MetricsSystemImpl: Stopping azure-file-system metrics
> system...
>
>
> 16/02/02 20:34:39 INFO MetricsSinkAdapter: azurefs2 thread interrupted.
>
>
> 16/02/02 20:34:39 INFO MetricsSystemImpl: azure-file-system metrics system
> stopped.
>
>
> 16/02/02 20:34:39 INFO MetricsSystemImpl: azure-file-system metrics system
> shutdown complete.
>
>
>
>
>
> And …..
>
>
>
>
>
> 16/02/02 20:09:03 INFO impl.ContainerManagementProtocolProxy: Opening proxy
> : 10.0.0.5:30050
>
>
> 16/02/02 20:33:51 INFO yarn.YarnAllocator: Completed container
> container_1454421662639_0011_01_05 (state: COMPLETE, exit status: -104)
>
>
> 16/02/02 20:33:51 WARN yarn.YarnAllocator: Container killed by YARN for
> exceeding memory limits. 16.8 GB of 16.5 GB physical memory used. Conside

Re: Spark 1.5.2 memory error

2016-02-02 Thread Jim Green
Look at part #3 in the blog below:
http://www.openkb.info/2015/06/resource-allocation-configurations-for.html

You may want to increase the executor memory, not just the
spark.yarn.executor.memoryOverhead.
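
For example (illustrative numbers only; tune them for your workload):

  import org.apache.spark.SparkConf

  // Raise the executor heap itself, not only the off-heap overhead YARN accounts for.
  val conf = new SparkConf()
    .set("spark.executor.memory", "20g")                // executor heap
    .set("spark.yarn.executor.memoryOverhead", "2048")  // MB of extra headroom per container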

On Tue, Feb 2, 2016 at 2:14 PM, Stefan Panayotov  wrote:

> For the memoryOvethead I have the default of 10% of 16g, and Spark version
> is 1.5.2.
>
>
>
> Stefan Panayotov, PhD
> Sent from Outlook Mail for Windows 10 phone
>
>
>
>
> *From: *Ted Yu 
> *Sent: *Tuesday, February 2, 2016 4:52 PM
> *To: *Jakob Odersky 
> *Cc: *Stefan Panayotov ; user@spark.apache.org
> *Subject: *Re: Spark 1.5.2 memory error
>
>
>
> What value do you use for spark.yarn.executor.memoryOverhead ?
>
>
>
> Please see https://spark.apache.org/docs/latest/running-on-yarn.html for
> description of the parameter.
>
>
>
> Which Spark release are you using ?
>
>
>
> Cheers
>
>
>
> On Tue, Feb 2, 2016 at 1:38 PM, Jakob Odersky  wrote:
>
> Can you share some code that produces the error? It is probably not
> due to spark but rather the way data is handled in the user code.
> Does your code call any reduceByKey actions? These are often a source
> for OOM errors.
>
>
> On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov 
> wrote:
> > Hi Guys,
> >
> > I need help with Spark memory errors when executing ML pipelines.
> > The error that I see is:
> >
> >
> > 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 32.0 in
> > stage 32.0 (TID 3298)
> >
> >
> > 16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 12.0 in
> > stage 32.0 (TID 3278)
> >
> >
> > 16/02/02 20:34:39 INFO MemoryStore: ensureFreeSpace(2004728720) called
> with
> > curMem=296303415, maxMem=8890959790
> >
> >
> > 16/02/02 20:34:39 INFO MemoryStore: Block taskresult_3298 stored as
> bytes in
> > memory (estimated size 1911.9 MB, free 6.1 GB)
> >
> >
> > 16/02/02 20:34:39 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15:
> > SIGTERM
> >
> >
> > 16/02/02 20:34:39 ERROR Executor: Exception in task 12.0 in stage 32.0
> (TID
> > 3278)
> >
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> >
> >at java.util.Arrays.copyOf(Arrays.java:2271)
> >
> >
> >at
> > java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
> >
> >
> >at
> >
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
> >
> >
> >at
> > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
> >
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >
> >at java.lang.Thread.run(Thread.java:745)
> >
> >
> > 16/02/02 20:34:39 INFO DiskBlockManager: Shutdown hook called
> >
> >
> > 16/02/02 20:34:39 INFO Executor: Finished task 32.0 in stage 32.0 (TID
> > 3298). 2004728720 bytes result sent via BlockManager)
> >
> >
> > 16/02/02 20:34:39 ERROR SparkUncaughtExceptionHandler: Uncaught
> exception in
> > thread Thread[Executor task launch worker-8,5,main]
> >
> >
> > java.lang.OutOfMemoryError: Java heap space
> >
> >
> >at java.util.Arrays.copyOf(Arrays.java:2271)
> >
> >
> >at
> > java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
> >
> >
> >at
> >
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
> >
> >
> >at
> > org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
> >
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> >
> >
> >at
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> >
> >
> >at java.lang.Thread.run(Thread.java:745)
> >
> >
> > 16/02/02 20:34:39 INFO ShutdownHookManager: Shutdown hook called
> >
> >
> > 16/02/02 20:34:39 INFO MetricsSystemImpl: Stopping azure-file-system
> metrics
> > system...
> >
> >
> > 16/02/02 20:34:39 INFO MetricsSinkAdapter: azurefs2 thread interrupted.
> >
> >
> > 16/02/02 20:34:39 INFO MetricsSystemImpl: azure-file-system metrics
> system
> > st

RE: Spark 1.5.2 memory error

2016-02-02 Thread Stefan Panayotov
For the memoryOverhead I have the default of 10% of 16g, and the Spark version is
1.5.2.

Stefan Panayotov, PhD
Sent from Outlook Mail for Windows 10 phone


From: Ted Yu
Sent: Tuesday, February 2, 2016 4:52 PM
To: Jakob Odersky
Cc: Stefan Panayotov; user@spark.apache.org
Subject: Re: Spark 1.5.2 memory error

What value do you use for spark.yarn.executor.memoryOverhead ?

Please see https://spark.apache.org/docs/latest/running-on-yarn.html for 
description of the parameter.

Which Spark release are you using ?

Cheers

On Tue, Feb 2, 2016 at 1:38 PM, Jakob Odersky  wrote:
Can you share some code that produces the error? It is probably not
due to spark but rather the way data is handled in the user code.
Does your code call any reduceByKey actions? These are often a source
for OOM errors.

On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov  wrote:
> Hi Guys,
>
> I need help with Spark memory errors when executing ML pipelines.
> [snip: full error log quoted in the original message below]

Re: Spark 1.5.2 memory error

2016-02-02 Thread Ted Yu
What value do you use for spark.yarn.executor.memoryOverhead?

Please see https://spark.apache.org/docs/latest/running-on-yarn.html for a
description of the parameter.
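
For what it's worth, the YARN docs list the default as executorMemory * 0.10
with a 384 MB floor, and the YARN container has to hold the executor heap
plus that overhead. A hedged example of raising it explicitly (the numbers
and the jar name are only illustrative, not a recommendation for this
particular job):

    spark-submit \
      --master yarn-cluster \
      --executor-memory 14g \
      --executor-cores 2 \
      --conf spark.yarn.executor.memoryOverhead=2048 \
      your-app.jar

With a 14 GB heap plus 2 GB of overhead the container request comes to
16 GB, which has to stay under whatever YARN allows per container.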

Which Spark release are you using?

Cheers

On Tue, Feb 2, 2016 at 1:38 PM, Jakob Odersky  wrote:

> Can you share some code that produces the error? It is probably not
> due to Spark itself but rather to the way the data is handled in the user
> code. Does your code call any reduceByKey operations? These are often a
> source of OOM errors.
>
> On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov  wrote:
> > Hi Guys,
> >
> > I need help with Spark memory errors when executing ML pipelines.
> > [snip: full error log quoted in the original message below]

Re: Spark 1.5.2 memory error

2016-02-02 Thread Jakob Odersky
Can you share some code that produces the error? It is probably not
due to Spark itself but rather to the way the data is handled in the user
code. Does your code call any reduceByKey operations? These are often a
source of OOM errors.
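
A typical way this shows up: a reduceByKey whose merge function accumulates
collections ends up holding all of a key's values inside a single task, which
behaves like a groupByKey and can exhaust the heap on skewed keys. A
contrived spark-shell sketch (assumes the shell's SparkContext sc; the data
and the key skew are made up, not taken from the job in question):

    // hypothetical (key, value) pairs where a handful of keys dominate
    val pairs = sc.parallelize(1 to 1000000).map(i => (i % 3, i.toLong))

    // Merging by concatenating sequences keeps every value for a key in
    // memory -- this is the kind of reduceByKey that tends to OOM:
    val perKeyValues = pairs.mapValues(v => Seq(v)).reduceByKey(_ ++ _)

    // Reducing to a small per-key value keeps task memory bounded:
    val perKeySums = pairs.reduceByKey(_ + _)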

On Tue, Feb 2, 2016 at 1:22 PM, Stefan Panayotov  wrote:
> Hi Guys,
>
> I need help with Spark memory errors when executing ML pipelines.
> [snip: full error log quoted in the original message below]

-
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org



Spark 1.5.2 memory error

2016-02-02 Thread Stefan Panayotov
Hi Guys,
 
I need help with Spark memory errors when executing ML pipelines.
The error that I see is:
 



16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 32.0 in stage 32.0 (TID 3298)
16/02/02 20:34:17 INFO Executor: Executor is trying to kill task 12.0 in stage 32.0 (TID 3278)
16/02/02 20:34:39 INFO MemoryStore: ensureFreeSpace(2004728720) called with curMem=296303415, maxMem=8890959790
16/02/02 20:34:39 INFO MemoryStore: Block taskresult_3298 stored as bytes in memory (estimated size 1911.9 MB, free 6.1 GB)
16/02/02 20:34:39 ERROR CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM
16/02/02 20:34:39 ERROR Executor: Exception in task 12.0 in stage 32.0 (TID 3278)
java.lang.OutOfMemoryError: Java heap space
       at java.util.Arrays.copyOf(Arrays.java:2271)
       at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
       at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
16/02/02 20:34:39 INFO DiskBlockManager: Shutdown hook called
16/02/02 20:34:39 INFO Executor: Finished task 32.0 in stage 32.0 (TID 3298). 2004728720 bytes result sent via BlockManager)
16/02/02 20:34:39 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-8,5,main]
java.lang.OutOfMemoryError: Java heap space
       at java.util.Arrays.copyOf(Arrays.java:2271)
       at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191)
       at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:86)
       at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:256)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
       at java.lang.Thread.run(Thread.java:745)
16/02/02 20:34:39 INFO ShutdownHookManager: Shutdown hook called
16/02/02 20:34:39 INFO MetricsSystemImpl: Stopping azure-file-system metrics system...
16/02/02 20:34:39 INFO MetricsSinkAdapter: azurefs2 thread interrupted.
16/02/02 20:34:39 INFO MetricsSystemImpl: azure-file-system metrics system stopped.
16/02/02 20:34:39 INFO MetricsSystemImpl: azure-file-system metrics system shutdown complete.

And …..

16/02/02 20:09:03 INFO impl.ContainerManagementProtocolProxy: Opening proxy : 10.0.0.5:30050
16/02/02 20:33:51 INFO yarn.YarnAllocator: Completed container container_1454421662639_0011_01_05 (state: COMPLETE, exit status: -104)
16/02/02 20:33:51 WARN yarn.YarnAllocator: Container killed by YARN for exceeding memory limits. 16.8 GB of 16.5 GB physical memory used. Consider boosting spark.yarn.executor.memoryOverhead.
16/02/02 20:33:56 INFO yarn.YarnAllocator: Will request 1 executor containers, each with 2 cores and 16768 MB memory including 384 MB overhead
16/02/02 20:33:56 INFO yarn.YarnAllocator: Container request (host: Any, capability: )
16/02/02 20:33:57 INFO yarn.YarnAllocator: Launching container container_1454421662639_0011_01_37 for on host 10.0.0.8
16/02/02 20:33:57 INFO yarn.YarnAllocator: Launching ExecutorRunnable. driverUrl: akka.tcp://sparkDriver@10.0.0.15:47446/user/CoarseGrainedScheduler, executorHostname: 10.0.0.8
16/02/02 20:33:57 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.


I'll really appreciate any help here.
 
Thank you,


Stefan Panayotov, PhD 
Home: 610-355-0919 
Cell: 610-517-5586 
email: spanayo...@msn.com 
spanayo...@outlook.com 
spanayo...@comcast.net