Re: Will it lead to OOM error?
Thanks all for your answers. Much appreciated.

On Thu, Jun 23, 2022 at 6:07 AM Yong Walt wrote:
> We have many cases like this. It won't cause OOM.
Re: Will it lead to OOM error?
We have many cases like this; it won't cause OOM.

Thanks
Re: Will it lead to OOM error?
Yes, a single file compressed with a non-splittable codec (e.g. gzip) would have to be read by a single executor. That takes forever. You should consider recompressing the file with a splittable codec first. You will not want to read that file more than once, so you should uncompress it only once (in order to recompress it).

Enrico
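Enrico's recompression advice can be sketched locally in plain Python (a toy stand-in: at 150 TB you would run this once as a distributed job, not on one machine). bzip2 is one of the codecs Hadoop/Spark can split; gzip is not. The file names here are illustrative.

```python
import bz2
import gzip
import os
import shutil
import tempfile

def recompress_gzip_to_bz2(src_gz, dst_bz2, chunk=1 << 20):
    # Stream from gzip to bz2 so the whole file never sits in memory.
    with gzip.open(src_gz, "rb") as fin, bz2.open(dst_bz2, "wb") as fout:
        shutil.copyfileobj(fin, fout, chunk)

# Demo on a tiny temporary file.
tmp = tempfile.mkdtemp()
src = os.path.join(tmp, "data.csv.gz")
dst = os.path.join(tmp, "data.csv.bz2")
with gzip.open(src, "wt") as f:
    f.write("id,value\n1,a\n2,b\n")
recompress_gzip_to_bz2(src, dst)
with bz2.open(dst, "rt") as f:
    print(f.read().splitlines()[0])  # prints "id,value"
```

The same one-pass idea applies in Spark itself: read the gzip file once and write it back out partitioned in a splittable format, then do all further processing against the rewritten copy.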
Re: Will it lead to OOM error?
Hi Enrico,

Thanks for the insights. Could you please help me understand, with one example, which compressed files wouldn't be split into partitions, putting the load on a single partition and possibly leading to an OOM error?

Thanks,
Sid
Re: Will it lead to OOM error?
The RAM and disk consumption depends on what you do with the data after reading it.

Your particular action will read 20 lines from the first partition and show them. So it will not use any RAM or disk, no matter how large the CSV is.

If you do a count instead of a show, it will iterate over each partition and return a count per partition, so no RAM is needed here either.

If you do some real processing of the data, the RAM and disk requirements again depend on the shuffles involved and the intermediate results that need to be stored in RAM or on disk.

Enrico
Re: Will it lead to OOM error?
It will spill to disk if everything can't be loaded in memory.

--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
Will it lead to OOM error?
I have a 150TB CSV file.

I have a total of 100TB RAM and 100TB disk. So if I do something like this:

spark.read.option("header","true").csv(filepath).show(false)

Will it lead to an OOM error since it doesn't have enough memory, or will it spill data onto the disk and process it?

Thanks,
Sid
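For a rough sense of scale: assuming the file is splittable and Spark's default input split size (spark.sql.files.maxPartitionBytes, 128 MB by default), the read would plan on the order of a million tasks, but show() only needs to run the first of them. A back-of-envelope sketch:

```python
# Back-of-envelope sketch, assuming a splittable file and Spark's default
# input split size (spark.sql.files.maxPartitionBytes = 128 MB).
file_bytes = 150 * 1024**4        # 150 TiB
split_bytes = 128 * 1024**2       # 128 MiB
num_partitions = file_bytes // split_bytes
print(num_partitions)             # 1228800 planned tasks; show() touches ~1
```

This is why the show() call is cheap regardless of file size: only the partitions needed to produce 20 rows are ever read.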
Re: OOM Error
Sure folks, will try later today!

Best Regards
Ankit Khettry
Re: OOM Error
Ankit,

Can you try reducing the number of cores or increasing memory? With the configuration below, each core is getting ~3.5 GB. Otherwise, your data is skewed such that one of the cores is getting too much data for a single key.

spark.executor.cores 6
spark.executor.memory 36g
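The ~3.5 GB-per-core figure above can be reproduced from Spark's unified memory model. This is a sketch assuming the defaults (spark.memory.fraction = 0.6 and a fixed 300 MB of reserved memory):

```python
# Sketch of the per-core execution/storage memory estimate, assuming Spark's
# default unified memory model: (heap - 300 MB reserved) * spark.memory.fraction,
# shared across the executor's task slots.
executor_mem_mb = 36 * 1024   # spark.executor.memory 36g
reserved_mb = 300             # fixed reserved memory
memory_fraction = 0.6         # spark.memory.fraction default
cores = 6                     # spark.executor.cores

unified_mb = (executor_mem_mb - reserved_mb) * memory_fraction
per_core_gb = unified_mb / cores / 1024
print(round(per_core_gb, 2))  # ~3.57
```

So fewer cores per executor (or more heap) raises the memory each concurrent task can use before spilling or failing.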
Re: OOM Error
It says you have 3811 tasks in earlier stages and you're going down to 2001 partitions; that would make it more memory intensive. I'm guessing the default Spark shuffle partition count was 200, so that would have failed. Go for a higher number, maybe even higher than 3811. What was your shuffle write from stage 7 and shuffle read from stage 8?
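One way to "go for a higher number" less blindly is to derive spark.sql.shuffle.partitions from the observed shuffle write size, aiming for a modest per-partition size. A sketch: the 900 GB shuffle size and 200 MB target below are illustrative assumptions, not measured values.

```python
import math

# Heuristic sketch: size spark.sql.shuffle.partitions so that each shuffle
# partition lands near a chosen target (assumed 200 MB here).
def suggest_shuffle_partitions(shuffle_write_gb, target_mb=200):
    return math.ceil(shuffle_write_gb * 1024 / target_mb)

print(suggest_shuffle_partitions(900))   # 900 GB shuffle -> 4608 partitions
```

The shuffle write figure comes straight from the stage detail page in the Spark UI, which is why Chris asks for it.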
Re: OOM Error
You can try. Consider processing each partition separately if your data is heavily skewed when you partition it.
Re: OOM Error
Thanks Chris

Going to try it soon by setting spark.sql.shuffle.partitions to maybe 2001. Also, I was wondering if it would help if I repartition the data by the fields I am using in the group by and window operations?

Best Regards
Ankit Khettry
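On the repartitioning question: repartitioning by the group-by keys may let the subsequent aggregation reuse the partitioning, but it cannot fix skew, because every record for a hot key still hashes to the same partition no matter how many partitions exist. A toy sketch of that effect (key names and counts are made up):

```python
from collections import Counter

# Toy demonstration of hash-partition skew: one hot key dominates a single
# partition regardless of the partition count.
def partition_sizes(keys, num_partitions):
    return Counter(hash(k) % num_partitions for k in keys)

keys = ["hot"] * 9000 + [f"k{i}" for i in range(1000)]
sizes = partition_sizes(keys, 200)
print(max(sizes.values()))  # at least 9000: the hot key's partition dominates
```

This is why Chris suggests handling heavily skewed keys separately rather than just raising the partition count.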
Re: OOM Error
Hi Ankit,

Without looking at the Spark UI and the stages/DAG, I'm guessing you're running on the default number of Spark shuffle partitions.

If you're seeing a lot of shuffle spill, you likely have to increase the number of shuffle partitions to accommodate the huge shuffle size.

I hope that helps
Chris
Re: OOM Error
Nope, it's a batch job.

Best Regards
Ankit Khettry
Re: OOM Error
Is it a streaming job?
OOM Error
I have a Spark job that consists of a large number of window operations and hence involves large shuffles. I have roughly 900 GiBs of data, although I am using a large enough cluster (10 * m5.4xlarge instances). I am using the following configurations for the job, although I have tried various other combinations without any success.

spark.yarn.driver.memoryOverhead 6g
spark.storage.memoryFraction 0.1
spark.executor.cores 6
spark.executor.memory 36g
spark.memory.offHeap.size 8g
spark.memory.offHeap.enabled true
spark.executor.instances 10
spark.driver.memory 14g
spark.yarn.executor.memoryOverhead 10g

I keep running into the following OOM error:

org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

I see there are a large number of JIRAs in place for similar issues, and a great many of them are even marked resolved.
Can someone guide me as to how to approach this problem? I am using Databricks Spark 2.4.1.

Best Regards
Ankit Khettry
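A quick per-node sanity check on this configuration (a sketch; it assumes an m5.4xlarge provides 16 vCPUs and 64 GiB RAM, one executor per node, and that off-heap memory is budgeted on top of the heap and YARN overhead):

```python
# Rough per-node capacity check, assuming m5.4xlarge nodes (16 vCPUs,
# 64 GiB RAM) and one executor per node.
node_mem_gb, node_cores = 64, 16
executor_gb = 36 + 10 + 8   # heap + spark.yarn.executor.memoryOverhead + off-heap
executor_cores = 6

headroom_gb = node_mem_gb - executor_gb
idle_cores = node_cores - executor_cores
print(headroom_gb, idle_cores)  # 10 GiB headroom and 10 idle cores per node
```

Under these assumptions the executor fits, but most of each node's cores sit idle, which is one reason respondents in the thread probe the cores/memory split rather than total cluster size.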
Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS
Also for the record, turning on kryo was not able to help. On Tue, Aug 23, 2016 at 12:58 PM, Arun Luthra <arun.lut...@gmail.com> wrote: > Splitting up the Maps to separate objects did not help. > > However, I was able to work around the problem by reimplementing it with > RDD joins. > > On Aug 18, 2016 5:16 PM, "Arun Luthra" <arun.lut...@gmail.com> wrote: > >> This might be caused by a few large Map objects that Spark is trying to >> serialize. These are not broadcast variables or anything, they're just >> regular objects. >> >> Would it help if I further indexed these maps into a two-level Map i.e. >> Map[String, Map[String, Int]] ? Or would this still count against me? >> >> What if I manually split them up into numerous Map variables? >> >> On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com> >> wrote: >> >>> I got this OOM error in Spark local mode. The error seems to have been >>> at the start of a stage (all of the stages on the UI showed as complete, >>> there were more stages to do but had not showed up on the UI yet). >>> >>> There appears to be ~100G of free memory at the time of the error. >>> >>> Spark 2.0.0 >>> 200G driver memory >>> local[30] >>> 8 /mntX/tmp directories for spark.local.dir >>> "spark.sql.shuffle.partitions", "500" >>> "spark.driver.maxResultSize","500" >>> "spark.default.parallelism", "1000" >>> >>> The line number for the error is at an RDD map operation where there are >>> some potentially large Map objects that are going to be accessed by each >>> record. Does it matter if they are broadcast variables or not? I imagine >>> not because its in local mode they should be available in memory to every >>> executor/core. 
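The shape of the RDD-join workaround Arun describes (shipping the lookup data as key-value pairs instead of capturing a huge Map in the closure that the ClosureCleaner must serialize in one piece) can be sketched as follows. This is a hypothetical illustration in plain Python, not the poster's actual code: lists stand in for RDDs, and `rdd_join` mimics the inner-join semantics of Spark's `rdd.join`.

```python
# Hypothetical sketch: replace `rdd.map(lambda r: big_map[r.key])`, whose
# closure drags the whole map through serialization, with a key-based join.

def rdd_join(left, right):
    """Emulate the shape of rdd.join(other): inner join on the key."""
    index = {}
    for k, v in right:
        index.setdefault(k, []).append(v)
    return [(k, (lv, rv)) for k, lv in left for rv in index.get(k, [])]

records = [("a", 1), ("b", 2), ("c", 3)]
lookup  = [("a", "x"), ("b", "y")]   # was: a giant in-closure Map

joined = rdd_join(records, lookup)
# -> [('a', (1, 'x')), ('b', (2, 'y'))]; 'c' drops out, as in an inner join
```

In real Spark code the lookup pairs would be parallelized into their own RDD, so the data moves through the shuffle machinery rather than through closure serialization on the driver.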
Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS
Splitting up the Maps to separate objects did not help. However, I was able to work around the problem by reimplementing it with RDD joins. On Aug 18, 2016 5:16 PM, "Arun Luthra" <arun.lut...@gmail.com> wrote: > This might be caused by a few large Map objects that Spark is trying to > serialize. These are not broadcast variables or anything, they're just > regular objects. > > Would it help if I further indexed these maps into a two-level Map i.e. > Map[String, Map[String, Int]] ? Or would this still count against me? > > What if I manually split them up into numerous Map variables? > > On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com> > wrote: > >> I got this OOM error in Spark local mode. The error seems to have been at >> the start of a stage (all of the stages on the UI showed as complete, there >> were more stages to do but had not showed up on the UI yet). >> >> There appears to be ~100G of free memory at the time of the error. >> >> Spark 2.0.0 >> 200G driver memory >> local[30] >> 8 /mntX/tmp directories for spark.local.dir >> "spark.sql.shuffle.partitions", "500" >> "spark.driver.maxResultSize","500" >> "spark.default.parallelism", "1000" >> >> The line number for the error is at an RDD map operation where there are >> some potentially large Map objects that are going to be accessed by each >> record. Does it matter if they are broadcast variables or not? I imagine >> not because its in local mode they should be available in memory to every >> executor/core. 
Re: Spark 2.0.0 OOM error at beginning of RDD map on AWS
This might be caused by a few large Map objects that Spark is trying to serialize. These are not broadcast variables or anything, they're just regular objects. Would it help if I further indexed these maps into a two-level Map i.e. Map[String, Map[String, Int]] ? Or would this still count against me? What if I manually split them up into numerous Map variables? On Mon, Aug 15, 2016 at 2:12 PM, Arun Luthra <arun.lut...@gmail.com> wrote: > I got this OOM error in Spark local mode. The error seems to have been at > the start of a stage (all of the stages on the UI showed as complete, there > were more stages to do but had not showed up on the UI yet). > > There appears to be ~100G of free memory at the time of the error. > > Spark 2.0.0 > 200G driver memory > local[30] > 8 /mntX/tmp directories for spark.local.dir > "spark.sql.shuffle.partitions", "500" > "spark.driver.maxResultSize","500" > "spark.default.parallelism", "1000" > > The line number for the error is at an RDD map operation where there are > some potentially large Map objects that are going to be accessed by each > record. Does it matter if they are broadcast variables or not? I imagine > not because its in local mode they should be available in memory to every > executor/core. 
Spark 2.0.0 OOM error at beginning of RDD map on AWS
I got this OOM error in Spark local mode. The error seems to have happened at the start of a stage (all of the stages on the UI showed as complete; there were more stages to do but they had not shown up on the UI yet).

There appears to be ~100G of free memory at the time of the error.

Spark 2.0.0
200G driver memory
local[30]
8 /mntX/tmp directories for spark.local.dir
"spark.sql.shuffle.partitions", "500"
"spark.driver.maxResultSize", "500"
"spark.default.parallelism", "1000"

The line number for the error is at an RDD map operation where there are some potentially large Map objects that are going to be accessed by each record. Does it matter if they are broadcast variables or not? I imagine not, because in local mode they should be available in memory to every executor/core.

Possibly related:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-ClosureCleaner-or-java-serializer-OOM-when-trying-to-grow-td24796.html

Exception in thread "main" java.lang.OutOfMemoryError
at java.io.ByteArrayOutputStream.hugeCapacity(ByteArrayOutputStream.java:123)
at java.io.ByteArrayOutputStream.grow(ByteArrayOutputStream.java:117)
at java.io.ByteArrayOutputStream.ensureCapacity(ByteArrayOutputStream.java:93)
at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:153)
at java.io.ObjectOutputStream$BlockDataOutputStream.drain(ObjectOutputStream.java:1877)
at java.io.ObjectOutputStream$BlockDataOutputStream.setBlockDataMode(ObjectOutputStream.java:1786)
at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1189)
at java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:348)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:43)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:100)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:295)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:288)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:108)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2037)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:366)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:365)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.map(RDD.scala:365)
at abc.Abc$.main(abc.scala:395)
at abc.Abc.main(abc.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:729)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:185)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:210)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:124)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
OOM error in Spark worker
My workers are going OOM over time. I am running a streaming job on Spark 1.4.0. Here is the heap dump of the workers:

16,802 instances of "org.apache.spark.deploy.worker.ExecutorRunner", loaded by "sun.misc.Launcher$AppClassLoader @ 0xdff94088" occupy 488,249,688 (95.80%) bytes. These instances are referenced from one instance of "java.lang.Object[]", loaded by ""
Keywords: org.apache.spark.deploy.worker.ExecutorRunner, java.lang.Object[], sun.misc.Launcher$AppClassLoader @ 0xdff94088

Is this because of this bug?
http://apache-spark-developers-list.1001551.n3.nabble.com/Worker-memory-leaks-td13341.html
https://issues.apache.org/jira/browse/SPARK-9202

Also, I am getting the below error continuously if one of the workers/executors dies on any node in my Spark cluster. Even if I restart the worker, the error doesn't go away; I have to force-kill my streaming job and restart it to fix the issue. Is it some bug? I am using Spark 1.4.0. MY_IP in the logs is the IP of the worker node that failed.

15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194218 - Ask timed out on [Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]] after [12 ms]}
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]] after [12 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194217 - Ask timed out on [Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]] after [12 ms]}
akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]] after [12 ms]
at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
at scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)
at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)
at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)
at akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)
at akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)
at akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)
at java.lang.Thread.run(Thread.java:745)
15/09/03 11:29:11 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 16723
15/09/03 11:29:11 WARN BlockManagerMaster: Failed to remove RDD 194216 - Ask timed out on [Actor[akka.tcp://sparkExecutor@MY_IP:38223/user/BlockManagerEndpoint1#656884654]] after [12 ms]}

It is easily reproducible if I manually stop a worker on one of my nodes:

15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 329
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 333
15/09/03 23:52:18 ERROR SparkDeploySchedulerBackend: Asked to remove non-existent executor 334

It doesn't go away even if I start the worker again.

Follow-up question: if my streaming job has consumed some events from a Kafka topic that are pending to be scheduled because of a delay in processing, will force-killing the streaming job lose the data that is not yet scheduled?

--
VARUN SHARMA
Flipkart
Bangalore
Re: OOM error with GMMs on 4GB dataset
Did you set `--driver-memory` with spark-submit? -Xiangrui

On Mon, May 4, 2015 at 5:16 PM, Vinay Muttineni vmuttin...@ebay.com wrote:
> Hi, I am training a GMM with 10 Gaussians on a 4 GB dataset (720,000 * 760). [...]
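Xiangrui's question matters because setting spark.driver.memory from inside the application has no effect in client mode: the driver JVM is already running by the time SparkConf is read, so the driver heap has to be set on the spark-submit command line. A hypothetical invocation (the class and jar names are placeholders, not from the original thread):

```shell
spark-submit \
  --class com.example.TrainGMM \
  --driver-memory 6g \
  --executor-memory 6g \
  --num-executors 120 \
  gmm-job.jar
```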
OOM error with GMMs on 4GB dataset
Hi,

I am training a GMM with 10 Gaussians on a 4 GB dataset (720,000 * 760). The Spark (1.3.1) job is allocated 120 executors with 6 GB each, and the driver also has 6 GB.

Spark config params:

.set("spark.hadoop.validateOutputSpecs", "false")
.set("spark.dynamicAllocation.enabled", "false")
.set("spark.driver.maxResultSize", "4g")
.set("spark.default.parallelism", "300")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryoserializer.buffer.mb", "500")
.set("spark.akka.frameSize", "256")
.set("spark.akka.timeout", "300")

However, at the aggregate step (Line 168)

val sums = breezeData.aggregate(ExpectationSum.zero(k, d))(compute.value, _ += _)

I get an OOM error and the application hangs indefinitely. Is this an issue or am I missing something?

java.lang.OutOfMemoryError: Java heap space
at akka.util.CompactByteString$.apply(ByteString.scala:410)
at akka.util.ByteString$.apply(ByteString.scala:22)
at akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
at akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
at akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
at akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:180)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/05/04 16:23:38 ERROR util.Utils: Uncaught exception in thread task-result-getter-2
java.lang.OutOfMemoryError: Java heap space
Exception in thread task-result-getter-2 java.lang.OutOfMemoryError: Java heap space
15/05/04 16:23:45 INFO scheduler.TaskSetManager: Finished task 1070.0 in stage 6.0 (TID 8276) in 382069 ms on [] (160/3600)
15/05/04 16:23:54 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xc57da871, ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/05/04 16:23:55 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x3c3dbb0c, ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
15/05/04 16:24:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [sparkDriver]

Thanks!
Vinay
Re: OOM error
Thanks for the pointer; it led me to http://spark.apache.org/docs/1.2.0/tuning.html. Increasing parallelism resolved the issue.

On Mon, Feb 16, 2015 at 11:57 PM, Akhil Das ak...@sigmoidanalytics.com wrote:

Increase your executor memory. Also, you can play around with increasing the number of partitions/parallelism, etc.

Thanks
Best Regards

On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com wrote:

Hi All,

I need some help with Out Of Memory errors in my application. I am using Spark 1.1.0 and my application is using the Java API. I am running my app on EC2, on 25 m3.xlarge (4 cores, 15 GB memory) instances. The app only fails sometimes. Lots of mapToPair tasks are failing. My app is configured to run 120 executors and executor memory is 2G. These are the various errors I see in my logs.

15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 dropped from memory (free 257277829)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x6e0138a3, /10.61.192.194:35196 = /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, /10.169.226.254:55790 = /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xd4211985, /10.181.125.52:60959 = /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
at
OOM error
Hi All,

I need some help with Out Of Memory errors in my application. I am using Spark 1.1.0 and my application is using the Java API. I am running my app on 25 EC2 m3.xlarge (4 cores, 15 GB memory) instances. The app only fails sometimes; lots of mapToPair tasks are failing. My app is configured to run 120 executors, and executor memory is 2G. These are the various errors I see in my logs:

15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 dropped from memory (free 257277829)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x6e0138a3, /10.61.192.194:35196 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
	at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
	at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
	at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, /10.169.226.254:55790 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
	at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
	at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
	at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
	at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
	at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
	at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xd4211985, /10.181.125.52:60959 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
	at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
	at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
	at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
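The sizing described in the post can be sanity-checked with some quick arithmetic. This is only a back-of-envelope sketch using the numbers quoted above (25 nodes, 15 GB each, 120 executors at 2 GB); the JVM overhead factor is an assumption, not something stated in the thread:

```python
# Rough memory-headroom check for the cluster described in the post.
# The 1.07 overhead multiplier for off-heap/JVM overhead is an assumed
# ballpark, not a value taken from the thread.
nodes = 25
ram_per_node_gb = 15.0
executors = 120
executor_heap_gb = 2.0
overhead_factor = 1.07

executors_per_node = executors / nodes                      # 4.8 executors per node
used_per_node_gb = executors_per_node * executor_heap_gb * overhead_factor
headroom_gb = ram_per_node_gb - used_per_node_gb

print(f"{executors_per_node:.1f} executors/node, "
      f"{used_per_node_gb:.1f} GB used of {ram_per_node_gb:.0f} GB, "
      f"{headroom_gb:.1f} GB headroom")
```

On these assumptions each node has only a few GB of headroom beyond the executor heaps, which is consistent with the intermittent heap-space failures the post describes.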
Re: OOM error

Increase your executor memory. You can also play around with increasing the number of partitions/parallelism, etc.

Thanks
Best Regards

On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan ha...@gumgum.com wrote:
> Hi All,
>
> I need some help with Out Of Memory errors in my application. I am using
> Spark 1.1.0 and my application is using the Java API. I am running my app
> on 25 EC2 m3.xlarge (4 cores, 15 GB memory) instances. The app only fails
> sometimes; lots of mapToPair tasks are failing. My app is configured to
> run 120 executors, and executor memory is 2G. These are the various
> errors I see in my logs:
>
> 15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 dropped from memory (free 257277829)
> 15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x6e0138a3, /10.61.192.194:35196 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
> java.lang.OutOfMemoryError: Java heap space
> 	at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
> 	at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
> [...]
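The reply's suggestions correspond to standard Spark tuning knobs (`spark.executor.memory` and `spark.default.parallelism`). As a sketch of how they might be passed on the command line, the snippet below assembles a `spark-submit` invocation; the concrete values, class name, and jar name are illustrative assumptions, not taken from the thread:

```python
# Sketch: build a spark-submit command applying the reply's suggestions,
# i.e. more heap per executor and higher default parallelism.
# All values below are illustrative; com.example.MyApp / my-app.jar are
# hypothetical placeholders.
settings = {
    "spark.executor.memory": "4g",       # up from the 2g in the post
    "spark.default.parallelism": "480",  # a few tasks per core across the cluster
}

cmd = ["spark-submit"]
for key, value in sorted(settings.items()):
    cmd += ["--conf", f"{key}={value}"]
cmd += ["--class", "com.example.MyApp", "my-app.jar"]

print(" ".join(cmd))
```

Passing the settings via `--conf` keeps the sketch independent of the cluster manager; on YARN the same values could also be given with the dedicated `--executor-memory` flag.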