Re: OOM Error
Sure folks, will try later today!

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra wrote:
> [...]
Re: OOM Error
Ankit,

Can you try reducing the number of cores or increasing memory? With the configuration below, each core is getting ~3.5 GB. Otherwise your data is skewed, such that one of the cores is getting too much data for a particular key.

spark.executor.cores 6
spark.executor.memory 36g

On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh wrote:
> [...]
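As a rough sketch of where the ~3.5 GB-per-core figure comes from, assuming the default unified-memory settings (spark.memory.fraction 0.6 and 300 MB reserved; the Databricks runtime's defaults may differ slightly):

    spark.executor.memory 36g
    usable unified memory  ~ (36 GB - 0.3 GB) * 0.6 ~ 21.4 GB
    per concurrent task    ~ 21.4 GB / 6 cores ~ 3.5 GB

    spark.executor.cores 3      # same 36g heap, roughly 7 GB per task instead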
Re: OOM Error
It says you have 3811 tasks in earlier stages and you're going down to 2001 partitions; that would make it more memory intensive. I'm guessing the default Spark shuffle partition count was 200, so that would have failed. Go for a higher number, maybe even higher than 3811. What was your shuffle write from stage 7 and shuffle read from stage 8?

On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry wrote:
> Still unable to overcome the error. Attaching some screenshots for reference.
> Following are the configs used:
>
> spark.yarn.max.executor.failures 1000
> spark.yarn.driver.memoryOverhead 6g
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.sql.shuffle.partitions 2001
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> Best Regards
> Ankit Khettry
>
> On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh wrote:
> > [...]
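As a rough sizing sketch for that question (the numbers are purely illustrative, and a SparkSession named spark is assumed): a common starting point is to divide the stage's shuffle write by a target of roughly 128-256 MiB per shuffle partition.

    // illustrative only: if the shuffle write were close to the ~900 GiB input,
    // targeting ~128 MiB per partition would suggest
    val targetPartitions = (900L * 1024) / 128    // ~7200
    spark.conf.set("spark.sql.shuffle.partitions", targetPartitions)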
Re: OOM Error
You can try. Consider processing each partition separately if your data is heavily skewed when you partition it.

On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry wrote:
> [...]
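A minimal sketch of the mechanics of the "process each partition separately" idea, assuming a hypothetical partitioning column date_col, a DataFrame named df, and a placeholder output path; handling each heavily populated slice on its own keeps any single shuffle smaller:

    import org.apache.spark.sql.functions.col

    val values = df.select("date_col").distinct().collect().map(_.get(0))
    values.foreach { v =>
      df.filter(col("date_col") === v)
        .repartition(500)                          // sized for one slice, not the whole dataset
        .write.mode("append").parquet(s"/output/date_col=$v")
    }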
Re: OOM Error
Thanks Chris

Going to try it soon, maybe by setting spark.sql.shuffle.partitions to 2001. Also, I was wondering whether it would help if I repartition the data by the fields I am using in the group-by and window operations?

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh wrote:
> [...]
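In case it helps, a sketch of what that repartitioning might look like (the column names are placeholders; the idea is to repartition by the same keys the window uses, so the data is shuffled once by those keys):

    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions._

    val w = Window.partitionBy("account_id").orderBy("event_ts")   // hypothetical columns

    val result = df
      .repartition(2001, col("account_id"))    // align the shuffle with the window key
      .withColumn("rn", row_number().over(w))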
Re: OOM Error
Hi Ankit,

Without looking at the Spark UI and the stages/DAG, I'm guessing you're running on the default number of Spark shuffle partitions.

If you're seeing a lot of shuffle spill, you likely have to increase the number of shuffle partitions to accommodate the huge shuffle size.

I hope that helps.
Chris

On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry wrote:
> [...]
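For reference, a minimal way to raise the shuffle partition count from its default of 200 (assuming a SparkSession named spark; the same setting can equally be passed as a --conf flag at submit time):

    spark.conf.set("spark.sql.shuffle.partitions", 2001)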
Re: OOM Error
Nope, it's a batch job.

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com> wrote:
> [...]
Re: OOM Error
Is it a streaming job?

On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry wrote:
> I have a Spark job that consists of a large number of Window operations and hence involves large shuffles. I have roughly 900 GiB of data, although I am using a large enough cluster (10 * m5.4xlarge instances). I am using the following configurations for the job, although I have tried various other combinations without any success.
>
> spark.yarn.driver.memoryOverhead 6g
> spark.storage.memoryFraction 0.1
> spark.executor.cores 6
> spark.executor.memory 36g
> spark.memory.offHeap.size 8g
> spark.memory.offHeap.enabled true
> spark.executor.instances 10
> spark.driver.memory 14g
> spark.yarn.executor.memoryOverhead 10g
>
> I keep running into the following OOM error:
>
> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
>   at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>   at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>
> I see there are a large number of JIRAs in place for similar issues, and a great many of them are even marked resolved.
> Can someone guide me as to how to approach this problem? I am using Databricks Spark 2.4.1.
>
> Best Regards
> Ankit Khettry
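For context, a rough accounting of how that configuration maps onto the cluster (m5.4xlarge instances have 16 vCPUs and 64 GiB of memory each; how the off-heap allocation is charged against the container depends on the runtime):

    per executor:  36g heap + 10g memoryOverhead = 46 GiB requested (plus 8g off-heap)
    per node:      64 GiB, 16 vCPUs
    layout:        10 executors on 10 nodes, i.e. roughly one executor per node using 6 of 16 cores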
Re: OOM error with GMMs on 4GB dataset
Did you set `--driver-memory` with spark-submit?

-Xiangrui

On Mon, May 4, 2015 at 5:16 PM, Vinay Muttineni <vmuttin...@ebay.com> wrote:

Hi,
I am training a GMM with 10 Gaussians on a 4 GB dataset (720,000 * 760). The Spark (1.3.1) job is allocated 120 executors with 6 GB each, and the driver also has 6 GB.

Spark config params:

.set("spark.hadoop.validateOutputSpecs", "false")
.set("spark.dynamicAllocation.enabled", "false")
.set("spark.driver.maxResultSize", "4g")
.set("spark.default.parallelism", "300")
.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
.set("spark.kryoserializer.buffer.mb", "500")
.set("spark.akka.frameSize", "256")
.set("spark.akka.timeout", "300")

However, at the aggregate step (line 168)

    val sums = breezeData.aggregate(ExpectationSum.zero(k, d))(compute.value, _ += _)

I get an OOM error and the application hangs indefinitely. Is this an issue, or am I missing something?

java.lang.OutOfMemoryError: Java heap space
  at akka.util.CompactByteString$.apply(ByteString.scala:410)
  at akka.util.ByteString$.apply(ByteString.scala:22)
  at akka.remote.transport.netty.TcpHandlers$class.onMessage(TcpSupport.scala:45)
  at akka.remote.transport.netty.TcpServerHandler.onMessage(TcpSupport.scala:57)
  at akka.remote.transport.netty.NettyServerHelpers$class.messageReceived(NettyHelpers.scala:43)
  at akka.remote.transport.netty.ServerHandler.messageReceived(NettyTransport.scala:180)
  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
  at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
  at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
  at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:310)
  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
  at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
  at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)

15/05/04 16:23:38 ERROR util.Utils: Uncaught exception in thread task-result-getter-2
java.lang.OutOfMemoryError: Java heap space
Exception in thread task-result-getter-2 java.lang.OutOfMemoryError: Java heap space
15/05/04 16:23:45 INFO scheduler.TaskSetManager: Finished task 1070.0 in stage 6.0 (TID 8276) in 382069 ms on [] (160/3600)
15/05/04 16:23:54 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xc57da871, ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
15/05/04 16:23:55 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x3c3dbb0c, ] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
15/05/04 16:24:45 ERROR actor.ActorSystemImpl: Uncaught fatal error from thread [sparkDriver-akka.remote.default-remote-dispatcher-6] shutting down ActorSystem [sparkDriver]

Thanks!
Vinay
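For what it's worth, a minimal sketch of passing the driver memory at submit time (the class and jar names are placeholders); setting spark.driver.memory from inside the application generally has no effect once the driver JVM has already started:

    spark-submit --driver-memory 8g --num-executors 120 --executor-memory 6g \
      --class com.example.TrainGMM gmm-job.jar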
Re: OOM error
Thanks for the pointer. It led me to http://spark.apache.org/docs/1.2.0/tuning.html; increasing parallelism resolved the issue.

On Mon, Feb 16, 2015 at 11:57 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> [...]
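For anyone landing on this thread later, a sketch of the two knobs mentioned above, with placeholder values (the original app used the Java API, but the idea is the same):

    import org.apache.spark.SparkConf

    // raise the default parallelism used for RDD shuffles (value is illustrative)
    val conf = new SparkConf().set("spark.default.parallelism", "400")

    // or widen a specific RDD before the heavy mapToPair / shuffle step
    val widened = someRdd.repartition(400)    // someRdd is a placeholder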
Re: OOM error
Increase your executor memory. Also, you can play around with increasing the number of partitions/parallelism, etc.

Thanks
Best Regards

On Tue, Feb 17, 2015 at 3:39 AM, Harshvardhan Chauhan <ha...@gumgum.com> wrote:

Hi All,

I need some help with Out Of Memory errors in my application. I am using Spark 1.1.0 and my application is using the Java API. I am running my app on EC2, on 25 m3.xlarge (4 cores, 15 GB memory) instances. The app only fails sometimes. Lots of mapToPair tasks are failing. My app is configured to run 120 executors and executor memory is 2G.

These are the various errors I see in my logs:

15/02/16 10:53:48 INFO storage.MemoryStore: Block broadcast_1 of size 4680 dropped from memory (free 257277829)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x6e0138a3, /10.61.192.194:35196 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:49 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0x2d0c1db1, /10.169.226.254:55790 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
  at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
  at org.jboss.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
  at org.jboss.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
  at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
  at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
  at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
  at java.lang.Thread.run(Thread.java:745)
15/02/16 10:53:50 WARN channel.DefaultChannelPipeline: An exception was thrown by a user handler while handling an exception event ([id: 0xd4211985, /10.181.125.52:60959 => /10.164.164.228:49445] EXCEPTION: java.lang.OutOfMemoryError: Java heap space)
java.lang.OutOfMemoryError: Java heap space
  at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
  at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
  at org.jboss.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
  at org.jboss.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)