Thanks, Chris. Going to try it soon, maybe by setting spark.sql.shuffle.partitions to 2001. Also, I was wondering: would it help if I repartitioned the data by the fields I am using in the groupBy and window operations?
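Something like the sketch below is what I have in mind (just a rough sketch; key1/key2, the timestamp column ts, and the input path are placeholders for my actual columns and data):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// 2001 rather than 2000: above 2000 shuffle partitions, Spark switches to
// the more compact HighlyCompressedMapStatus for tracking shuffle output.
val spark = SparkSession.builder()
  .config("spark.sql.shuffle.partitions", "2001")
  .getOrCreate()

val df = spark.read.parquet("s3://my-bucket/input")  // placeholder path

// Pre-partition on the same keys the groupBy/window will use, so the
// window's exchange is already satisfied and the one big shuffle happens
// on an explicitly chosen number of partitions.
val partitioned = df.repartition(2001, col("key1"), col("key2"))

val w = Window.partitionBy("key1", "key2").orderBy("ts")
val result = partitioned.withColumn("rn", row_number().over(w))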
Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, <chris.t...@gmail.com> wrote:

> Hi Ankit,
>
> Without looking at the Spark UI and the stages/DAG, I'm guessing you're
> running on the default number of Spark shuffle partitions.
>
> If you're seeing a lot of shuffle spill, you likely have to increase the
> number of shuffle partitions to accommodate the huge shuffle size.
>
> I hope that helps.
> Chris
>
> On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <justankit2...@gmail.com> wrote:
>
>> Nope, it's a batch job.
>>
>> Best Regards
>> Ankit Khettry
>>
>> On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com> wrote:
>>
>>> Is it a streaming job?
>>>
>>> On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <justankit2...@gmail.com> wrote:
>>>
>>>> I have a Spark job that consists of a large number of window operations
>>>> and hence involves large shuffles. I have roughly 900 GiB of data, even
>>>> though I am using a fairly large cluster (10 x m5.4xlarge instances). I
>>>> am using the following configuration for the job, although I have tried
>>>> various other combinations without success:
>>>>
>>>> spark.yarn.driver.memoryOverhead 6g
>>>> spark.storage.memoryFraction 0.1
>>>> spark.executor.cores 6
>>>> spark.executor.memory 36g
>>>> spark.memory.offHeap.size 8g
>>>> spark.memory.offHeap.enabled true
>>>> spark.executor.instances 10
>>>> spark.driver.memory 14g
>>>> spark.yarn.executor.memoryOverhead 10g
>>>>
>>>> I keep running into the following OOM error:
>>>>
>>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>>> bytes of memory, got 0
>>>>   at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
>>>>   at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
>>>>   at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)
>>>>
>>>> I see that there are a large number of JIRAs filed for similar issues,
>>>> and a great many of them are even marked resolved. Can someone guide me
>>>> on how to approach this problem? I am using Databricks Spark 2.4.1.
>>>>
>>>> Best Regards
>>>> Ankit Khettry