Sure folks, will try later today!

Best Regards
Ankit Khettry
On Sat, 7 Sep, 2019, 6:56 PM Sunil Kalra, <suneel.ka...@gmail.com> wrote:

Ankit,

Can you try reducing the number of cores or increasing memory? With the configuration below, each core is getting ~3.5 GB. Otherwise your data is skewed, such that one of the cores is getting too much data for a given key.

spark.executor.cores 6
spark.executor.memory 36g

On Sat, Sep 7, 2019 at 6:35 AM Chris Teoh <chris.t...@gmail.com> wrote:

It says you have 3811 tasks in earlier stages and you're going down to 2001 partitions, which would make it more memory intensive. I'm guessing the default Spark shuffle partition count was 200, so that would have failed. Go for a higher number, maybe even higher than 3811. What was your shuffle write from stage 7 and shuffle read from stage 8?

On Sat, 7 Sep 2019, 7:57 pm Ankit Khettry, <justankit2...@gmail.com> wrote:

Still unable to overcome the error. Attaching some screenshots for reference. Following are the configs used:

spark.yarn.max.executor.failures 1000
spark.yarn.driver.memoryOverhead 6g
spark.executor.cores 6
spark.executor.memory 36g
spark.sql.shuffle.partitions 2001
spark.memory.offHeap.size 8g
spark.memory.offHeap.enabled true
spark.executor.instances 10
spark.driver.memory 14g
spark.yarn.executor.memoryOverhead 10g

Best Regards
Ankit Khettry

On Sat, Sep 7, 2019 at 2:56 PM Chris Teoh <chris.t...@gmail.com> wrote:

You can try that. Also consider processing each partition separately if your data is heavily skewed when you partition it.

On Sat, 7 Sep 2019, 7:19 pm Ankit Khettry, <justankit2...@gmail.com> wrote:

Thanks Chris

I'm going to try it soon, maybe by setting spark.sql.shuffle.partitions to 2001. Also, I was wondering whether it would help to repartition the data by the fields I am using in the group-by and window operations?

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 1:05 PM Chris Teoh, <chris.t...@gmail.com> wrote:

Hi Ankit,

Without looking at the Spark UI and the stages/DAG, I'm guessing you're running on the default number of Spark shuffle partitions.

If you're seeing a lot of shuffle spill, you likely have to increase the number of shuffle partitions to accommodate the huge shuffle size.

I hope that helps
Chris
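[To make the two suggestions above concrete, here is a minimal PySpark sketch of raising the shuffle partition count above the 3811 tasks mentioned earlier and repartitioning by the window key. The DataFrame, the column names (user_id, event_date, amount), the input path, and the value 4096 are hypothetical stand-ins, not details from the thread.]

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = (
    SparkSession.builder
    # Go above the 3811 tasks seen in the earlier stages, per the advice above.
    .config("spark.sql.shuffle.partitions", "4096")
    .getOrCreate()
)

df = spark.read.parquet("s3://some-bucket/input/")  # hypothetical input path

# Repartition by the same key the window operation uses, so the data is
# already laid out along that key before the expensive windowed aggregation.
df = df.repartition("user_id")

w = Window.partitionBy("user_id").orderBy("event_date")
result = df.withColumn("running_total", F.sum("amount").over(w))

[Because the window's partitionBy matches the repartition key, Spark may be able to reuse the existing hash partitioning rather than shuffling again; whether this pays off depends on the skew discussed above.]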
On Sat, 7 Sep 2019, 4:18 pm Ankit Khettry, <justankit2...@gmail.com> wrote:

Nope, it's a batch job.

Best Regards
Ankit Khettry

On Sat, 7 Sep, 2019, 6:52 AM Upasana Sharma, <028upasana...@gmail.com> wrote:

Is it a streaming job?

On Sat, Sep 7, 2019, 5:04 AM Ankit Khettry <justankit2...@gmail.com> wrote:

I have a Spark job that consists of a large number of window operations and hence involves large shuffles. I have roughly 900 GiB of data, although the cluster should be large enough (10 * m5.4xlarge instances). I am using the following configurations for the job, although I have tried various other combinations without any success.

spark.yarn.driver.memoryOverhead 6g
spark.storage.memoryFraction 0.1
spark.executor.cores 6
spark.executor.memory 36g
spark.memory.offHeap.size 8g
spark.memory.offHeap.enabled true
spark.executor.instances 10
spark.driver.memory 14g
spark.yarn.executor.memoryOverhead 10g

I keep running into the following OOM error:

org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
    at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
    at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
    at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

I see there are a large number of JIRAs in place for similar issues, and a great many of them are even marked resolved. Can someone guide me as to how to approach this problem? I am using Databricks Spark 2.4.1.

Best Regards
Ankit Khettry
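[For readers puzzling over Sunil's ~3.5 GB per core figure earlier in the thread, here is a back-of-the-envelope sketch of one plausible reading of that math under Spark 2.x's unified memory model. It assumes the default spark.memory.fraction of 0.6 and the fixed ~300 MB reserved heap; Sunil did not spell out his calculation, so this is an interpretation, not his method.]

heap_mb = 36 * 1024     # spark.executor.memory 36g, from the configs above
reserved_mb = 300       # reserved heap in Spark 2.x
fraction = 0.6          # default spark.memory.fraction
cores = 6               # spark.executor.cores 6

unified_mb = (heap_mb - reserved_mb) * fraction
print(f"~{unified_mb / cores / 1024:.1f} GB per concurrent task")  # ~3.6 GB

[Fewer cores per executor, or more executor memory, raises that per-task ceiling, which is why reducing spark.executor.cores was suggested.]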