I have a Spark job that consists of a large number of window operations and hence involves heavy shuffles. The input is roughly 900 GiB, and I am running on what should be a large enough cluster (10 m5.4xlarge instances).
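For reference, the operations are roughly of this shape (a minimal sketch only; the column names, source path, and window spec are made up for illustration):

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.expressions.Window
    import org.apache.spark.sql.functions.{col, lag, sum}

    // Hypothetical input; the real job reads ~900 GiB.
    val df: DataFrame = spark.read.parquet("s3://my-bucket/events")

    // Each distinct window spec like this forces a full shuffle by the partition key.
    val w = Window.partitionBy(col("customer_id")).orderBy(col("event_time"))

    val result = df
      .withColumn("running_total", sum(col("amount")).over(w))
      .withColumn("prev_amount", lag(col("amount"), 1).over(w))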
I am using the following configuration for the job, although I have tried various other combinations without success:

    spark.yarn.driver.memoryOverhead 6g
    spark.storage.memoryFraction 0.1
    spark.executor.cores 6
    spark.executor.memory 36g
    spark.memory.offHeap.size 8g
    spark.memory.offHeap.enabled true
    spark.executor.instances 10
    spark.driver.memory 14g
    spark.yarn.executor.memoryOverhead 10g

I keep running into the following OOM error:

    org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384 bytes of memory, got 0
        at org.apache.spark.memory.MemoryConsumer.throwOom(MemoryConsumer.java:157)
        at org.apache.spark.memory.MemoryConsumer.allocateArray(MemoryConsumer.java:98)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeInMemorySorter.<init>(UnsafeInMemorySorter.java:128)
        at org.apache.spark.util.collection.unsafe.sort.UnsafeExternalSorter.<init>(UnsafeExternalSorter.java:163)

I see there are a large number of JIRAs filed for similar issues, and many of them are marked resolved. Can someone guide me on how to approach this problem? I am using Databricks Spark 2.4.1.

Best regards,
Ankit Khettry