You can try repartitioning the data; if the data is skewed, you may need to salt the keys for better partitioning. Are you using coalesce or any other function which reduces the number of partitions and concentrates data on fewer nodes? Window functions also incur shuffling, which could be an issue.
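As a rough sketch of the salting idea (column names like `user_id` and `amount` and the input path are illustrative, not taken from your job):

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions._

    val spark = SparkSession.builder.appName("salting-sketch").getOrCreate()

    // Hypothetical skewed input: a few user_ids dominate the data.
    val df = spark.read.parquet("/path/to/input")

    // Add a random salt (0..numSalts-1) to the grouping key.
    val numSalts = 10
    val salted = df.withColumn("salt", (rand() * numSalts).cast("int"))

    // Stage 1: aggregate per (key, salt) so a hot key is split across
    // up to numSalts tasks instead of landing on a single task.
    val partial = salted
      .groupBy("user_id", "salt")
      .agg(sum("amount").as("partial_sum"))

    // Stage 2: collapse the salt to get the final per-key result.
    // This trades one extra (much smaller) shuffle for balanced tasks.
    val result = partial
      .groupBy("user_id")
      .agg(sum("partial_sum").as("total"))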
On Mon, 6 Jan 2020 at 9:49 AM, Rishi Shah <rishishah.s...@gmail.com> wrote:

> Thanks Hemant, the underlying data volume increased from 550GB to 690GB
> and now the same job doesn't succeed. I tried increasing executor memory
> to 20G as well; it still fails. I am running this in Databricks and start
> the cluster with 20G assigned to the spark.executor.memory property.
>
> Also, some more information on the job: I apply about 4 window functions
> to this dataset before it gets written out.
>
> Any other ideas?
>
> Thanks,
> -Shraddha
>
> On Sun, Jan 5, 2020 at 11:06 PM hemant singh <hemant2...@gmail.com> wrote:
>
>> You can try increasing the executor memory; this error generally comes
>> up when there is not enough memory in individual executors.
>> The job may be completing because the failed tasks succeed when they
>> are re-scheduled.
>>
>> Thanks.
>>
>> On Mon, 6 Jan 2020 at 5:47 AM, Rishi Shah <rishishah.s...@gmail.com>
>> wrote:
>>
>>> Hello All,
>>>
>>> One of my jobs keeps getting into a situation where hundreds of tasks
>>> fail with the error below, but the job eventually completes:
>>>
>>> org.apache.spark.memory.SparkOutOfMemoryError: Unable to acquire 16384
>>> bytes of memory
>>>
>>> Could someone advise?
>>>
>>> --
>>> Regards,
>>>
>>> Rishi Shah
>>>
>>
>
> --
> Regards,
>
> Rishi Shah
>
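P.S. For reference, one way to supply the memory settings discussed in the quoted thread is at session creation (values are illustrative only; on Databricks, spark.executor.memory is fixed at cluster creation, so it goes in the cluster's Spark config rather than in code):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder
      .appName("memory-tuning-sketch")
      // Per-executor JVM heap.
      .config("spark.executor.memory", "20g")
      // Fraction of heap shared by execution and storage (0.6 is the default).
      .config("spark.memory.fraction", "0.6")
      // More, smaller shuffle tasks can ease per-task memory pressure,
      // which is often where "Unable to acquire N bytes" shows up.
      .config("spark.sql.shuffle.partitions", "800")
      .getOrCreate()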