Github user sitalkedia commented on the issue:

https://github.com/apache/spark/pull/15722

@davies - Yes, we dumped the logging and confirmed that the OOM occurs because we are not freeing the `LongArray` while resetting the `BytesToBytesMap`. The job that used to fail with an OOM runs fine with this change. As explained above, this situation leads to an OOM when an already-running task has been allocated more than its fair share of memory as a result of a delay in scheduling by the scheduler. The `LongArray` itself can grow beyond the task's fair share of memory (we have use cases where the `LongArray` consumes a significant portion of total memory because of too many keys), and later, when the task spills, the `LongArray` is not freed, so subsequent memory allocation requests are denied by the memory manager, resulting in an OOM.
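For illustration only, here is a toy sketch of that failure mode (this is not Spark's actual code; `ToyMemoryPool` and `ToyMap` are made-up names standing in for the task memory manager and the map): if the backing array survives a reset, it keeps counting against the task's budget, and a later allocation that would otherwise fit is denied.

```java
import java.util.Arrays;

public class ResetLeakSketch {

  // Stand-in for a per-task memory budget: allocations beyond the budget fail.
  static class ToyMemoryPool {
    private final long budget;
    private long used;
    ToyMemoryPool(long budget) { this.budget = budget; }
    long[] allocate(int numWords) {
      long bytes = numWords * 8L;
      if (used + bytes > budget) {
        throw new OutOfMemoryError("pool exhausted: used=" + used + ", requested=" + bytes);
      }
      used += bytes;
      return new long[numWords];
    }
    void free(long[] array) { used -= array.length * 8L; }
    long used() { return used; }
  }

  // Stand-in for a map whose pointer array can grow large before a spill/reset.
  static class ToyMap {
    private final ToyMemoryPool pool;
    private long[] pointerArray;
    ToyMap(ToyMemoryPool pool, int capacity) {
      this.pool = pool;
      this.pointerArray = pool.allocate(capacity);
    }
    // Buggy reset: logical state is cleared but the large array stays allocated,
    // so its memory is still charged against the task.
    void resetWithoutFreeing() {
      Arrays.fill(pointerArray, 0L);
    }
    // Fixed reset: return the array to the pool, then re-allocate a small one.
    void resetAndFree(int initialCapacity) {
      pool.free(pointerArray);
      pointerArray = pool.allocate(initialCapacity);
    }
  }

  public static void main(String[] args) {
    ToyMemoryPool pool = new ToyMemoryPool(16L * 1024 * 1024); // 16 MB budget
    ToyMap map = new ToyMap(pool, 1 << 20);                    // 8 MB pointer array

    map.resetWithoutFreeing();
    try {
      pool.allocate(1 << 21); // 16 MB request fails: the old array is still held
    } catch (OutOfMemoryError e) {
      System.out.println("without freeing on reset: " + e.getMessage());
    }

    map.resetAndFree(1 << 10); // freeing on reset returns the 8 MB to the pool
    pool.allocate(1 << 20);    // now an 8 MB request succeeds
    System.out.println("after freeing on reset, used=" + pool.used() + " bytes");
  }
}
```

The same reasoning applies to the real map: freeing the `LongArray` when the map is reset (and re-allocating a small one) hands the memory back to the memory manager instead of leaving it charged to a task that no longer needs it.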