GitHub user sitalkedia commented on the issue:

    https://github.com/apache/spark/pull/15722
  
    @davies - Yes, we added logging and confirmed that the OOM happens because 
we are not freeing the `LongArray` while resetting the `BytesToBytesMap`. The 
job that used to fail with OOM runs fine with this change. 
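
    For reference, here is a minimal, self-contained sketch of the idea behind 
the fix. The names (`SimplifiedBytesToBytesMap`, `MemoryTracker`, 
`allocateArray`) are illustrative placeholders, not the actual Spark internals; 
the point is only that `reset()` should release the pointer array's memory back 
to the task instead of keeping it allocated:

```java
// Hypothetical sketch, not the real BytesToBytesMap: on reset(), release the
// pointer array's memory instead of holding on to it across a spill.
class SimplifiedBytesToBytesMap {
  private long[] longArray;            // stand-in for Spark's LongArray of key pointers
  private final MemoryTracker tracker; // hypothetical per-task memory accounting

  SimplifiedBytesToBytesMap(MemoryTracker tracker, int initialCapacity) {
    this.tracker = tracker;
    this.longArray = allocateArray(initialCapacity);
  }

  private long[] allocateArray(int capacity) {
    tracker.acquire(capacity * 8L);    // 8 bytes per slot
    return new long[capacity];
  }

  /** Reset after a spill: free the array instead of only clearing it. */
  void reset() {
    if (longArray != null) {
      tracker.release(longArray.length * 8L); // give the memory back to the task
      longArray = null;                       // re-allocate lazily on the next insert
    }
    // ... data pages, counters, etc. would also be reset here
  }
}

/** Hypothetical tracker standing in for the task's memory manager. */
class MemoryTracker {
  private long used;
  void acquire(long bytes) { used += bytes; }
  void release(long bytes) { used -= bytes; }
  long used() { return used; }
}
```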
    
    As explained above, this situation leads to OOM when an already running 
task has been allocated more than its fair share of memory as a result of a 
delay in scheduling by the scheduler. The `LongArray` itself can grow beyond 
the task's fair share of memory (we have use cases where the `LongArray` 
consumes a significant portion of total memory because of too many keys), and 
later, when the task spills, the `LongArray` is not freed; as a result, 
subsequent memory allocation requests are denied by the memory manager, 
resulting in OOM. A toy illustration of that accounting is sketched below. 
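
    To make the fair-share point concrete, here is a toy model (assumed names, 
not Spark's actual memory-pool API): a task already holding more than 
`poolSize / activeTasks` bytes gets nothing back from further requests, so the 
un-freed `LongArray` translates directly into denied allocations and OOM:

```java
// Toy fair-share model: with N active tasks sharing a pool of P bytes, one task
// is capped at roughly P / N. A task already over that cap (e.g. because of a
// large LongArray that was never freed on spill) is granted 0 bytes.
class FairShareMemoryPool {
  private final long poolSize;
  private final int activeTasks;
  private long taskUsed;  // bytes currently charged to this one task

  FairShareMemoryPool(long poolSize, int activeTasks, long alreadyUsed) {
    this.poolSize = poolSize;
    this.activeTasks = activeTasks;
    this.taskUsed = alreadyUsed;
  }

  /** Returns the bytes actually granted (0 if the task is over its fair share). */
  long acquire(long requested) {
    long fairShare = poolSize / activeTasks;
    long headroom = Math.max(0L, fairShare - taskUsed);
    long granted = Math.min(requested, headroom);
    taskUsed += granted;
    return granted;
  }

  public static void main(String[] args) {
    // The task started while few tasks were scheduled, so it grew to 6 GB of an
    // 8 GB pool (including the LongArray). Later, 4 tasks are active.
    FairShareMemoryPool pool = new FairShareMemoryPool(8L << 30, 4, 6L << 30);
    long granted = pool.acquire(512L << 20);         // next allocation after the spill
    System.out.println("granted bytes: " + granted); // 0 -> request denied -> OOM
  }
}
```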

