[ 
https://issues.apache.org/jira/browse/TEZ-2850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903481#comment-14903481
 ] 

Saikat commented on TEZ-2850:
-----------------------------

We hit this unusual scenario while running a Tez job.
A reducer vertex task fetches around 200000 map outputs, each about 100 bytes.
The total map output size is therefore around 200000 * 100 bytes ~ 20 MB.
The MergeManager has a merge threshold check: when the memory committed to 
fetched map outputs crosses this threshold, the InMemoryMerger is triggered and 
spills the in-memory map outputs to disk to free up memory.
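
To make the check concrete, here is a minimal Java sketch. It is not the 
MergeManager code: the class and method names are hypothetical, and it assumes 
the check is a plain comparison of committed in-memory bytes against the 
threshold. The values are the approximate figures reported in this comment.

```java
public class MergeThresholdSketch {
    // Hypothetical simplification: the in-memory merge starts only once the
    // committed bytes reach the merge threshold.
    static boolean shouldStartInMemoryMerge(long commitMemoryBytes, long mergeThresholdBytes) {
        return commitMemoryBytes >= mergeThresholdBytes;
    }

    public static void main(String[] args) {
        long mergeThresholdBytes = 500L * 1000 * 1000; // ~500 MB, as reported
        long commitMemoryBytes = 200_000L * 100;       // ~20 MB of fetched map output

        // prints "false": the threshold is never reached, so nothing is spilled
        System.out.println(shouldStartInMemoryMerge(commitMemoryBytes, mergeThresholdBytes));
    }
}
```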

In our scenario, the merge threshold (~500 MB) >> commitMemory (~20 MB), so the 
in-memory merger never gets triggered.
Finally, when finalMerge() is called in close(), MergeManager calls 
createInMemorySegments() to do the final merge.
There, when Tez creates an IFileInputStream object for the InMemoryReader, the 
IFileInputStream allocates a buffer of 4096 bytes (hard-coded).
Thus a single in-memory segment takes around 5 KB, even though the data in it 
is only on the order of 100 bytes. For 200000 map outputs, the total is 
200000 * 5000 bytes ~ 1 GB, which causes the OOM.
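
The arithmetic can be checked with a short Java sketch. The class and method 
names are hypothetical; only the 4096-byte buffer, the ~100-byte payload, the 
~5 KB per-segment figure and the segment count come from this comment. Object 
headers and other per-segment overhead are not modelled, so the exact sum is a 
lower bound.

```java
public class InMemorySegmentEstimate {
    // Hard-coded IFileInputStream buffer size reported in this comment.
    static final long STREAM_BUFFER_BYTES = 4096;

    // Bytes of actual map output held in memory.
    static long payloadBytes(long segments, long bytesPerSegment) {
        return segments * bytesPerSegment;
    }

    // Bytes held once every segment also owns one stream buffer.
    static long bytesWithBuffers(long segments, long bytesPerSegment) {
        return segments * (bytesPerSegment + STREAM_BUFFER_BYTES);
    }

    public static void main(String[] args) {
        long segments = 200_000;
        long bytesPerSegment = 100;

        System.out.println(payloadBytes(segments, bytesPerSegment));     // 20000000  (~20 MB)
        System.out.println(bytesWithBuffers(segments, bytesPerSegment)); // 839200000 (~0.84 GB)
        // Rounded figure used in the comment: ~5 KB per segment.
        System.out.println(segments * 5000);                             // 1000000000 (~1 GB)
    }
}
```

The buffers alone outweigh the payload by roughly 40x, which is why the heap 
fills up even though the accounted map output is only about 20 MB.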

Attached is a snapshot of the heap dump that shows this scenario.




> Tez MergeManager OOM for small Map Outputs
> ------------------------------------------
>
>                 Key: TEZ-2850
>                 URL: https://issues.apache.org/jira/browse/TEZ-2850
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Saikat
>         Attachments: OOM_1.png, OOM_2.png, OOM_3.png
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
