[
https://issues.apache.org/jira/browse/PIG-4043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046993#comment-14046993
]
Cheolsoo Park commented on PIG-4043:
------------------------------------
{quote}
I think the OOM is because there are two huge arrays during the same time
unlike Hadoop 1.x HadoopShims.
{quote}
This isn't true. In fact, I am seeing the OOM in 0.12, which doesn't include the
code you're referring to (introduced by PIG-3913). In 0.12, there are no two
copies of the TaskReport array. The heap dump shows a single array object that
is as big as 800MB.
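For scale, a rough back-of-envelope check (using the ~100K-task and ~800MB
figures from the heap dump; the per-report size is an inference, not a measured
value) shows why this single array can dominate a 1GB client heap:

```java
public class TaskReportFootprint {
    public static void main(String[] args) {
        // Assumed figures, taken from the heap dump described above:
        // ~100,000 tasks produced a TaskReport[] occupying ~800MB.
        long tasks = 100_000L;
        long arrayBytes = 800L * 1024 * 1024;
        // That works out to roughly 8KB retained per TaskReport (counters,
        // diagnostics, state strings, and so on), so this one array eats
        // ~80% of a 1GB heap before anything else runs.
        long bytesPerReport = arrayBytes / tasks;
        System.out.println(bytesPerReport + " bytes per report");
    }
}
```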
In addition, I see the same issue in Lipstick, for example,
[here|https://github.com/Netflix/Lipstick/blob/master/lipstick-console/src/main/java/com/netflix/lipstick/pigtolipstick/BasicP2LClient.java#L414].
Pig dies as soon as {{JobClient.getMapTaskReports()}} is called. I've run
several tests so far, and it's clear that I cannot run my job (100K mappers)
with any {{JobClient.getMapTaskReports()}} call, in either Pig or Lipstick, on
Hadoop 2.4.
Unless {{JobClient.getMapTaskReports()}} itself returns an iterator, we need a
way of disabling the call.
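A minimal sketch of such a kill switch — the property name
{{pig.stats.notaskreport}} is hypothetical here, and {{java.util.Properties}}
stands in for Hadoop's Configuration:

```java
import java.util.Properties;

public class TaskReportGuard {
    // Hypothetical property name; the real knob would be whatever the patch defines.
    static final String DISABLE_KEY = "pig.stats.notaskreport";

    static boolean taskReportsDisabled(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(DISABLE_KEY, "false"));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(DISABLE_KEY, "true");
        if (taskReportsDisabled(conf)) {
            // Skip JobClient.getMapTaskReports()/getReduceTaskReports() entirely;
            // task-level stats stay empty, but the client avoids materializing
            // the giant TaskReport[] for 100K-task jobs.
            System.out.println("task reports disabled");
        }
    }
}
```

The guard trades task-level stats for a bounded client footprint, which is the
point of the request above.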
> JobClient.getMap/ReduceTaskReports() causes OOM for jobs with a large number
> of tasks
> -------------------------------------------------------------------------------------
>
> Key: PIG-4043
> URL: https://issues.apache.org/jira/browse/PIG-4043
> Project: Pig
> Issue Type: Bug
> Reporter: Cheolsoo Park
> Assignee: Cheolsoo Park
> Fix For: 0.14.0
>
> Attachments: PIG-4043-1.patch, heapdump.png
>
>
> With Hadoop 2.4, I often see the Pig client fail with an OOM when there are
> many tasks (~100K) and a 1GB heap size.
> The heap dump (attached) shows that TaskReport[] occupies about 80% of heap
> space at the time of OOM.
> The problem is that JobClient.getMap/ReduceTaskReports() returns an array of
> TaskReport objects, which can be huge if the number of tasks is large.