[ 
https://issues.apache.org/jira/browse/PIG-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865666#comment-13865666
 ] 

Rohini Palaniswamy commented on PIG-3659:
-----------------------------------------

  The current code simply defaults to 1 GB of memory for each vertex to get things working.

We need to:
   1) Classify each vertex as a map or a reduce vertex and set java.opts 
(mapreduce.map.java.opts or mapreduce.reduce.java.opts), memory.mb 
(mapreduce.map.memory.mb or mapreduce.reduce.memory.mb) and env 
(mapreduce.map.env or mapreduce.reduce.env) on it accordingly. A simple 
approach would be to treat all root vertices as map vertices and all 
intermediate or leaf vertices as reduce vertices.
   2) Even for a map vertex, more memory is required when there are multiple 
outputs, because combine and sort happen on each output. Similarly, a reduce 
vertex with multiple inputs does shuffle and sort on each input, and so needs 
more memory than a traditional map or reduce task. That is, the sort buffers 
(io.sort.mb) and the buffers that hold each record before it is serialized or 
deserialized all take up memory. For example, with 3 inputs or outputs, the 
vertex tries to allocate three times as much memory for the buffers, leading 
to an OOM. Increasing the memory for a vertex based on its number of inputs or 
outputs might not solve the problem completely. We will have to talk to the 
Tez guys to see how effectively this can be solved.
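
The two points above can be illustrated with a rough sketch. It is not the Pig or Tez implementation: the function names and the example vertex names are hypothetical, only the property names come from this comment, and the buffer estimate assumes memory grows linearly with the number of inputs or outputs (as in the 3x example) while ignoring the per-record buffers.

```python
# Property names as listed in the comment; the dictionary layout is an assumption.
MAP_KEYS = {
    "java_opts": "mapreduce.map.java.opts",
    "memory_mb": "mapreduce.map.memory.mb",
    "env": "mapreduce.map.env",
}
REDUCE_KEYS = {
    "java_opts": "mapreduce.reduce.java.opts",
    "memory_mb": "mapreduce.reduce.memory.mb",
    "env": "mapreduce.reduce.env",
}


def classify_vertices(vertices, edges):
    """Label root vertices (no incoming edge) as 'map', all others as 'reduce'."""
    has_input = {dst for _, dst in edges}
    return {v: ("reduce" if v in has_input else "map") for v in vertices}


def config_keys_for(kind):
    """Pick the family of properties to read for a vertex of the given kind."""
    return MAP_KEYS if kind == "map" else REDUCE_KEYS


def sort_buffer_demand_mb(kind, num_inputs, num_outputs, io_sort_mb):
    """Estimate sort buffer memory: one buffer per output (map) or per input (reduce)."""
    fan = num_outputs if kind == "map" else num_inputs
    return io_sort_mb * max(1, fan)


if __name__ == "__main__":
    # Hypothetical DAG: two scans feeding a join, which feeds a store vertex.
    vertices = ["scan_a", "scan_b", "join", "store"]
    edges = [("scan_a", "join"), ("scan_b", "join"), ("join", "store")]
    kinds = classify_vertices(vertices, edges)
    for v in vertices:
        print(v, kinds[v], config_keys_for(kinds[v])["memory_mb"])
    # A reduce vertex with 3 inputs and io.sort.mb = 100 wants about 300 MB of buffers.
    print(sort_buffer_demand_mb("reduce", 3, 1, 100))
```

Comparing such an estimate against the configured heap for the vertex would show when a multi-input or multi-output vertex is likely to run out of memory, though as noted above, scaling the container size alone may not be enough.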

> Memory management for each vertex
> ---------------------------------
>
>                 Key: PIG-3659
>                 URL: https://issues.apache.org/jira/browse/PIG-3659
>             Project: Pig
>          Issue Type: Sub-task
>          Components: tez
>            Reporter: Rohini Palaniswamy
>             Fix For: tez-branch
>
>
> We need to configure appropriate memory options for each vertex.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
