[ https://issues.apache.org/jira/browse/PIG-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236012#comment-15236012 ]

Daniel Dai commented on PIG-4853:
---------------------------------

Ok, sounds reasonable. +1

> Fetch inputs before starting outputs
> ------------------------------------
>
>                 Key: PIG-4853
>                 URL: https://issues.apache.org/jira/browse/PIG-4853
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: 0.16.0
>
>         Attachments: PIG-4853-1.patch
>
>
>     Force fetch inputs before starting outputs, so that we can choose to
> allocate more space for buffers by setting
> tez.task.scale.memory.input-output-concurrent=false, a new option in Tez.
> With the default value of true, Tez's WeightedScalingMemoryDistributor, given
> a TezConfiguration.TEZ_TASK_SCALE_MEMORY_RESERVE_FRACTION of 0.5 and 1G of
> memory, splits the 512MB between inputs and outputs. If the option is set to
> false, it allocates 512MB to inputs and 512MB to outputs. For example, for two
> join inputs and one group-by output:
> tez.task.scale.memory.input-output-concurrent=true
> {code}
> 2016-03-28 01:15:58,842 [INFO] [TezChild] |resources.MemoryDistributor|:
> Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722],
> [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239],
> [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239]
> {code}
> tez.task.scale.memory.input-output-concurrent=false
> {code}
> 2016-03-28 01:25:36,665 [INFO] [TezChild] |resources.MemoryDistributor|:
> Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456],
> [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600],
> [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600]
> {code}
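To make the two MemoryDistributor log lines easier to compare, here is a small Python sketch that sums the allocated bytes for inputs and for outputs in each mode. It assumes each entry has the layout name:className:INPUT|OUTPUT:requestedBytes:allocatedBytes. That layout is inferred from the log text and not taken from Tez documentation. With input-output-concurrent=true, the two inputs and the output share 470661200 bytes in total, a little under the nominal 512MB. With false, the inputs alone receive that same 470661200 bytes and the output separately gets its full 268435456-byte request.

```python
# Cross-check of the MemoryDistributor allocations quoted above.
# Assumed entry layout (inferred from the log text, not from Tez docs):
#   name:className:INPUT|OUTPUT:requestedBytes:allocatedBytes

CONCURRENT_TRUE = [
    "scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722",
    "scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239",
    "scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239",
]

CONCURRENT_FALSE = [
    "scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456",
    "scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600",
    "scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600",
]


def allocated_by_kind(entries):
    """Sum the allocated bytes (last field) separately for INPUTs and OUTPUTs."""
    totals = {"INPUT": 0, "OUTPUT": 0}
    for entry in entries:
        # Class names contain dots but no colons, so this yields five fields.
        _name, _cls, kind, _requested, allocated = entry.split(":")
        totals[kind] += int(allocated)
    return totals


for label, entries in [("input-output-concurrent=true", CONCURRENT_TRUE),
                       ("input-output-concurrent=false", CONCURRENT_FALSE)]:
    t = allocated_by_kind(entries)
    print(f"{label}: inputs={t['INPUT']} outputs={t['OUTPUT']} "
          f"total={t['INPUT'] + t['OUTPUT']}")

# prints:
# input-output-concurrent=true: inputs=386976478 outputs=83684722 total=470661200
# input-output-concurrent=false: inputs=470661200 outputs=268435456 total=739096656
```

In the false case, the two grants together exceed the total handed out in the concurrent case. That is why the input buffers need to be gone before the output buffers are allocated.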
> To avoid hitting an OOM, we need to finish fetching the inputs by calling
> reader.next() before calling output.start(). That ensures the input buffers
> are released before the output buffers are allocated.
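The ordering requirement can be illustrated with a toy memory model. This is a hypothetical simulation and not Pig or Tez code: HeapTracker, SimulatedInput, SimulatedOutput and run_task are made-up names. The budget is assumed to be the 470661200 bytes seen in the logs, and the per-input and output grants are the input-output-concurrent=false figures. Following the description, an input's buffers are assumed to be freed once its first next() call completes the fetch.

```python
# Toy model of "fetch inputs before starting outputs" (hypothetical, not Tez code).
POOL = 470661200          # assumed budget: total allocated in the concurrent=true log
INPUT_ALLOC = 235330600   # per-input grant with input-output-concurrent=false
OUTPUT_ALLOC = 268435456  # output grant with input-output-concurrent=false


class HeapTracker:
    """Hypothetical stand-in for the task's buffer budget."""

    def __init__(self, limit):
        self.limit = limit
        self.used = 0
        self.peak = 0

    def allocate(self, nbytes):
        self.used += nbytes
        self.peak = max(self.peak, self.used)
        if self.used > self.limit:
            raise MemoryError(f"used {self.used} > limit {self.limit}")

    def release(self, nbytes):
        self.used -= nbytes


class SimulatedInput:
    """Holds its fetch buffers until the first next() completes the fetch."""

    def __init__(self, heap, buffer_bytes):
        self.heap = heap
        self.buffer_bytes = buffer_bytes
        self.fetched = False
        heap.allocate(buffer_bytes)  # buffers are in use while fetching

    def next(self):
        # Assumption taken from the description: once the fetch is finished,
        # the input buffers are released.
        if not self.fetched:
            self.fetched = True
            self.heap.release(self.buffer_bytes)
        return True


class SimulatedOutput:
    def __init__(self, heap, buffer_bytes):
        self.heap = heap
        self.buffer_bytes = buffer_bytes

    def start(self):
        self.heap.allocate(self.buffer_bytes)  # output buffers allocated on start


def run_task(fetch_inputs_first):
    """Return the peak bytes in use, or None if the budget is exceeded (simulated OOM)."""
    heap = HeapTracker(limit=POOL)
    inputs = [SimulatedInput(heap, INPUT_ALLOC) for _ in range(2)]
    output = SimulatedOutput(heap, OUTPUT_ALLOC)
    if fetch_inputs_first:
        for reader in inputs:
            reader.next()
    try:
        output.start()
    except MemoryError:
        return None
    return heap.peak


print(run_task(fetch_inputs_first=True))   # 470661200
print(run_task(fetch_inputs_first=False))  # None (simulated OOM)
```

With the inputs drained first, the peak never exceeds the inputs' share. Starting the output while both inputs still hold their buffers would need 739096656 bytes.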



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
