[ https://issues.apache.org/jira/browse/PIG-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15236012#comment-15236012 ]
Daniel Dai commented on PIG-4853: --------------------------------- Ok, sounds reasonable. +1 > Fetch inputs before starting outputs > ------------------------------------ > > Key: PIG-4853 > URL: https://issues.apache.org/jira/browse/PIG-4853 > Project: Pig > Issue Type: Improvement > Reporter: Rohini Palaniswamy > Assignee: Rohini Palaniswamy > Fix For: 0.16.0 > > Attachments: PIG-4853-1.patch > > > Force fetch inputs before starting outputs so that we can choose to > allocate more space for buffers by setting > tez.task.scale.memory.input-output-concurrent=false which is a new option in > Tez. With the default value of true, WeightedScalingMemoryDistributor in Tez > for a TezConfiguration.TEZ_TASK_SCALE_MEMORY_RESERVE_FRACTION of 0.5 and 1G > memory, will split the 512MB between inputs and outputs. If set to false, it > will allocate 512MB to inputs and 512MB to outputs. For eg: For two join > inputs and one group by output > tez.task.scale.memory.input-output-concurrent=true > {code} > 2016-03-28 01:15:58,842 [INFO] [TezChild] |resources.MemoryDistributor|: > Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:83684722], > > [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239], > > [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:193488239] > {code} > tez.task.scale.memory.input-output-concurrent=false > {code} > 2016-03-28 01:25:36,665 [INFO] [TezChild] |resources.MemoryDistributor|: > Allocations=[scope-32:org.apache.tez.runtime.library.output.OrderedPartitionedKVOutput:OUTPUT:268435456:268435456], > > [scope-29:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600], > > [scope-30:org.apache.tez.runtime.library.input.OrderedGroupedKVInput:INPUT:620652160:235330600] > {code} > To ensure we don't hit OOM, we need to finish fetching the inputs by calling > reader.next() before calling output.start(). That will make sure the input > buffers are released before output buffers are allocated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)