[ 
https://issues.apache.org/jira/browse/TEZ-3206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15242242#comment-15242242
 ] 

Ming Ma commented on TEZ-3206:
------------------------------

Thanks [~jeagles]! AFAIK, partition statistics are sent via VertexManagerEvent,
which stops at the AM, while the empty partition list is sent via
DataMovementEvent, which is routed to the AM and then on to the reducers. So
the size of the partition statistics shouldn't impact any reducer.
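To make the routing argument concrete, here is a toy model. It assumes that every mapper event reaches the AM first, that a VertexManagerEvent is consumed there, and that each reducer receives one copy of each mapper's DataMovementEvent. The classes `Event` and `Router` and the byte sizes are hypothetical illustrations, not Tez APIs or measurements.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical toy model of the routing described above; the class and
# field names are illustrative and are NOT Tez APIs.

@dataclass
class Event:
    kind: str           # "VertexManagerEvent" or "DataMovementEvent"
    payload_bytes: int  # serialized size of the event payload

@dataclass
class Router:
    num_reducers: int
    am_bytes: int = 0
    reducer_bytes: List[int] = field(init=False, default_factory=list)

    def __post_init__(self) -> None:
        self.reducer_bytes = [0] * self.num_reducers

    def route(self, event: Event) -> None:
        # Every event emitted by a mapper is first received by the AM.
        self.am_bytes += event.payload_bytes
        if event.kind == "DataMovementEvent":
            # Assumption: each reducer receives one copy per mapper.
            for r in range(self.num_reducers):
                self.reducer_bytes[r] += event.payload_bytes
        # A VertexManagerEvent stops at the AM (consumed by the vertex manager).


def simulate(num_mappers: int, num_reducers: int,
             stats_bytes: int, dme_bytes: int) -> Router:
    router = Router(num_reducers)
    for _ in range(num_mappers):
        router.route(Event("VertexManagerEvent", stats_bytes))
        router.route(Event("DataMovementEvent", dme_bytes))
    return router


# Growing the partition statistics only grows what the AM receives.
small = simulate(num_mappers=100, num_reducers=10, stats_bytes=1_000, dme_bytes=200)
large = simulate(num_mappers=100, num_reducers=10, stats_bytes=50_000, dme_bytes=200)
print(small.reducer_bytes[0], large.reducer_bytes[0])  # 20000 20000
print(small.am_bytes, large.am_bytes)                  # 120000 5020000
```

In this model a 50x increase in stats size leaves every reducer's input unchanged; only the AM's share grows.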

For the DataMovementEvent task OOM case, is it because each reducer gets
launched after all 100k mappers have finished and thus fills up its event
queue? I assume the same thing could happen to the AM, e.g., VertexManagerEvents
could overflow the AM's event queue. That is less likely, though, as it
requires all 100k mappers to finish at the same time, or at a faster rate than
the AM's async dispatcher can process their events.
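The backlog argument can be sketched with a single queue that is drained at a fixed rate per tick. The drain rate and arrival patterns below are hypothetical numbers chosen for illustration, and `peak_backlog` is not a Tez function. A reducer launched after every mapper has finished sees all 100k events as one burst. The AM sees them as mappers finish, so its queue only grows when arrivals outpace the dispatcher.

```python
from typing import Iterable

# Simplified single-queue model: each tick, `arrivals` events are enqueued and
# then at most `service_per_tick` of them are processed. Rates are hypothetical.

def peak_backlog(arrivals: Iterable[int], service_per_tick: int) -> int:
    """Return the largest number of events waiting in the queue at once."""
    queued = 0
    peak = 0
    for arriving in arrivals:
        queued += arriving
        peak = max(peak, queued)
        queued = max(0, queued - service_per_tick)
    return peak

MAPPERS = 100_000
SERVICE = 5_000  # hypothetical events processed per tick

# Reducer launched after every mapper has finished: all events arrive at once.
burst = peak_backlog([MAPPERS], SERVICE)
# AM while mappers finish gradually, slower than the dispatcher drains events.
gradual = peak_backlog([1_000] * 100, SERVICE)
# AM while mappers finish faster than the dispatcher can keep up.
overloaded = peak_backlog([10_000] * 10, SERVICE)

print(burst, gradual, overloaded)  # 100000 1000 55000
```

All three cases deliver the same 100k events. Only the burst case and the arrivals-faster-than-service case build a large backlog.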

> Have unordered partitioned KV output send partition stats via 
> VertexManagerEvent 
> ---------------------------------------------------------------------------------
>
>                 Key: TEZ-3206
>                 URL: https://issues.apache.org/jira/browse/TEZ-3206
>             Project: Apache Tez
>          Issue Type: New Feature
>            Reporter: Ming Ma
>
> As part of the auto-parallelism feature, ordered partitioned KV output's 
> partition stats are sent to ShuffleVertexManager via VertexManagerEvent. But 
> this isn't available for unordered partitioned output. Having 
> {{UnorderedPartitionedKVWriter}} send partition stats will enable the 
> auto-parallelism support for unordered KV or other custom data routing 
> mechanisms that depend on partition size.
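As a rough illustration of why per-partition sizes matter for auto-parallelism, the sketch below picks a reducer count from the total output size and a target input size per task. It then merges adjacent partitions into that many tasks. The function `plan_reducers` and its parameters are hypothetical, and this simplified policy is an assumption; it is not ShuffleVertexManager's actual code.

```python
import math
from typing import List, Tuple

def plan_reducers(partition_bytes: List[int],
                  desired_task_input_bytes: int) -> List[Tuple[int, int]]:
    """Return [start, end) partition ranges, one per reducer task (hypothetical policy)."""
    num_partitions = len(partition_bytes)
    if num_partitions == 0:
        return []
    total = sum(partition_bytes)
    # Enough tasks that each handles ~desired_task_input_bytes, but never
    # more than the original partition count and never fewer than one.
    parallelism = max(1, min(num_partitions,
                             math.ceil(total / desired_task_input_bytes)))
    # Merge this many adjacent partitions into each task.
    per_task = math.ceil(num_partitions / parallelism)
    return [(start, min(start + per_task, num_partitions))
            for start in range(0, num_partitions, per_task)]


# 8 partitions totalling 100 units, target 30 units per task -> 4 tasks.
print(plan_reducers([10, 20, 5, 5, 40, 0, 10, 10], 30))  # [(0, 2), (2, 4), (4, 6), (6, 8)]
# Tiny output collapses into a single task.
print(plan_reducers([1] * 8, 30))                        # [(0, 8)]
```

Without partition stats from the unordered writer, a planner like this would have no sizes to work from.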



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
