[ 
https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159414#comment-15159414
 ] 

Bikas Saha commented on TEZ-2962:
---------------------------------

The downside of partition stats is that the values are approximate: they are 
reported in buckets of 1 MB/10 MB/100 MB, etc. So a 100 MB stat could mean as 
much as 900 MB of actual data, which makes it tricky to respect the max data 
size per task.
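
To illustrate the concern, here is a minimal sketch of how bucketed stats 
lose precision. It is not the Tez implementation: the helper names 
(bucket_floor_mb, possible_range_mb, worst_case_total_mb) are hypothetical, 
and it assumes each bucket is a power of ten in MB that covers sizes from 
its own value up to the next bucket.

```python
MB = 1024 * 1024


def bucket_floor_mb(size_bytes):
    """Return the assumed reported bucket in MB for an actual size in bytes.

    Assumption: the bucket is the largest power of ten (in MB) that does not
    exceed the actual size, and anything under 1 MB is reported as 0.
    """
    size_mb = size_bytes // MB
    if size_mb < 1:
        return 0
    bucket = 1
    while bucket * 10 <= size_mb:
        bucket *= 10
    return bucket


def possible_range_mb(bucket_mb):
    """Return (low, high) in MB: actual sizes in [low, high) map to this bucket."""
    if bucket_mb == 0:
        return (0, 1)
    return (bucket_mb, bucket_mb * 10)


def worst_case_total_mb(buckets_mb):
    """Exclusive upper bound, in MB, on the real data behind a group of stats."""
    return sum(possible_range_mb(b)[1] for b in buckets_mb)


if __name__ == "__main__":
    # A 900 MB partition is reported in the same bucket as a 100 MB one.
    print(bucket_floor_mb(900 * MB), bucket_floor_mb(100 * MB))
    # Two partitions reported as 100 MB each look like 200 MB in total,
    # but under these assumptions could hold close to 2000 MB.
    print(worst_case_total_mb([100, 100]))
```

Under these assumptions, a vertex manager that sums the reported buckets 
to stay under a per-task limit could undershoot the real input size by up 
to roughly an order of magnitude.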

> Use per partition stats in shuffle vertex manager auto parallelism
> ------------------------------------------------------------------
>
>                 Key: TEZ-2962
>                 URL: https://issues.apache.org/jira/browse/TEZ-2962
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Priority: Critical
>
> The original code used the output size sent by completed tasks. Recently, 
> per-partition stats have been added that provide more granular information. 
> Using partition stats may be more accurate and would also remove the 
> duplicate counting of data size in partition stats and in the per-task total.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
