[ https://issues.apache.org/jira/browse/TEZ-2962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15159414#comment-15159414 ]
Bikas Saha commented on TEZ-2962:
---------------------------------

The downside of partition stats is that the values are approximate, in buckets of 1 MB/10 MB/100 MB etc. So a 100 MB stat could imply 900 MB of actual data. Respecting the max data size per task can therefore become tricky.

> Use per partition stats in shuffle vertex manager auto parallelism
> ------------------------------------------------------------------
>
>                 Key: TEZ-2962
>                 URL: https://issues.apache.org/jira/browse/TEZ-2962
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Bikas Saha
>            Priority: Critical
>
> The original code used output size sent by completed tasks. Recently per
> partition stats have been added that provide granular information. Using
> partition stats may be more accurate and also remove the duplicate counting
> of data size in partition stats and per task overall.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
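
The bucketing concern raised in the comment can be illustrated with a rough sketch. This is not Tez code: the bucket boundaries, the function names (`reported_stat_mb`, `worst_case_mb`, `group_partitions`), and the greedy grouping are all assumptions made for illustration. The comment only says the buckets are "1mb/10mb/100mb etc."

```python
import math

# Hypothetical bucket lower bounds in MB. The exact boundaries are an
# assumption; the comment only mentions 1 MB / 10 MB / 100 MB "etc."
BUCKETS_MB = [1, 10, 100, 1000]


def reported_stat_mb(actual_mb):
    """Round an actual size down to its bucket's lower bound (0 below 1 MB)."""
    reported = 0
    for bound in BUCKETS_MB:
        if actual_mb >= bound:
            reported = bound
    return reported


def worst_case_mb(reported_mb):
    """Exclusive upper bound on the actual size behind a reported bucket."""
    for bound in BUCKETS_MB:
        if bound > reported_mb:
            return bound
    return math.inf  # the top bucket is open-ended in this sketch


def group_partitions(sizes_mb, max_mb_per_task):
    """Greedily pack consecutive partitions while the sum stays within the limit."""
    groups, current, total = [], [], 0
    for index, size in enumerate(sizes_mb):
        if current and total + size > max_mb_per_task:
            groups.append(current)
            current, total = [], 0
        current.append(index)
        total += size
    if current:
        groups.append(current)
    return groups


actual = [900, 120, 450]                            # true partition sizes in MB
reported = [reported_stat_mb(a) for a in actual]    # every partition reports 100

# Trusting the reported lower bounds packs all three partitions into one task.
optimistic = group_partitions(reported, 300)
# Planning for the worst case gives each partition its own task.
pessimistic = group_partitions([worst_case_mb(r) for r in reported], 300)

actual_in_first_task = sum(actual[i] for i in optimistic[0])
```

With a 300 MB limit, grouping by the reported values puts 1470 MB of actual data into a single task, while grouping by the worst-case bound stays within the limit only by splitting far more than necessary. That is the trade-off the comment describes.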