[ 
https://issues.apache.org/jira/browse/TEZ-3769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16065653#comment-16065653
 ] 

Zhiyuan Yang edited comment on TEZ-3769 at 6/27/17 11:19 PM:
-------------------------------------------------------------

General discussion beyond this patch: 
1. About the counter ADDITIONAL_SPILLS_BYTES_WRITTEN: there is a difference 
between its usage (final spill stats) and its documentation (bytes written due 
to unnecessary spills). If the final spill size is not useful, we can merge it 
into the normal counter. Otherwise, we can just fix the documentation/comments.
2. I think we should refactor this unordered writer sometime later. Right now 
it's stuffed with too many things, and many code paths are multiplexed. It'll 
only get harder to modify or review.


was (Author: aplusplus):
General discussion beyond this patch: 
1. About the counter ADDITIONAL_SPILLS_BYTES_WRITTEN: there is a difference 
between its usage (final spill stats) and its documentation (bytes written due 
to unnecessary spills).
2. I think we should refactor this unordered writer sometime later. Right now 
it's stuffed with too many things, and many code paths are multiplexed. It'll 
only get harder to modify or review.

> Unordered: Fix wrong stats being sent out in the last event, when final merge 
> is disabled
> -----------------------------------------------------------------------------------------
>
>                 Key: TEZ-3769
>                 URL: https://issues.apache.org/jira/browse/TEZ-3769
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rajesh Balamohan
>         Attachments: TEZ-3769.1.patch, TEZ-3769.2.patch
>
>
> When final merge is disabled (without pipelining), wrong stats were sent out 
> in the last event. 
> They were based on {{numRecordsPerPartition}}, which contains the overall 
> partition data. Ideally, they should be based on the spill result and its 
> buffers.
> Also, {{finalSpill}} was unnecessarily sending events when no data was present 
> (i.e., when currentBuffer didn't have any data). This can be optimized to 
> reduce the number of events being sent.



