[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15012429#comment-15012429 ]

Gopal V edited comment on TEZ-2950 at 11/19/15 12:09 AM:
---------------------------------------------------------

bq. enable final merge in output = false doesn't necessarily solve this. That 
has the same issues of partial failures which exist with pipelined shuffle. 
The fetcher can start serving out chunks of the data and then have the source 
fail, which will cause the task fetching the data to fail (chunks for the same 
input from different attempts of the source).

The downstream task only starts receiving events once the source task completes 
successfully; this was done to allow for speculative execution.
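The partial-failure argument can be modeled in a few lines. This is a minimal sketch, assuming a hypothetical consumer that sees output tagged with the source attempt number; `ChunkConsumer`, `on_chunk` and `on_attempt_succeeded` are illustrative names, not Tez APIs. In the pipelined model, chunks are consumed before the source finishes, so a chunk from a second (for example speculative) attempt of the same source cannot be reconciled with data already merged. In the completion-only model, the first successful attempt wins and later ones are simply ignored.

```python
class ChunkConsumer:
    """Hypothetical downstream consumer (illustrative, not a Tez API)."""

    def __init__(self):
        # source task id -> attempt whose chunks we started merging
        self.consuming = {}
        # source task id -> attempt whose success event we accepted
        self.committed = {}

    def on_chunk(self, source, attempt):
        """Pipelined model: chunks arrive before the source completes.

        Returns False when the chunk belongs to a different attempt than
        the chunks already consumed; merged data cannot be undone, so the
        consuming task would have to fail.
        """
        seen = self.consuming.setdefault(source, attempt)
        return seen == attempt

    def on_attempt_succeeded(self, source, attempt):
        """Completion-only model: output is announced only on success.

        The first successful attempt wins; a later speculative attempt of
        the same source is ignored, so nothing needs reconciling.
        """
        seen = self.committed.setdefault(source, attempt)
        return seen == attempt
```

For example, `on_chunk(0, 0)` followed by `on_chunk(0, 1)` returns `True` then `False` (chunks from two attempts of source 0 would be mixed), while `on_attempt_succeeded(1, 1)` followed by `on_attempt_succeeded(1, 0)` returns `True` then `False` (the later success is just dropped).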


> Poor performance of UnorderedPartitionedKVWriter
> ------------------------------------------------
>
>                 Key: TEZ-2950
>                 URL: https://issues.apache.org/jira/browse/TEZ-2950
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>
> Came across a job which was taking a long time in 
> UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data 
> from spill files (8500 spills) and then writing the final compressed merge 
> file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not 
> just buffer and keep writing directly to the final file, which would save a lot 
> of time?
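To see where the time in `mergeAll` goes, here is a simplified sketch, assuming a hypothetical in-memory model rather than the actual Tez implementation: `merge_all` and the list-of-bytes spill format are illustrative. Each spill holds one segment per partition, and the final output stores each partition contiguously with an `(offset, length)` index entry, so every partition is stitched together from a segment in every spill. With 8500 spills that means one segment per spill per partition; per the description above, the real writer also decompresses the spill data and writes a compressed merged file, which this sketch omits.

```python
def merge_all(spills, num_partitions):
    """Concatenate per-partition segments from every spill (hypothetical model).

    spills: list of spills; each spill is a list of bytes, one per partition.
    Returns the partition-contiguous output and an index of (offset, length).
    """
    out = bytearray()
    index = []
    for p in range(num_partitions):
        start = len(out)
        # Every spill is visited once per partition.
        for spill in spills:
            out += spill[p]
        index.append((start, len(out) - start))
    return bytes(out), index
```

For three spills of two partitions, `merge_all([[b"a1", b"b1"], [b"a2", b""], [b"a3", b"b3"]], 2)` returns `b"a1a2a3b1b3"` with index `[(0, 6), (6, 4)]`.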



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
