[ https://issues.apache.org/jira/browse/TEZ-2950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15291917#comment-15291917 ]
Bikas Saha commented on TEZ-2950:
---------------------------------

bq. 2. Rely on pipelined shuffle to avoid the final merge.

Per an old discussion with [~rajesh.balamohan], avoiding the final merge is independent of pipelined shuffle and could be enabled without it (this needs a code change, though). Perhaps that is what you allude to in 4.

> Poor performance of UnorderedPartitionedKVWriter
> ------------------------------------------------
>
>                 Key: TEZ-2950
>                 URL: https://issues.apache.org/jira/browse/TEZ-2950
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Rohini Palaniswamy
>            Assignee: Kuhu Shukla
>         Attachments: TEZ-2950.001_prelim.patch
>
>
> Came across a job which was taking a long time in
> UnorderedPartitionedKVWriter.mergeAll. It was decompressing and reading data
> from spill files (8500 spills) and then writing the final compressed merge
> file. Why do we need spill files for UnorderedPartitionedKVWriter? Why not
> just buffer and keep directly writing to the final file, which would save a lot
> of time?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
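
For readers unfamiliar with the cost being discussed, here is a rough sketch of the two options: a final merge that decompresses every spill segment and recompresses it into one file, versus skipping the merge and letting the consumer read each spill directly. This is not Tez code. The function names (write_spill, merge_all, read_partition, read_partition_without_merge), the in-memory dict layout, and the use of zlib as the codec are all assumptions made for illustration only.

```python
import io
import zlib


def write_spill(records_by_partition):
    """Hypothetical spill: one compressed segment per partition."""
    return {p: zlib.compress(b"".join(recs))
            for p, recs in records_by_partition.items()}


def merge_all(spills, num_partitions):
    """Final merge: decompress each partition's segment from every spill
    and recompress it into one contiguous segment of a single output.

    Returns (final_bytes, index, segments_read), where index[p] is the
    (offset, length) of partition p in final_bytes.
    """
    out = io.BytesIO()
    index = []
    segments_read = 0
    for p in range(num_partitions):
        compressor = zlib.compressobj()
        start = out.tell()
        for spill in spills:
            segment = spill.get(p)
            if segment is None:
                continue
            # Each segment is decompressed and then compressed again.
            out.write(compressor.compress(zlib.decompress(segment)))
            segments_read += 1
        out.write(compressor.flush())
        index.append((start, out.tell() - start))
    return out.getvalue(), index, segments_read


def read_partition(final_bytes, index, p):
    """Consumer side when a final merged file exists."""
    offset, length = index[p]
    return zlib.decompress(final_bytes[offset:offset + length])


def read_partition_without_merge(spills, p):
    """Consumer side when the final merge is skipped: read the
    partition's segment from each spill, with no recompression step."""
    return b"".join(zlib.decompress(s[p]) for s in spills if p in s)
```

In this sketch the merge touches up to (number of spills) x (number of partitions) segments, so with thousands of spills the decompress and recompress loop dominates. Skipping the merge moves that work to the reader, which must then handle many small segments instead of one.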