[ 
https://issues.apache.org/jira/browse/TEZ-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054335#comment-14054335
 ] 

Bikas Saha commented on TEZ-1264:
---------------------------------

This would probably be better done by using a hash-group to keep the top N 
instead of the regular approach of accumulating data and then sorting.
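
As a rough illustration of the idea, here is a minimal sketch that keeps one 
bounded heap per partition in a hash map, so at most N keys per partition are 
held at any time instead of buffering and sorting everything. This is only 
one reading of "hash-group". TopNPerPartition and its methods are hypothetical 
names, not part of Tez, and values are left out to keep it short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not a Tez class: keeps the first N keys
// (in sort order) for each partition.
final class TopNPerPartition<K> {
    private final int limit;
    private final Comparator<K> order;
    private final Comparator<K> reversed;
    // One bounded heap per partition number.
    private final Map<Integer, PriorityQueue<K>> heaps = new HashMap<>();

    TopNPerPartition(int limit, Comparator<K> order) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.order = order;
        this.reversed = Collections.reverseOrder(order);
    }

    // Offer a key to its partition; evict the largest key once the
    // partition holds more than 'limit' keys.
    void add(int partition, K key) {
        PriorityQueue<K> heap =
            heaps.computeIfAbsent(partition, p -> new PriorityQueue<K>(reversed));
        heap.add(key);
        if (heap.size() > limit) {
            heap.poll(); // head of a reversed-order heap is the largest key
        }
    }

    // Return the retained keys of a partition in ascending sort order.
    List<K> sortedOutput(int partition) {
        PriorityQueue<K> heap = heaps.get(partition);
        if (heap == null) {
            return Collections.emptyList();
        }
        List<K> out = new ArrayList<>(heap);
        out.sort(order); // sorts at most 'limit' keys
        return out;
    }
}
```

With this shape the final sort touches at most N keys per partition, and each 
insert costs O(log N) rather than growing with the total input size.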

> Support for limiting output records in OnFileSortedOutput
> ---------------------------------------------------------
>
>                 Key: TEZ-1264
>                 URL: https://issues.apache.org/jira/browse/TEZ-1264
>             Project: Apache Tez
>          Issue Type: Improvement
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>
>  When limiting unsorted output, we can stop in the Processor once the count 
> is reached. But when the limit has to be applied to sorted output in the map 
> phase, that is not possible, because sorting is done by OnFileSortedOutput. 
> If limiting were supported as part of the output, we could limit records 
> before writing to each part file, after the Partitioner is applied.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
