[ https://issues.apache.org/jira/browse/TEZ-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054335#comment-14054335 ]
Bikas Saha commented on TEZ-1264:
---------------------------------
This would probably be better done by using a hash-group to keep the top N
instead of the regular approach of accumulating data and then sorting.
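
As a rough illustration of keeping only the top N instead of accumulating
everything and sorting, here is a minimal sketch in plain Java. It is not Tez
code and does not use any Tez API: the class TopNPerPartition and its methods
add and sorted are hypothetical names, and a hash map of bounded heaps (one
per partition) is only one possible reading of "hash-group".

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Tez: keeps at most `limit` smallest keys
// (according to `order`) for each partition, so memory is bounded by
// limit * numberOfPartitions rather than by the total number of records.
class TopNPerPartition<K> {
    private final int limit;
    private final Comparator<K> order;
    // One bounded max-heap per partition; the head is the worst key kept so far.
    private final Map<Integer, PriorityQueue<K>> heaps = new HashMap<>();

    TopNPerPartition(int limit, Comparator<K> order) {
        this.limit = limit;
        this.order = order;
    }

    // Offer a key to the given partition; it is kept only if it ranks in the top N.
    void add(int partition, K key) {
        if (limit <= 0) {
            return;
        }
        PriorityQueue<K> heap = heaps.computeIfAbsent(
                partition, p -> new PriorityQueue<>(order.reversed()));
        if (heap.size() < limit) {
            heap.add(key);
        } else if (order.compare(key, heap.peek()) < 0) {
            heap.poll();   // evict the current worst key
            heap.add(key);
        }
    }

    // Return the retained keys of a partition in sorted order.
    List<K> sorted(int partition) {
        PriorityQueue<K> heap = heaps.get(partition);
        if (heap == null) {
            return new ArrayList<>();
        }
        List<K> out = new ArrayList<>(heap);
        out.sort(order);   // sorts at most `limit` elements
        return out;
    }

    public static void main(String[] args) {
        TopNPerPartition<String> top =
                new TopNPerPartition<>(2, Comparator.<String>naturalOrder());
        for (String k : new String[] {"d", "a", "c", "b"}) {
            top.add(0, k);
        }
        System.out.println(top.sorted(0)); // prints [a, b]
    }
}
```

With this shape, only the final sort of at most N keys per partition remains,
at the cost of one heap operation per incoming record.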
> Support for limiting output records in OnFileSortedOutput
> ---------------------------------------------------------
>
> Key: TEZ-1264
> URL: https://issues.apache.org/jira/browse/TEZ-1264
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
>
> When limiting unsorted output, we can stop in the Processor once the count
> is reached. But when the limit has to be applied to sorted output in the
> map phase, that is not possible, because the sorting is done by
> OnFileSortedOutput. If limiting were supported as part of the output, we
> could limit records before writing each part file, after the Partitioner
> is applied.