[ https://issues.apache.org/jira/browse/TEZ-1264?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14054507#comment-14054507 ]
Rohini Palaniswamy commented on TEZ-1264:
-----------------------------------------
Good idea. To see whether a record makes it into the top N, we just need to
compare it with the last record currently kept, and that check should not be
that costly. But we need to see how to do memory management and spilling when
the LIMIT count is very high.
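
To make the "compare with the last one" check concrete, here is a rough,
purely in-memory sketch. It is not Tez code: the class TopNCollector and its
methods are hypothetical names made up for illustration, and it ignores the
spill question entirely. It keeps at most N records in a heap whose head is
the current last record of the top N, so each incoming record costs one
comparison before it is either dropped or swapped in.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Tez: keeps the first `limit` records
// according to `order`, entirely in memory.
class TopNCollector<T> {
    private final int limit;
    private final Comparator<T> order;
    // Heap ordered in reverse, so peek() returns the last record of the top N.
    private final PriorityQueue<T> heap;

    TopNCollector(int limit, Comparator<T> order) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
        this.order = order;
        this.heap = new PriorityQueue<>(order.reversed());
    }

    // Returns true if the record was kept, false if it was dropped.
    boolean offer(T record) {
        if (heap.size() < limit) {
            heap.add(record);
            return true;
        }
        // The cheap check: compare only against the current last record.
        if (order.compare(record, heap.peek()) >= 0) {
            return false;
        }
        heap.poll();      // evict the old last record
        heap.add(record); // keep the new one
        return true;
    }

    // Returns the kept records in sort order.
    List<T> sorted() {
        List<T> out = new ArrayList<>(heap);
        out.sort(order);
        return out;
    }

    public static void main(String[] args) {
        TopNCollector<Integer> top =
            new TopNCollector<>(3, Comparator.<Integer>naturalOrder());
        for (int v : new int[] {5, 1, 9, 3, 7, 2}) {
            top.offer(v);
        }
        System.out.println(top.sorted()); // prints [1, 2, 3]
    }
}
```

For a per-partition limit, one such collector per partition would be needed,
which is exactly where the memory concern comes in: the heap holds up to N
records per partition, so a very high LIMIT would still need some spill
strategy that this sketch does not attempt.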
> Support for limiting output records in OnFileSortedOutput
> ---------------------------------------------------------
>
> Key: TEZ-1264
> URL: https://issues.apache.org/jira/browse/TEZ-1264
> Project: Apache Tez
> Issue Type: Improvement
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
>
> When we are limiting unsorted output, we can stop in the Processor after
> reaching the count. But if limiting has to be done on sorted output in the
> map phase, that is not possible, because sorting is done by
> OnFileSortedOutput. If limiting were supported as part of the output, we
> could limit records before writing to each part file, after the Partitioner
> is applied.
--
This message was sent by Atlassian JIRA
(v6.2#6252)