[ https://issues.apache.org/jira/browse/HIVE-5588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13803473#comment-13803473 ]
Sergey Shelukhin commented on HIVE-5588:
----------------------------------------
We discussed this briefly (unfortunately, only after making the patch) and
decided not to do it.
It would add extra serialization overhead, and the problems we were planning
to solve (e.g. putting it in front of FileSink for the HIVE-4002 case) can be
solved better by other means. Moreover, because sorting relies on
BinarySortableSerDe, it would not work straightforwardly with whatever SerDe
FileSink is using; an additional SerDe would need to be created. And there is
no code in Hive that actually sorts keys without a SerDe.
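To illustrate why the sort depends on a serializer: a TopN buffer that works
on serialized keys only ever compares raw bytes, so the keys must first be
written in an order-preserving binary form. The sketch below is not Hive code;
the class and method names are hypothetical, keys are assumed to be plain ints,
and the encoding (big-endian with the sign bit flipped) is only similar in
spirit to what BinarySortableSerDe produces.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch, not Hive's TopN implementation.
public class TopNBytesSketch {

    // Order-preserving encoding for a signed int (assumption: int keys only):
    // big-endian bytes with the sign bit flipped, so unsigned byte-wise
    // comparison matches numeric order.
    static byte[] encodeInt(int v) {
        return ByteBuffer.allocate(4).putInt(v ^ Integer.MIN_VALUE).array();
    }

    static int decodeInt(byte[] k) {
        return ByteBuffer.wrap(k).getInt() ^ Integer.MIN_VALUE;
    }

    // Keeps the n smallest serialized keys, comparing raw bytes only.
    static List<byte[]> topN(List<byte[]> keys, int n) {
        List<byte[]> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        // Max-heap on unsigned byte order: the head is the largest key kept so far.
        PriorityQueue<byte[]> heap =
            new PriorityQueue<>(n, (a, b) -> Arrays.compareUnsigned(b, a));
        for (byte[] k : keys) {
            if (heap.size() < n) {
                heap.add(k);
            } else if (Arrays.compareUnsigned(k, heap.peek()) < 0) {
                heap.poll(); // evict the current largest key
                heap.add(k);
            }
        }
        result.addAll(heap);
        result.sort(Arrays::compareUnsigned);
        return result;
    }

    public static void main(String[] args) {
        int[] values = {42, -7, 15, 0, -100, 8};
        List<byte[]> keys = new ArrayList<>();
        for (int v : values) {
            keys.add(encodeInt(v));
        }
        for (byte[] k : topN(keys, 3)) {
            System.out.println(decodeInt(k)); // -100, then -7, then 0
        }
    }
}
```

Without the encoding step, comparing the default two's-complement bytes would
put negative numbers after positive ones, which is why a FileSink-side TopN
would need its own sortable SerDe rather than reusing the sink's.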
I will attach the patch for reference.
If it is needed in the future, it can easily be pushed through.
It already works correctly in most scenarios. The exception is when distinct
columns are present: making that case work needs a small amount of additional
code duplicated from ReduceSink. This is explained in the "TODO#" comment at
the point where the code is not correct.
> change TopN to be an operator
> -----------------------------
>
> Key: HIVE-5588
> URL: https://issues.apache.org/jira/browse/HIVE-5588
> Project: Hive
> Issue Type: Task
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
>
> See HIVE-5503, as well as the discussion in HIVE-3562.
> If topN is a separate operator, it can be reused for the file sink, and a
> vectorized version can be implemented.
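The reuse argument in the description can be pictured as an operator that only
needs an ordering and a child to forward rows to, so the same class could sit
in front of a ReduceSink or a file sink alike. This is a hypothetical sketch:
RowConsumer, TopNOperator and CollectingSink are illustrative names, not Hive's
Operator API, and rows are buffered as plain objects rather than serialized keys.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of TopN as a standalone, reusable operator.
public class TopNOperatorSketch {

    // Hypothetical minimal operator contract: receive rows, then flush on close.
    interface RowConsumer<T> {
        void process(T row);
        void close();
    }

    // Needs only an ordering and a child, so any downstream consumer works.
    static final class TopNOperator<T> implements RowConsumer<T> {
        private final int limit;
        private final Comparator<T> order;
        private final RowConsumer<T> child;
        private final PriorityQueue<T> heap; // reversed order: head is the worst row kept

        TopNOperator(int limit, Comparator<T> order, RowConsumer<T> child) {
            this.limit = limit;
            this.order = order;
            this.child = child;
            this.heap = new PriorityQueue<>(Math.max(1, limit), order.reversed());
        }

        @Override
        public void process(T row) {
            if (limit <= 0) {
                return;
            }
            if (heap.size() < limit) {
                heap.add(row);
            } else if (order.compare(row, heap.peek()) < 0) {
                heap.poll(); // drop the worst row to make room
                heap.add(row);
            }
        }

        @Override
        public void close() {
            // Emit the kept rows in order, then close the child.
            List<T> rows = new ArrayList<>(heap);
            rows.sort(order);
            for (T row : rows) {
                child.process(row);
            }
            child.close();
        }
    }

    // Stand-in for a downstream operator such as a file sink: collects rows in memory.
    static final class CollectingSink<T> implements RowConsumer<T> {
        final List<T> rows = new ArrayList<>();
        boolean closed = false;

        @Override public void process(T row) { rows.add(row); }
        @Override public void close() { closed = true; }
    }

    public static void main(String[] args) {
        CollectingSink<String> sink = new CollectingSink<>();
        TopNOperator<String> topN = new TopNOperator<>(2, Comparator.naturalOrder(), sink);
        for (String s : new String[] {"pear", "apple", "fig", "banana"}) {
            topN.process(s);
        }
        topN.close();
        System.out.println(sink.rows); // [apple, banana]
    }
}
```

Because the operator knows nothing about its child, swapping CollectingSink for
another consumer needs no change to TopNOperator itself; the cost, as the
comment above notes, is that a real version must still compare rows somehow,
which in Hive means serialized keys.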
--
This message was sent by Atlassian JIRA
(v6.1#6144)