GitHub user takuti opened a pull request: https://github.com/apache/incubator-hivemall/pull/108
[HIVEMALL-138] `to_ordered_map` UDAF with size limit

## What changes were proposed in this pull request?

Implement the `to_bounded_ordered_map` UDAF. It is an extended version of `to_ordered_map` that limits the size of the resulting map.

`to_bounded_ordered_map` can be used as an alternative to the `each_top_k` UDTF. The main difference is that the former actively utilizes mapper-side aggregation.

## What type of PR is it?

Feature

## What is the Jira issue?

https://issues.apache.org/jira/browse/HIVEMALL-138

## How was this patch tested?

Manual test on local and EMR

## How to use this feature?

```
to_bounded_ordered_map(key, value, size [, const boolean reverseOrder=false])
```

```sql
with t as (
    select 10 as key, 'apple' as value
    union all
    select 3 as key, 'banana' as value
    union all
    select 4 as key, 'candy' as value
)
select
    to_bounded_ordered_map(key, value, 1),
    to_bounded_ordered_map(key, value, 2),
    to_bounded_ordered_map(key, value, 3),
    to_bounded_ordered_map(key, value, 100),
    to_bounded_ordered_map(key, value, 1, true),
    to_bounded_ordered_map(key, value, 2, true),
    to_bounded_ordered_map(key, value, 3, true),
    to_bounded_ordered_map(key, value, 100, true)
from t
;
```

> {3:"banana"} {3:"banana",4:"candy"} {3:"banana",4:"candy",10:"apple"} {3:"banana",4:"candy",10:"apple"} {10:"apple"} {10:"apple",4:"candy"} {10:"apple",4:"candy",3:"banana"} {10:"apple",4:"candy",3:"banana"}

## Checklist

- [x] Did you apply source code formatter, i.e., `mvn formatter:format`, for your commit?
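To illustrate why a size-bounded map lends itself to mapper-side aggregation, here is a rough Python sketch of the idea. It is not the Hivemall implementation: the class `BoundedOrderedMap`, the helper `to_bounded_ordered_map(partitions, ...)`, and the way rows are split into partitions are all hypothetical. It also assumes that a non-positive `size` is what counts as an invalid map size, and that a repeated key keeps the last value seen; the PR states neither.

```python
class BoundedOrderedMap:
    """Hypothetical partial-aggregation buffer holding at most `size` entries.

    Keeps the smallest keys by default, or the largest keys when
    reverse_order is True.
    """

    def __init__(self, size, reverse_order=False):
        if size <= 0:
            # Assumption: "invalid map size" means a non-positive size.
            raise ValueError("size must be a positive integer")
        self.size = size
        self.reverse_order = reverse_order
        self.entries = {}

    def put(self, key, value):
        """Add one row, then evict the entry that sorts last if over the limit."""
        self.entries[key] = value
        if len(self.entries) > self.size:
            # O(n) scan per eviction; acceptable for a sketch.
            last = min(self.entries) if self.reverse_order else max(self.entries)
            del self.entries[last]

    def merge(self, other):
        """Fold another partial result into this one."""
        for key, value in other.entries.items():
            self.put(key, value)

    def result(self):
        """Return the surviving entries as a dict in the requested key order."""
        ordered = sorted(self.entries.items(), key=lambda kv: kv[0],
                         reverse=self.reverse_order)
        return dict(ordered)


def to_bounded_ordered_map(partitions, size, reverse_order=False):
    """Simulate mapper-side partial aggregation followed by a final merge."""
    partials = []
    for rows in partitions:  # each partition plays the role of one mapper
        partial = BoundedOrderedMap(size, reverse_order)
        for key, value in rows:
            partial.put(key, value)
        partials.append(partial)

    merged = BoundedOrderedMap(size, reverse_order)
    for partial in partials:  # final, reducer-like merge step
        merged.merge(partial)
    return merged.result()


if __name__ == "__main__":
    partitions = [[(10, "apple"), (3, "banana")], [(4, "candy")]]
    print(to_bounded_ordered_map(partitions, 2))
    print(to_bounded_ordered_map(partitions, 2, reverse_order=True))
```

Each partial buffer never holds more than `size` entries, so only a bounded amount of data per mapper has to reach the final merge. Merging bounded partials still yields the global top `size` keys, because any key in the final answer must also rank within the top `size` of its own partition.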
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/takuti/incubator-hivemall topk-ordered-map

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-hivemall/pull/108.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #108

----
commit 78403e8a3cb99b6bccdf2500ad5551d413345222
Author: Takuya Kitazawa <k.tak...@gmail.com>
Date:   2017-08-07T05:26:13Z

    Fix typo

commit 46a23a2129ea74244e8a42b6aa5d9da9d5cf8ba1
Author: Takuya Kitazawa <k.tak...@gmail.com>
Date:   2017-08-07T07:14:42Z

    Implement `to_bounded_ordered_map` UDAF

commit 3c029f9bd71adb70db8dfc48f6452362dacc164c
Author: Takuya Kitazawa <k.tak...@gmail.com>
Date:   2017-08-07T07:24:15Z

    Throw an exception for invalid map size

----