[ 
https://issues.apache.org/jira/browse/HIVE-5093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13744004#comment-13744004
 ] 

Thejas M Nair commented on HIVE-5093:
-------------------------------------

[~gopalv] I agree this is going to be very useful with <human-number> limits.

[~appodictic] I think limit queries are fairly common, both for analytical 
queries and when people are iteratively trying out their queries and want to 
quickly check that the queries are working as expected. This optimization can 
lead to a significant performance boost for such queries, and the code change 
required is not significant, as demonstrated by the attached WIP patch. 
I agree that we should look at different ways of adding this optimization. 

As Gopal suggested, using a different sort function for map is one option for 
the order-by queries. 

The fact that Hive uses map-side aggregation is something to consider when 
optimizing the group-by case. One option would be to push the limit into the 
map-side aggregation operator, which would also reduce its memory requirements. 
But that is probably a little more complicated than this change, which is 
factored out into separate combiner code.

                
> Use a combiner for LIMIT with GROUP BY and ORDER BY operators
> -------------------------------------------------------------
>
>                 Key: HIVE-5093
>                 URL: https://issues.apache.org/jira/browse/HIVE-5093
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Processor
>    Affects Versions: 0.12.0
>            Reporter: Gopal V
>            Assignee: Gopal V
>         Attachments: HIVE-5093-WIP-01.patch
>
>
> Operator trees of the structure "GBY-LIM" and "OBY-LIM" can have a 
> memory-friendly combiner put in place after the sort phase.
> This will cut down on I/O when spilling to disk and particularly during the 
> merge phase of the reducer.
> There are two possible combiners - LimitNKeysCombiner and 
> LimitNValuesCombiner.
> The first one would be ideal for the GROUP-BY case, while the latter would 
> be more useful for the ORDER-BY case.
> The combiners are still relevant even if there are 1:1 forward operators on 
> the reducer side. For small data items, the MR base layer does not run the 
> combiners at all.
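
To make the two combiners named in the description concrete, here is a minimal 
sketch in plain Java with no Hadoop or Hive dependencies. It is not the code in 
HIVE-5093-WIP-01.patch. The class name LimitCombinerSketch, the method names 
limitNKeys and limitNValues, and the use of an in-memory list to stand in for 
one key-sorted run of map output are all assumptions made for illustration. 
The idea is that a key among the first N keys overall is also among the first 
N keys of every run that contains it, so each run can be truncated 
independently without losing records that the final LIMIT N needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and types are not from the Hive patch.
public class LimitCombinerSketch {

    /**
     * GROUP BY + LIMIT n: keep every record that belongs to the first n
     * distinct keys of a run that is already sorted by key.
     */
    static <K, V> List<Map.Entry<K, V>> limitNKeys(List<Map.Entry<K, V>> sortedRun, int n) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        int distinctKeys = 0;
        K previous = null;
        boolean first = true;
        for (Map.Entry<K, V> record : sortedRun) {
            boolean newKey = first || !record.getKey().equals(previous);
            if (newKey) {
                if (distinctKeys >= n) {
                    break; // a key past the first n starts here; drop the rest
                }
                distinctKeys++;
                previous = record.getKey();
                first = false;
            }
            out.add(record);
        }
        return out;
    }

    /**
     * ORDER BY + LIMIT n: keep only the first n records of a run that is
     * already sorted by key, regardless of how many distinct keys they span.
     */
    static <K, V> List<Map.Entry<K, V>> limitNValues(List<Map.Entry<K, V>> sortedRun, int n) {
        int end = Math.min(Math.max(n, 0), sortedRun.size());
        return new ArrayList<>(sortedRun.subList(0, end));
    }
}
```

For a run with keys a, a, b, c and a limit of 2, limitNKeys keeps the three 
records for keys a and b, while limitNValues keeps only the first two records.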

