[jira] Commented: (HIVE-931) Sorted Group By

Namit Jain (JIRA) Thu, 19 Nov 2009 22:20:04 -0800

    [ https://issues.apache.org/jira/browse/HIVE-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12780463#action_12780463 ]


Namit Jain commented on HIVE-931:
---------------------------------

1. Given that partition pruning has already happened and its result is stored
in the parse context, can you use that information instead of calling
PartitionPruner.prune() again?
2. Instead of walking up the tree, can you collect the list of table scans
that precede that group by?
3. Can you add some more comments in GroupByOptimizer?
4. I am not sure, but there seems to be a bug there. Consider the case:

    (subq) followed by groupby

Are you taking the base tables of the subquery, which may be different?

Can you add tests for this scenario?
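One way items 2 and 4 might fit together: in a single top-down pass from the
table scans, record which scans feed each group by, and note whether an
operator on the way (for example, an inner group by inside the subquery) could
have destroyed the sort order. This is a minimal Python sketch over a
simplified, hypothetical operator tree; `Op`, `ORDER_BREAKING` and
`collect_scans_per_groupby` are illustrative names, not Hive's API, and which
operators belong in `ORDER_BREAKING` is an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified operator node; not Hive's actual Operator class.
@dataclass(eq=False)
class Op:
    name: str
    kind: str                              # "TableScan", "Select", "GroupBy", ...
    children: List["Op"] = field(default_factory=list)

# Assumption for illustration: operators after which the scan's sort order
# should no longer be trusted (e.g. a group by inside a subquery).
ORDER_BREAKING = {"GroupBy", "Join", "ReduceSink"}

def collect_scans_per_groupby(roots: List[Op]) -> Dict[str, Tuple[List[str], bool]]:
    """One top-down pass starting at the TableScan roots.

    For each GroupBy, record the names of the TableScans that feed it and
    whether all paths from those scans avoid order-breaking operators.
    """
    result: Dict[str, Tuple[List[str], bool]] = {}

    def walk(op: Op, scans: List[str], order_kept: bool) -> None:
        if op.kind == "TableScan":
            scans = scans + [op.name]
        if op.kind == "GroupBy":
            # Merge with what other paths already recorded for this GroupBy.
            prev_scans, prev_kept = result.get(op.name, ([], True))
            result[op.name] = (prev_scans + scans, prev_kept and order_kept)
        if op.kind in ORDER_BREAKING:
            # Operators below this point can no longer rely on the scan order.
            order_kept = False
        for child in op.children:
            walk(child, scans, order_kept)

    for root in roots:
        walk(root, [], True)
    return result
```

For a plan like ts_T -> sel -> gby_inner -> gby_outer (a subquery that
aggregates, followed by an outer group by), the outer group by still reaches
T's scan but comes back marked as not order-preserving, which is the kind of
case a test for item 4 could cover.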

> Sorted Group By
> ---------------
>
>                 Key: HIVE-931
>                 URL: https://issues.apache.org/jira/browse/HIVE-931
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>            Reporter: Namit Jain
>            Assignee: He Yongqiang
>             Fix For: 0.5.0
>
>         Attachments: hive-931-2009-11-18.patch, hive-931-2009-11-19.patch
>
>
> If a table is sorted by a given key, we currently don't use that for group
> by, even though it could be very useful.
> For example, if T is sorted by column c1, then for
>   select c1, aggr() from T group by c1
> we can always use a single map-reduce job. No hash table is needed on the
> mapper, since the data is already sorted by c1.
> This will reduce memory pressure on the mapper and also remove the overhead
> of maintaining the hash table.
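The idea in the description amounts to a streaming aggregation: when rows
arrive sorted by the group-by key, all rows for a key are contiguous, so the
operator only needs one running aggregate and can emit it as soon as the key
changes. A minimal Python sketch of that idea, not Hive's GroupByOperator;
`sorted_group_by`, `init` and `step` are hypothetical names:

```python
from typing import Callable, Iterable, Iterator, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")
A = TypeVar("A")

def sorted_group_by(rows: Iterable[Tuple[K, V]],
                    init: Callable[[], A],
                    step: Callable[[A, V], A]) -> Iterator[Tuple[K, A]]:
    """Streaming group by over rows already sorted by key.

    Keeps a single running accumulator instead of a per-key hash table:
    when the key changes, the previous group is complete and is emitted.
    """
    unset = object()          # sentinel: no key seen yet
    current_key = unset
    acc = init()
    for key, value in rows:
        if current_key is not unset and key != current_key:
            yield current_key, acc
            acc = init()
        current_key = key
        acc = step(acc, value)
    if current_key is not unset:
        yield current_key, acc
```

For example, `sorted_group_by(rows, lambda: 0, lambda acc, _: acc + 1)` counts
rows per key. If the input is not actually sorted, a key that reappears later
is emitted again as a separate group, so correctness depends on the sort
guarantee. Since a key's rows could also span more than one mapper's input,
per-mapper results would still be partial aggregates for the reduce phase to
combine.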

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
