[jira] [Updated] (HIVE-2621) Allow multiple group bys with the same input data and spray keys to be run on the same reducer.

Phabricator (Updated) (JIRA) Thu, 22 Dec 2011 18:56:01 -0800

     [ 
https://issues.apache.org/jira/browse/HIVE-2621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Phabricator updated HIVE-2621:
------------------------------

    Attachment: HIVE-2621.D567.4.patch

kevinwilfong updated the revision "HIVE-2621 [jira] Allow multiple group bys 
with the same input data and spray keys to be run on the same reducer.".
Reviewers: JIRA

  Addressed Namit's and Yongqiang's comments as follows:

  Removed code for singlemrMultiGroupBy optimization, and all related methods, 
as the new code should produce similar results and can handle more cases, such 
as filters.

  Shared code between getCommonDistinctExprs and getCommonGroupByDestGroups, as 
well as between genCommonGroupByPlanReduceSinkOperator and 
genGroupByPlanReduceSinkOperator.

  Added comments where requested.

  Deduplicated the subclause filters that make up the common filter applied 
before a common group by reduce sink.

REVISION DETAIL
  https://reviews.facebook.net/D567

AFFECTED FILES
  common/src/java/org/apache/hadoop/hive/conf/HiveConf.java
  ql/src/test/results/clientpositive/groupby9.q.out
  ql/src/test/results/clientpositive/groupby7_noskew_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/groupby10.q.out
  ql/src/test/results/clientpositive/parallel.q.out
  ql/src/test/results/clientpositive/groupby_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/multigroupby_singlemr.q.out
  ql/src/test/results/clientpositive/multi_insert.q.out
  ql/src/test/results/clientpositive/groupby8.q.out
  ql/src/test/results/clientpositive/groupby_complex_types_multi_single_reducer.q.out
  ql/src/test/results/clientpositive/groupby7_map_multi_single_reducer.q.out
  ql/src/test/queries/clientpositive/groupby7_noskew.q
  ql/src/test/queries/clientpositive/groupby10.q
  ql/src/test/queries/clientpositive/groupby_multi_single_reducer.q
  ql/src/test/queries/clientpositive/multigroupby_singlemr.q
  ql/src/test/queries/clientpositive/groupby7_map.q
  ql/src/test/queries/clientpositive/groupby8.q
  ql/src/test/queries/clientpositive/groupby9.q
  ql/src/test/queries/clientpositive/groupby7_noskew_multi_single_reducer.q
  ql/src/test/queries/clientpositive/groupby7_map_multi_single_reducer.q
  ql/src/test/queries/clientpositive/groupby_complex_types_multi_single_reducer.q
  ql/src/java/org/apache/hadoop/hive/ql/plan/ExprNodeDesc.java
  ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java

                
> Allow multiple group bys with the same input data and spray keys to be run on 
> the same reducer.
> -----------------------------------------------------------------------------------------------
>
>                 Key: HIVE-2621
>                 URL: https://issues.apache.org/jira/browse/HIVE-2621
>             Project: Hive
>          Issue Type: New Feature
>            Reporter: Kevin Wilfong
>            Assignee: Kevin Wilfong
>         Attachments: HIVE-2621.1.patch.txt, HIVE-2621.D567.1.patch, 
> HIVE-2621.D567.2.patch, HIVE-2621.D567.3.patch, HIVE-2621.D567.4.patch
>
>
> Currently, when a user runs a query, such as a multi-insert, where each 
> insertion subclause consists of a simple query followed by a group by, the 
> group by for each clause is run on a separate reducer.  This requires 
> writing the data for each group by clause to an intermediate file, and then 
> reading it back.  For an otherwise simple query, this accounts for a 
> significant share of the total CPU consumed by the query.
> If the subclauses are grouped by their distinct expressions and group by 
> keys, with all of the group by expressions for a group of subclauses run on a 
> single reducer, this would reduce the amount of reading/writing to 
> intermediate files for some queries.
> To do this, for each group of subclauses, in the mapper we would execute 
> the filters for each subclause 'or'd together (provided each subclause has a 
> filter) followed by a reduce sink.  In the reducer, the child operators would 
> be each subclause's filter followed by the group by and any subsequent 
> operations.
> Note that this would require turning off map aggregation, so we would need to 
> make using this type of plan configurable.
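
The plan described above can be illustrated with a small simulation. This is
only a sketch in plain Python, not Hive code: the subclause names (dest1,
dest2), the filters, the aggregates, and the sample rows are all hypothetical
and chosen purely for illustration. It assumes two subclauses that read the
same input and group by the same key. The map side keeps a row if any
subclause filter accepts it and sprays it by the group by key; the reduce side
re-applies each subclause's own filter before aggregating, with no map-side
aggregation.

```python
from collections import defaultdict

# Hypothetical stand-in for a multi-insert with two subclauses that share
# the same input rows and the same group by key ("key").
SUBCLAUSES = {
    "dest1": {"filter": lambda r: r["value"] > 10, "agg": "count"},
    "dest2": {"filter": lambda r: r["value"] % 2 == 0, "agg": "sum"},
}


def map_phase(rows, subclauses):
    """Keep a row if ANY subclause filter accepts it, then emit (key, row)."""
    for row in rows:
        if any(sc["filter"](row) for sc in subclauses.values()):
            yield row["key"], row


def shuffle(pairs):
    """Collect rows by key, mimicking a single shared reduce sink."""
    groups = defaultdict(list)
    for key, row in pairs:
        groups[key].append(row)
    return groups


def reduce_phase(groups, subclauses):
    """Per key, apply each subclause's own filter, then its aggregate."""
    results = {name: {} for name in subclauses}
    for key, rows in groups.items():
        for name, sc in subclauses.items():
            matching = [r for r in rows if sc["filter"](r)]
            if not matching:
                continue  # this subclause produces no row for this key
            if sc["agg"] == "count":
                results[name][key] = len(matching)
            else:
                results[name][key] = sum(r["value"] for r in matching)
    return results


def run(rows, subclauses=SUBCLAUSES):
    return reduce_phase(shuffle(map_phase(rows, subclauses)), subclauses)


ROWS = [
    {"key": "a", "value": 4},
    {"key": "a", "value": 12},
    {"key": "b", "value": 7},
    {"key": "b", "value": 15},
]

if __name__ == "__main__":
    print(run(ROWS))
```

In this sketch the row with key "b" and value 7 is dropped in the mapper
because neither filter accepts it, while every other row is shuffled once and
shared by both group bys in the reducer.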

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
