-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/
-----------------------------------------------------------

(Updated Feb. 20, 2014, 2:28 p.m.)


Review request for Tajo.


Changes
-------

Rebased against the latest revision.


Bugs: TAJO-601
    https://issues.apache.org/jira/browse/TAJO-601


Repository: tajo


Description
-------

Currently, distinct aggregation queries are executed as follows:
* the first stage: it just shuffles tuples by hashing grouping keys.
* the second stage: it sorts them and executes sort aggregation.

This approach executes queries that include distinct aggregation functions in
only two stages. However, because the first stage shuffles every raw tuple, it
produces a large volume of intermediate data during the shuffle phase.
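To make the cost concrete, here is a minimal in-memory sketch of the current
two-stage plan for a query of the form SELECT grp1, grp2, count(*),
count(distinct grp3) FROM rel1 GROUP BY grp1, grp2. It is not Tajo's executor
code: the class, the method names, and the String[] row layout {grp1, grp2,
grp3} are hypothetical. shuffle() forwards every raw row to a partition chosen
by hashing (grp1, grp2), and sortAggregate() sorts each partition and computes
count(*) and count(distinct grp3) in a single pass over the sorted rows.

```java
import java.util.*;

// Hypothetical in-memory model of the current distinct-aggregation plan (not Tajo code).
class CurrentDistinctPlan {
    // Null-safe ordering on (grp1, grp2, grp3); NULLs sort first.
    static final Comparator<String> NULLS_FIRST =
        Comparator.nullsFirst(Comparator.<String>naturalOrder());
    static final Comparator<String[]> BY_KEYS =
        Comparator.comparing((String[] r) -> r[0], NULLS_FIRST)
            .thenComparing(r -> r[1], NULLS_FIRST)
            .thenComparing(r -> r[2], NULLS_FIRST);

    // Stage 1: route every raw row by hashing (grp1, grp2); nothing is pre-aggregated.
    static List<List<String[]>> shuffle(List<String[]> rows, int numPartitions) {
        List<List<String[]>> parts = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) parts.add(new ArrayList<>());
        for (String[] r : rows) {
            parts.get(Math.floorMod(Objects.hash(r[0], r[1]), numPartitions)).add(r);
        }
        return parts;
    }

    // Stage 2: sort one partition, then aggregate each run of equal (grp1, grp2).
    static List<String> sortAggregate(List<String[]> part) {
        List<String[]> sorted = new ArrayList<>(part);
        sorted.sort(BY_KEYS);
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            String g1 = sorted.get(i)[0], g2 = sorted.get(i)[1];
            long total = 0, distinct = 0;
            String prev = null;
            boolean seenNonNull = false;
            while (i < sorted.size() && Objects.equals(sorted.get(i)[0], g1)
                    && Objects.equals(sorted.get(i)[1], g2)) {
                String g3 = sorted.get(i)[2];
                total++;
                // grp3 is sorted within the group, so a new value differs from prev; NULLs are ignored.
                if (g3 != null && (!seenNonNull || !g3.equals(prev))) {
                    distinct++;
                    prev = g3;
                    seenNonNull = true;
                }
                i++;
            }
            out.add(g1 + "," + g2 + "," + total + "," + distinct);
        }
        return out;
    }

    // Both stages end to end; results sorted for a stable comparison.
    static List<String> run(List<String[]> rows, int numPartitions) {
        List<String> result = new ArrayList<>();
        for (List<String[]> part : shuffle(rows, numPartitions)) result.addAll(sortAggregate(part));
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        List<String[]> rel1 = Arrays.asList(
            new String[]{"a", "x", "1"}, new String[]{"a", "x", "1"},
            new String[]{"a", "x", "2"}, new String[]{"b", "y", null});
        System.out.println(run(rel1, 2)); // prints [a,x,3,2, b,y,1,0]
    }
}
```

In this model, shuffle() emits exactly as many rows as it receives, which is the
intermediate-data problem described above.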

This kind of query can be rewritten as a nested query with two aggregation levels:

[Original query]
----------
SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col
from rel1 group by grp1, grp2;
----------

[Rewritten query]
----------
SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
  SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
) tmp1 group by grp1, grp2;
----------
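A quick way to check that the rewrite preserves the original semantics,
including NULL handling, is to evaluate both forms over the same rows. The Java
sketch below does this in memory. It illustrates the SQL equivalence only and
is not the planner's rewrite logic; the class and method names and the String[]
row layout {grp1, grp2, grp3} are hypothetical.

```java
import java.util.*;

// Hypothetical in-memory check that the rewritten query matches the original (not Tajo code).
class DistinctRewrite {
    // Inner query: SELECT grp1, grp2, grp3, count(*) AS cnt FROM rel1 GROUP BY grp1, grp2, grp3
    static Map<List<String>, Long> inner(List<String[]> rows) {
        Map<List<String>, Long> tmp1 = new HashMap<>();
        for (String[] r : rows) tmp1.merge(Arrays.asList(r[0], r[1], r[2]), 1L, Long::sum);
        return tmp1;
    }

    // Outer query: SELECT grp1, grp2, sum(cnt), count(grp3) FROM tmp1 GROUP BY grp1, grp2
    static Map<List<String>, List<Long>> outer(Map<List<String>, Long> tmp1) {
        Map<List<String>, long[]> acc = new HashMap<>();
        for (Map.Entry<List<String>, Long> e : tmp1.entrySet()) {
            List<String> k = e.getKey();
            long[] a = acc.computeIfAbsent(Arrays.asList(k.get(0), k.get(1)), x -> new long[2]);
            a[0] += e.getValue();          // sum(cnt) equals count(*) of the original group
            if (k.get(2) != null) a[1]++;  // count(grp3) skips the NULL group
        }
        Map<List<String>, List<Long>> out = new HashMap<>();
        acc.forEach((k, a) -> out.put(k, Arrays.asList(a[0], a[1])));
        return out;
    }

    // Reference semantics of the original query, computed directly.
    static Map<List<String>, List<Long>> original(List<String[]> rows) {
        Map<List<String>, Long> total = new HashMap<>();
        Map<List<String>, Set<String>> distinct = new HashMap<>();
        for (String[] r : rows) {
            List<String> k = Arrays.asList(r[0], r[1]);
            total.merge(k, 1L, Long::sum);
            Set<String> seen = distinct.computeIfAbsent(k, x -> new HashSet<>());
            if (r[2] != null) seen.add(r[2]);  // count(distinct ...) ignores NULLs
        }
        Map<List<String>, List<Long>> out = new HashMap<>();
        for (List<String> k : total.keySet()) {
            out.put(k, Arrays.asList(total.get(k), (long) distinct.get(k).size()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> rel1 = Arrays.asList(
            new String[]{"a", "x", "1"}, new String[]{"a", "x", "1"},
            new String[]{"a", "x", "2"}, new String[]{"b", "y", null},
            new String[]{"b", "y", "3"});
        Map<List<String>, List<Long>> rewritten = outer(inner(rel1));
        System.out.println(rewritten.equals(original(rel1)));     // prints true
        System.out.println(rewritten.get(Arrays.asList("a", "x"))); // prints [3, 2]
    }
}
```

The NULL case is the subtle one: the inner query keeps a separate group for
grp3 IS NULL, and the outer count(grp3) does not count it, which matches
count(distinct grp3) ignoring NULLs.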

I expect this rewrite to significantly reduce the intermediate data volume and
query response time in most cases, because the inner GROUP BY grp1, grp2, grp3
can collapse duplicate tuples before they are shuffled.
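The expected saving comes from shuffling pre-aggregated (grp1, grp2, grp3, cnt)
rows instead of raw tuples. The sketch below counts the rows crossing the first
shuffle in each plan under a hypothetical assumption: every worker partially
aggregates its local rows by (grp1, grp2, grp3) before shuffling. The worker
layout and the data are made up for illustration.

```java
import java.util.*;

// Hypothetical estimate of first-shuffle volume for both plans (not Tajo code).
class ShuffleVolume {
    // Original plan: stage 1 forwards every raw row, so shuffled rows == input rows.
    static long originalShuffledRows(List<List<String[]>> workers) {
        long n = 0;
        for (List<String[]> local : workers) n += local.size();
        return n;
    }

    // Rewritten plan, assuming local partial aggregation by (grp1, grp2, grp3).
    static long rewrittenShuffledRows(List<List<String[]>> workers) {
        long n = 0;
        for (List<String[]> local : workers) {
            Set<List<String>> keys = new HashSet<>();
            for (String[] r : local) keys.add(Arrays.asList(r[0], r[1], r[2]));
            n += keys.size();  // one (grp1, grp2, grp3, cnt) row per distinct local key
        }
        return n;
    }

    public static void main(String[] args) {
        // Two hypothetical workers, 1000 rows each, 10 distinct (grp1, grp2, grp3) values.
        List<List<String[]>> workers = new ArrayList<>();
        for (int w = 0; w < 2; w++) {
            List<String[]> local = new ArrayList<>();
            for (int i = 0; i < 1000; i++) local.add(new String[]{"g" + (i % 2), "h", "v" + (i % 5)});
            workers.add(local);
        }
        System.out.println(originalShuffledRows(workers));  // prints 2000
        System.out.println(rewrittenShuffledRows(workers)); // prints 20
    }
}
```

The outer GROUP BY grp1, grp2 adds another shuffle, but it operates on the
already reduced tmp1 rows. When almost every (grp1, grp2, grp3) combination is
unique, local pre-aggregation removes little, and the saving shrinks
accordingly.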


Diffs (updated)
-----

  tajo-catalog/tajo-catalog-common/src/main/java/org/apache/tajo/catalog/SortSpec.java 3ef73d5c5385b40fcfb3b0ecbbc35b783224c760
  tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d43f42f68945cf53a7b8b9bbdca97a4f205
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739b8feff0e04b1762f8000b1f3818c773a2
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/FunctionEval.java 0555bdec8aff6fa79c02b640c81ad55d4666b90a
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd7205f29c82adf87816737598ce762ee0ebc9
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448ee5b3ce0dfca67c6a9b942f1803cc91f9
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfab78cb3416e7a2ed263cc362917023e3ca
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PhysicalPlannerImpl.java 67f56303e04787bf950c4a9a703faec58fb74cd4
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 7d5e2fc7e085cc36527383a208277384035263e7
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031218c650b9c1c86811b4552fe6d82da0c1
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/enforce/Enforcer.java dd46996eca7eb9c38f87d97813f5dcc7220429ed
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java 9f5c6bf9dd7b549308724ce1e8044aff1630cef1
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52f378a2d7e84e40876df4a4b416af912ef
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658dab395620f5a891f51407b3676b07a8fa5
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/ExternalSortExec.java 791781e526c54f216152e935682bc2c3147a9e0c
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java 53a1c24197c40c77153f79f90c05882c90aae957
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c66bb8a62074facd0bbbe9b3b8e891c067
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb40414e0b2e2e40bccebe24069ee4d9301b
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1857533b02c4ecc6913c740fd2e3722845
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5ebb97f8c4287ffd11262b2932d2f8b1250c
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e3854abaa891f72b368144942164e5dffab7
  tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/Repartitioner.java 56c26797aad1dbe95945567961e9425fef72fa96
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d7562426647a6a9d6aae5207a67ddcdd03d0ee3a
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce23c74e3abdcbf9bc0553ec30244d6bd93
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c02833e80dd931807fa6314965e687d7b26c0
  tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d7e9d7853b0f872eee1016cbae504c9c6b
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithUnion1.sql PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION
  tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithUnion1.result PRE-CREATION
  tajo-storage/src/main/java/org/apache/tajo/storage/RawFile.java c3a7525154e0f36d51dcca211949f21f57a9f1c8

Diff: https://reviews.apache.org/r/18210/diff/


Testing
-------

mvn clean install


Thanks,

Hyunsik Choi
