> On Feb. 20, 2014, 10:51 a.m., Jung JaeHwa wrote:
> > Hyunsik, thank you for waiting.
> > 
> > I tested the patch on my local cluster, but the validation for different 
> > DISTINCT columns doesn't work as expected. For example, the following 
> > queries finished without a PlanningException:
> > 
> > - select count(distinct id), sum(distinct score) from table1
> > - select id, count(distinct id), sum(distinct name) from table1 group by id
> > 
> > For reference, I created the table described on the Tajo wiki.
> > 
> > Anyway, I found that the validation is never called. Please check this.
> > 
> > And if that's okay with you, I'd like to suggest adding unit test cases 
> > for unsupported queries.
> > But if you think that's a waste of resources, feel free to disregard it. :)

Could you check the patch once again? I ran your test queries, and I get the 
following error messages:

tajo> select count(distinct l_orderkey), sum(distinct l_partkey) from lineitem;
different DISTINCT columns are not supported yet: l_orderkey, l_partkey

tajo> select id, count(distinct l_orderkey), sum(distinct l_partkey) from lineitem group by id;
different DISTINCT columns are not supported yet: l_orderkey, l_partkey

tajo> select count(distinct id), sum(distinct score) from table1;
different DISTINCT columns are not supported yet: id, score

tajo> select id, count(distinct id), sum(distinct name) from table1 group by id;
different DISTINCT columns are not supported yet: id, name
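
For the unit tests suggested above, the check could be sketched roughly as 
follows. This is only an illustration in Python, not the code in the patch: 
the regular expression, the function name verify_distinct_columns, and the 
local PlanningException class are all assumptions. Only the error message text 
is taken from the output above.

```python
import re


class PlanningException(Exception):
    """Local stand-in for the planner error named in the thread."""


# Hypothetical pattern: matches `func(distinct col)` and captures the column.
_DISTINCT_ARG = re.compile(r"\w+\s*\(\s*distinct\s+(\w+)\s*\)", re.IGNORECASE)


def verify_distinct_columns(sql: str) -> None:
    """Raise PlanningException if DISTINCT aggregates use more than one column."""
    columns = []
    for col in _DISTINCT_ARG.findall(sql):
        if col not in columns:  # keep first-seen order for the message
            columns.append(col)
    if len(columns) > 1:
        raise PlanningException(
            "different DISTINCT columns are not supported yet: "
            + ", ".join(columns)
        )
```

A test for an unsupported query would then assert that the exception is raised 
with both column names, and a test for a supported query (all DISTINCT 
aggregates on the same column) would assert that nothing is raised.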


Thanks!


- Hyunsik


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18210/#review34962
-----------------------------------------------------------


On Feb. 18, 2014, 9:03 p.m., Hyunsik Choi wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/18210/
> -----------------------------------------------------------
> 
> (Updated Feb. 18, 2014, 9:03 p.m.)
> 
> 
> Review request for Tajo.
> 
> 
> Bugs: TAJO-601
>     https://issues.apache.org/jira/browse/TAJO-601
> 
> 
> Repository: tajo
> 
> 
> Description
> -------
> 
> Currently, distinct aggregation queries are executed as follows:
> * the first stage: it just shuffles tuples by hashing grouping keys.
> * the second stage: it sorts them and executes sort aggregation.
> 
> This approach executes queries that include distinct aggregation functions 
> in only two stages, but it produces large intermediate data during the 
> shuffle phase.
> 
> This kind of query can be rewritten as a nested query with two aggregation 
> levels:
> 
> [Original query]
> ----------
> SELECT grp1, grp2, count(*) as total, count(distinct grp3) as distinct_col 
> from rel1 group by grp1, grp2;
> ----------
> 
> [Rewritten query]
> ----------
> SELECT grp1, grp2, sum(cnt) as total, count(grp3) as distinct_col from (
>   SELECT grp1, grp2, grp3, count(*) as cnt from rel1 group by grp1, grp2, grp3
> ) tmp1 group by grp1, grp2;
> ----------
> 
> I'm expecting that this rewrite will significantly reduce the intermediate 
> data volume and query response time in most cases.
> 
> 
> Diffs
> -----
> 
>   tajo-common/src/main/java/org/apache/tajo/util/TUtil.java cc694d4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/eval/EvalTreeUtil.java da05739 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumDoubleDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloat.java 10fd720 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumFloatDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumIntDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/function/builtin/SumLongDistinct.java PRE-CREATION 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/ExprsVerifier.java b14c448 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/LogicalPlanner.java f7c0bfa 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PlannerUtil.java 624518b 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/PreLogicalPlanVerifier.java 6dac031 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/DataChannel.java efa1e05 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/GlobalPlanner.java f390b52 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/global/MasterPlan.java 91f658d 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/physical/SeqScanExec.java a0c0eeb 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/FilterPushDownRule.java 399903c 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/PartitionedTableRewriter.java e5f7fb4 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/engine/planner/rewrite/ProjectionPushDownRule.java 633d0c1 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMaster.java ae6d5eb 
>   tajo-core/tajo-core-backend/src/main/java/org/apache/tajo/master/querymaster/QueryMasterManagerService.java 3c30e38 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/eval/TestEvalTreeUtil.java d756242 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/engine/query/TestGroupByQuery.java 1f80bce 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestExecutionBlockCursor.java 053c028 
>   tajo-core/tajo-core-backend/src/test/java/org/apache/tajo/master/TestGlobalPlanner.java 2d3124d 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct.sql 6fe604e 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testCountDistinct2.sql 6bf8a8a 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation2.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation3.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation4.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregation5.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/queries/TestGroupByQuery/testDistinctAggregationWithHaving1.sql PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct.result f2ad32a 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testCountDistinct2.result 9164120 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation1.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation2.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation3.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation4.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregation5.result PRE-CREATION 
>   tajo-core/tajo-core-backend/src/test/resources/results/TestGroupByQuery/testDistinctAggregationWithHaving1.result PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/18210/diff/
> 
> 
> Testing
> -------
> 
> mvn clean install
> 
> 
> Thanks,
> 
> Hyunsik Choi
> 
>
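
As a sanity check on the rewrite in the quoted description, the two query 
forms can be compared on a small table. The sketch below uses Python's 
built-in sqlite3 module rather than Tajo, so it only checks that the results 
match, not the shuffle volume. The sample rows, the ORDER BY clauses (added so 
the two result lists compare deterministically), and the helper name run_both 
are assumptions made for this example; the rewritten query is written with 
the subquery parentheses balanced.

```python
import sqlite3

# Original form: one aggregation level with count(distinct ...).
ORIGINAL = """
SELECT grp1, grp2, count(*) AS total, count(DISTINCT grp3) AS distinct_col
FROM rel1 GROUP BY grp1, grp2 ORDER BY grp1, grp2
"""

# Rewritten form: the inner query groups by grp3 as well, the outer query
# sums the partial counts and counts the now-unique grp3 values per group.
REWRITTEN = """
SELECT grp1, grp2, sum(cnt) AS total, count(grp3) AS distinct_col FROM (
  SELECT grp1, grp2, grp3, count(*) AS cnt FROM rel1 GROUP BY grp1, grp2, grp3
) tmp1 GROUP BY grp1, grp2 ORDER BY grp1, grp2
"""

# Made-up sample data, including duplicates and a NULL in grp3.
SAMPLE_ROWS = [
    ("a", "x", 1), ("a", "x", 1), ("a", "x", 2),
    ("a", "y", 3),
    ("b", "x", None), ("b", "x", 4), ("b", "x", 4),
]


def run_both(rows):
    """Load rows into an in-memory rel1 table and run both query forms."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE rel1 (grp1 TEXT, grp2 TEXT, grp3 INTEGER)")
    conn.executemany("INSERT INTO rel1 VALUES (?, ?, ?)", rows)
    original = conn.execute(ORIGINAL).fetchall()
    rewritten = conn.execute(REWRITTEN).fetchall()
    conn.close()
    return original, rewritten


if __name__ == "__main__":
    original, rewritten = run_both(SAMPLE_ROWS)
    print(original == rewritten)
```

The NULL row is included on purpose: count(distinct grp3) and count(grp3) both 
skip NULLs, while count(*) and sum(cnt) still include that row in the total.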
