[ 
https://issues.apache.org/jira/browse/DRILL-6839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16683739#comment-16683739
 ] 

Volodymyr Vysotskyi commented on DRILL-6839:
--------------------------------------------

[~amansinha100], is it possible to modify {{StreamAggPrule}} so that it creates 
a two-phase aggregation when possible and falls back to a single-phase one when 
a two-phase aggregation cannot be created, similar to 
[{{ProjectPrule}}|https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/planner/physical/ProjectPrule.java#L68]
 and other rules?
Or was {{StreamAggPrule}} implemented this way because, if the plan contained 
both single- and two-phase aggregations, the single-phase one would have the 
lower cost? If so, can we adjust the cost calculations so that the choice 
depends on the row count and the values of the broadcast options?
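
As a rough illustration of the fallback asked about above, here is a toy sketch. It is not Drill's planner API: {{AggFallbackSketch}}, {{Candidate}}, and both cost formulas are hypothetical and chosen only to make the example concrete. The idea is that a rule always registers the single-phase plan, adds the two-phase plan only when it can be built, and leaves the final choice to the cost model:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class AggFallbackSketch {

  /** A candidate physical plan with a made-up cost; not Drill's RelNode or cost model. */
  static final class Candidate {
    final String name;
    final double cost;

    Candidate(String name, double cost) {
      this.name = name;
      this.cost = cost;
    }
  }

  /**
   * Hypothetical rule body: always offer a single-phase plan, and offer a
   * two-phase plan in addition when one can be built.
   */
  static List<Candidate> candidates(double rowCount, boolean twoPhasePossible) {
    List<Candidate> result = new ArrayList<>();
    // Single-phase: one aggregation over all rows; assumed cost is linear in row count.
    result.add(new Candidate("single-phase", rowCount));
    if (twoPhasePossible) {
      // Two-phase: assumed fixed exchange overhead plus work split four ways.
      result.add(new Candidate("two-phase", 100.0 + rowCount / 4.0));
    }
    return result;
  }

  /** Picks the cheapest candidate; the list is never empty because of the fallback. */
  static Candidate choose(double rowCount, boolean twoPhasePossible) {
    return candidates(rowCount, twoPhasePossible).stream()
        .min(Comparator.comparingDouble((Candidate c) -> c.cost))
        .orElseThrow(IllegalStateException::new);
  }

  public static void main(String[] args) {
    System.out.println(choose(1_000_000, true).name);   // large input, two-phase available
    System.out.println(choose(1_000_000, false).name);  // two-phase cannot be built
    System.out.println(choose(10, true).name);          // tiny input, overhead dominates
  }
}
```

With these assumed costs, planning never fails for lack of a candidate, and the row count decides whether the two-phase plan wins when both are available.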

> Failed to plan (aggregate + Hash or NL join) when slice target is low 
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6839
>                 URL: https://issues.apache.org/jira/browse/DRILL-6839
>             Project: Apache Drill
>          Issue Type: Bug
>            Reporter: Igor Guzenko
>            Priority: Major
>             Fix For: 1.16.0
>
>
> *Case 1.* When a nested loop join is about to be used:
>  - Option "_planner.enable_nljoin_for_scalar_only_" is set to false
>  - Option "_planner.slice_target_" is set to a low value to imitate large 
> input tables
>  
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>   @BeforeClass
>   public static void setUp() throws Exception {
>     startCluster(ClusterFixture.builder(dirTestWatcher));
>   }
>
>   @Test
>   public void testCrossJoinSucceedsForLowSliceTarget() throws Exception {
>     try {
>       client.alterSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName(), false);
>       client.alterSession(ExecConstants.SLICE_TARGET, 1);
>       queryBuilder().sql(
>           "SELECT COUNT(l.nation_id) " +
>           "FROM cp.`tpch/nation.parquet` l " +
>           ", cp.`tpch/region.parquet` r")
>         .run();
>     } finally {
>       client.resetSession(ExecConstants.SLICE_TARGET);
>       client.resetSession(PlannerSettings.NLJOIN_FOR_SCALAR.getOptionName());
>     }
>   }
> }
> {code}
>  
> *Case 2.* When a hash join is about to be used:
>  - Option "_planner.enable_mergejoin_" is set to false, so a hash join will be 
> used instead
>  - Option "_planner.slice_target_" is set to a low value to imitate large 
> input tables
>  - The line {{ruleList.add(HashJoinPrule.DIST_INSTANCE);}} is commented out in 
> the {{PlannerPhase.getPhysicalRules}} method
> {code:java}
> @Category(SqlTest.class)
> public class CrossJoinTest extends ClusterTest {
>   @BeforeClass
>   public static void setUp() throws Exception {
>     startCluster(ClusterFixture.builder(dirTestWatcher));
>   }
>
>   @Test
>   public void testInnerJoinSucceedsForLowSliceTarget() throws Exception {
>     try {
>       client.alterSession(PlannerSettings.MERGEJOIN.getOptionName(), false);
>       client.alterSession(ExecConstants.SLICE_TARGET, 1);
>       queryBuilder().sql(
>           "SELECT COUNT(l.nation_id) " +
>           "FROM cp.`tpch/nation.parquet` l " +
>           "INNER JOIN cp.`tpch/region.parquet` r " +
>           "ON r.nation_id = l.nation_id")
>         .run();
>     } finally {
>       client.resetSession(ExecConstants.SLICE_TARGET);
>       client.resetSession(PlannerSettings.MERGEJOIN.getOptionName());
>     }
>   }
> }
> {code}
>  
> *Workaround:* To avoid the exception, set option 
> "_planner.enable_multiphase_agg_" to false. This skips the unsuccessful 
> attempts to create a two-phase aggregation plan in StreamAggPrule and 
> guarantees that the logical aggregate is converted to a physical one. 
>  


