[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228889#comment-15228889
 ] 

ASF GitHub Bot commented on DRILL-4539:
---------------------------------------

Github user amansinha100 commented on a diff in the pull request:

    https://github.com/apache/drill/pull/462#discussion_r58761755
  
    --- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
    @@ -169,4 +176,223 @@ private static boolean containIdentity(List<? extends 
RexNode> exps,
         }
         return true;
       }
    +
    +  /**
    +   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
    +   * the null equal join condition using IS NOT DISTINCT FROM operator.
    +   *
    +   * Splits out the equi-join components of a join condition, and returns
    +   * what's left. For example, given the condition
    +   *
    +   * <blockquote><code>L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E =
    +   * R.Y)</code></blockquote>
    +   *
    +   * returns
    +   *
    +   * <ul>
    +   * <li>leftKeys = {A}
    +   * <li>rightKeys = {X}
    +   * <li>rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y)</li>
    +   * </ul>
    +   *
    +   * @param left      left input to join
    +   * @param right     right input to join
    +   * @param condition join condition
    +   * @param leftKeys  The ordinals of the fields from the left input which 
are
    +   *                  equi-join keys
    +   * @param rightKeys The ordinals of the fields from the right input which
    +   *                  are equi-join keys
    +   * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT 
FROM) used to join the left and right keys.
    +   * @return remaining join filters that are not equijoins; may return a
    +   * {@link RexLiteral} true, but never null
    +   */
    +  public static RexNode splitJoinCondition(
    +      RelNode left,
    +      RelNode right,
    +      RexNode condition,
    +      List<Integer> leftKeys,
    +      List<Integer> rightKeys,
    +      List<SqlBinaryOperator> joinOps) {
    +    final List<RexNode> nonEquiList = new ArrayList<>();
    +
    +    splitJoinCondition(
    +        left.getRowType().getFieldCount(),
    +        condition,
    +        leftKeys,
    +        rightKeys,
    +        joinOps,
    +        nonEquiList);
    +
    +    return RexUtil.composeConjunction(
    +        left.getCluster().getRexBuilder(), nonEquiList, false);
    +  }
    +
    +  /**
    +   * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, 
List, List)}. Modified to rewrite the null
    +   * equal join condition using IS NOT DISTINCT FROM operator.
    +   */
    +  private static void splitJoinCondition(
    --- End diff --
    
    Can you confirm if this rewrite does *not* do the conversion if the join 
condition happens to involve columns coming from not just 2 tables but from 3  
tables ? e.g if the user accidentally gives: 
    
    SELECT * FROM t1, t2, t3 WHERE t1.a = t2.a  OR (t1.b is null and t3.b is 
null)


> Add support for Null Equality Joins
> -----------------------------------
>
>                 Key: DRILL-4539
>                 URL: https://issues.apache.org/jira/browse/DRILL-4539
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jacques Nadeau
>            Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
>     `business`.`state` AS `state`,
>     SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
>     `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
>     `t1`.`city` AS `city`,
>     `t1`.`state` AS `state`
>   FROM (
>     SELECT `business`.`city` AS `city`,
>       `business`.`state` AS `state`,
>       `business`.`business_id` AS `business_id`,
>       SUM(`business`.`stars`) AS `X_measure__A`
>     FROM `mongo.academic`.`business` `business`
>     GROUP BY `business`.`city`,
>       `business`.`state`,
>       `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
>     `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL))))
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to