[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212566#comment-15212566
 ] 

Aman Sinha commented on DRILL-4539:
-----------------------------------

That's odd..we should not need to do NestedLoop join for this.  Both HashJoin 
and MergeJoin support the 'is not distinct from' join condition.  For an 
example of hashjoin where this gets used, see DRILL-2092.  I think the 
JoinUtils.checkCartesianJoin() is incorrectly reporting that this is a 
non-equality join.   Some changes may be needed there based on output of 
RelOptUtil.splitJoinCondition().  

> Add support for Null Equality Joins
> -----------------------------------
>
>                 Key: DRILL-4539
>                 URL: https://issues.apache.org/jira/browse/DRILL-4539
>             Project: Apache Drill
>          Issue Type: Improvement
>            Reporter: Jacques Nadeau
>            Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
>     `business`.`state` AS `state`,
>     SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
>     `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
>     `t1`.`city` AS `city`,
>     `t1`.`state` AS `state`
>   FROM (
>     SELECT `business`.`city` AS `city`,
>       `business`.`state` AS `state`,
>       `business`.`business_id` AS `business_id`,
>       SUM(`business`.`stars`) AS `X_measure__A`
>     FROM `mongo.academic`.`business` `business`
>     GROUP BY `business`.`city`,
>       `business`.`state`,
>       `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
>     `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL))))
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to