[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-09-22 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15513830#comment-15513830
 ] 

Zelaine Fong commented on DRILL-4539:
-

Note that this was addressed in Calcite in CALCITE-1200.  But because Drill is 
on an older Calcite fork, it doesn't have this change.

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-08-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15411069#comment-15411069
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti closed the pull request at:

https://github.com/apache/drill/pull/462


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243693#comment-15243693
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r59943901
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
 ---
@@ -22,25 +22,36 @@
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.exec.physical.impl.join.JoinUtils.JoinComparator;
 
 @JsonTypeName("hashtable-config")
 public class HashTableConfig  {
 
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HashTableConfig.class);
-
   private final int initialCapacity;
   private final float loadFactor;
   private final NamedExpression[] keyExprsBuild;
   private final NamedExpression[] keyExprsProbe;
+  private final JoinComparator[] comparators;
 
   @JsonCreator
   public HashTableConfig(@JsonProperty("initialCapacity") int 
initialCapacity, @JsonProperty("loadFactor") float loadFactor,
  @JsonProperty("keyExprsBuild") NamedExpression[] 
keyExprsBuild,
- @JsonProperty("keyExprsProbe") NamedExpression[] 
keyExprsProbe) {
+ @JsonProperty("keyExprsProbe") NamedExpression[] 
keyExprsProbe,
+ @JsonProperty("comparators") JoinComparator[] 
comparators) {
 this.initialCapacity = initialCapacity;
 this.loadFactor = loadFactor;
 this.keyExprsBuild = keyExprsBuild;
 this.keyExprsProbe = keyExprsProbe;
+if (comparators != null) {
+  this.comparators = comparators;
+} else {
+  // default is nulls are equal
--- End diff --

Agree, it is clean to remove passing null and explicitly pass a value to 
comparator. Updated patch. Also renamed join.JoinComparator to 
common.Comparator.


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243660#comment-15243660
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-210647305
  
I had one comment on the latest commit but overall LGTM.  +1


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243656#comment-15243656
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r59941224
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashTableConfig.java
 ---
@@ -22,25 +22,36 @@
 import com.fasterxml.jackson.annotation.JsonCreator;
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.fasterxml.jackson.annotation.JsonTypeName;
+import org.apache.drill.exec.physical.impl.join.JoinUtils.JoinComparator;
 
 @JsonTypeName("hashtable-config")
 public class HashTableConfig  {
 
-  static final org.slf4j.Logger logger = 
org.slf4j.LoggerFactory.getLogger(HashTableConfig.class);
-
   private final int initialCapacity;
   private final float loadFactor;
   private final NamedExpression[] keyExprsBuild;
   private final NamedExpression[] keyExprsProbe;
+  private final JoinComparator[] comparators;
 
   @JsonCreator
   public HashTableConfig(@JsonProperty("initialCapacity") int 
initialCapacity, @JsonProperty("loadFactor") float loadFactor,
  @JsonProperty("keyExprsBuild") NamedExpression[] 
keyExprsBuild,
- @JsonProperty("keyExprsProbe") NamedExpression[] 
keyExprsProbe) {
+ @JsonProperty("keyExprsProbe") NamedExpression[] 
keyExprsProbe,
+ @JsonProperty("comparators") JoinComparator[] 
comparators) {
 this.initialCapacity = initialCapacity;
 this.loadFactor = loadFactor;
 this.keyExprsBuild = keyExprsBuild;
 this.keyExprsProbe = keyExprsProbe;
+if (comparators != null) {
+  this.comparators = comparators;
+} else {
+  // default is nulls are equal
--- End diff --

It might seem confusing at first (it was to me) to consider the 
nulls-are-equal as the default case since that's not true for joins; however it 
is true for a group-by operation.  Could you have the caller pass in explicit 
comparator such that the HashTableConfig does not have to decide ?  The 
HashAggregate could instantiate the hash table with an IS_NOT_DISTINCT_FROM 
comparator.  What do you think ?  (one minor point is that the 'JoinComparator' 
is now being used for both Joins and Aggregates...we could call it Comparator 
or something generic). 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15243221#comment-15243221
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-210546147
  
Updated PR to handle mixed EQUALS and IS NOT DISTINCT FROM in same join 
condition. @amansinha100 Could you please review the latest commit in the PR?


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242283#comment-15242283
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-210244225
  
It seems like mixed condition of EQUALS and IS NOT DISTINCT FROM in the 
same join condition can be supported few changes. Here are the WIP changes [1]. 
Need to test more and clean it up.

[1] 
https://github.com/vkorukanti/drill/commit/0c8a4a2cef926492853e9c8ae856b82615be3c72


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241672#comment-15241672
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-210085324
  
I think additional support may be needed; the hash table only stores the 
equality join keys, so the generated code in ChainedHashTable is targeted to 
the those columns.  For the additional columns you will need to keep track of 
them (I haven't thought through the details yet.)


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241530#comment-15241530
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-210056992
  
I haven't tried this, but wondering whether this is sufficient to handle 
mix of EQUALS and IS NOT DISTINCT FROM cases.  Maintain a list (one for each 
join condition) instead of single "areNullsEquals" flag in MergeJoinBatch [1] 
and ChainedHashTable [2] and generate the code accordingly for each condition.

[1] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java#L463

[2] 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/ChainedHashTable.java#L251


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15241497#comment-15241497
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-210047513
  
Right, the HashJoin and MergeJoin currently can handle one type of 
comparison, not mixed...this is why when we have a join condition such as  
t1.a1 = t2.a2  AND t2.b1 < t2.b2  then Drill planner currently will split it up 
into equality join and the non-equality conditions.  The equality is handled by 
the join operator while a new Filter node is added after the join to process 
the non-equality.   I had created a JIRA to be able to handle both types of 
comparisons in the join operator:  see 
https://issues.apache.org/jira/browse/DRILL-3803.Do you need this support 
for this particular null equality join issue ?  I would have thought that your 
rewrite rule would produce the IS NOT DISTINCT FROM condition, but not a mixed 
condition.  


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240832#comment-15240832
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-209836956
  
Added few more tests:
```SELECT * FROM t1, t2, t3 WHERE t1.a = t2.a OR (t1.b is null and t3.b is 
null)``` to test it throws CannotPlanException
```SELECT * FROM t1, t2 WHERE t1.a = t2.a OR (t1.a is null and t2.a is 
null)``` to test null-equality filter is pushed into join condition. Had to 
make some changes to ```DrillFilterJoinRules``` to handle this case.

Also few more tests around 3 table join that involve null-equality 
condition in ```WHERE``` clause and ```JOIN ON```

4 tests from extended regression suite failed, because now we generate a 
join condition that involves ```=`` and ```IS NOT DISTINCT FROM```. 

```SELECT * FROM cp.`jsonInput/nullableOrdered1.json` t1 JOIN 
cp.`jsonInput/nullableOrdered2.json` t2 ON t1.key = t2.key WHERE t1.data IS NOT 
DISTINCT FROM t2.data```

It looks like HashJoin/MergeJoin operators are not handling mixed 
comparator in join condition. Looking at the ```ChainedHashTable``` and 
```MergeJoinBatch``` code briefly, it seems like we should be able to generate 
code for handling mixed comparators case. @amansinha100: Let me know if I am 
not correct.





> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228891#comment-15228891
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58761862
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
+   * the null equal join condition using IS NOT DISTINCT FROM operator.
+   *
+   * Splits out the equi-join components of a join condition, and returns
+   * what's left. For example, given the condition
+   *
+   * L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E =
+   * R.Y)
+   *
+   * returns
+   *
+   * 
+   * leftKeys = {A}
+   * rightKeys = {X}
+   * rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y)
+   * 
+   *
+   * @param left  left input to join
+   * @param right right input to join
+   * @param condition join condition
+   * @param leftKeys  The ordinals of the fields from the left input which 
are
+   *  equi-join keys
+   * @param rightKeys The ordinals of the fields from the right input which
+   *  are equi-join keys
+   * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT 
FROM) used to join the left and right keys.
+   * @return remaining join filters that are not equijoins; may return a
+   * {@link RexLiteral} true, but never null
+   */
+  public static RexNode splitJoinCondition(
+  RelNode left,
+  RelNode right,
+  RexNode condition,
+  List leftKeys,
+  List rightKeys,
+  List joinOps) {
+final List nonEquiList = new ArrayList<>();
+
+splitJoinCondition(
+left.getRowType().getFieldCount(),
+condition,
+leftKeys,
+rightKeys,
+joinOps,
+nonEquiList);
+
+return RexUtil.composeConjunction(
+left.getCluster().getRexBuilder(), nonEquiList, false);
+  }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, 
List, List)}. Modified to rewrite the null
+   * equal join condition using IS NOT DISTINCT FROM operator.
+   */
+  private static void splitJoinCondition(
--- End diff --

Rest looks good to me.  +1. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228889#comment-15228889
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58761755
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
+   * the null equal join condition using IS NOT DISTINCT FROM operator.
+   *
+   * Splits out the equi-join components of a join condition, and returns
+   * what's left. For example, given the condition
+   *
+   * L.A = R.X AND L.B = L.C AND (L.D = 5 OR L.E =
+   * R.Y)
+   *
+   * returns
+   *
+   * 
+   * leftKeys = {A}
+   * rightKeys = {X}
+   * rest = L.B = L.C AND (L.D = 5 OR L.E = R.Y)
+   * 
+   *
+   * @param left  left input to join
+   * @param right right input to join
+   * @param condition join condition
+   * @param leftKeys  The ordinals of the fields from the left input which 
are
+   *  equi-join keys
+   * @param rightKeys The ordinals of the fields from the right input which
+   *  are equi-join keys
+   * @param joinOps List of equi-join operators (EQUALS or IS NOT DISTINCT 
FROM) used to join the left and right keys.
+   * @return remaining join filters that are not equijoins; may return a
+   * {@link RexLiteral} true, but never null
+   */
+  public static RexNode splitJoinCondition(
+  RelNode left,
+  RelNode right,
+  RexNode condition,
+  List leftKeys,
+  List rightKeys,
+  List joinOps) {
+final List nonEquiList = new ArrayList<>();
+
+splitJoinCondition(
+left.getRowType().getFieldCount(),
+condition,
+leftKeys,
+rightKeys,
+joinOps,
+nonEquiList);
+
+return RexUtil.composeConjunction(
+left.getCluster().getRexBuilder(), nonEquiList, false);
+  }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(int, RexNode, List, 
List, List)}. Modified to rewrite the null
+   * equal join condition using IS NOT DISTINCT FROM operator.
+   */
+  private static void splitJoinCondition(
--- End diff --

Can you confirm if this rewrite does *not* do the conversion if the join 
condition happens to involve columns coming from not just 2 tables but from 3  
tables ? e.g if the user accidentally gives: 

SELECT * FROM t1, t2, t3 WHERE t1.a = t2.a  OR (t1.b is null and t3.b is 
null)


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228838#comment-15228838
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58757405
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -407,11 +342,94 @@ public void 
testMergeLOJNullableBothInputsOrderedDescNullsLastVsAscNullsLast() t
 + " ORDER BY 1 ASC NULLS LAST  ) t2 "
 + "USING ( key )",
 TEST_RES_PATH, TEST_RES_PATH);
-final int expectedRecordCount = 6;
+testHelper(query, 6, false, true);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key IS NOT DISTINCT FROM t2.key AND t1.data is NOT 
null";
+nullEqualJoinHelper(query);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key IS NOT DISTINCT FROM t2.key";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  @Test
+  public void withNullEqualHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+nullEqualJoinHelper(query);
+  }
 
-enableJoin(false, true);
-final int actualRecordCount = testSql(query);
-assertEquals("Number of output rows", expectedRecordCount, 
actualRecordCount);
+  @Test
+  public void withNullEqualMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  public void nullEqualJoinHelper(final String query) throws Exception {
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("key", "data", "data0", "key0")
+.baselineValues(null, "L_null_1", "R_null_1", null)
+.baselineValues(null, "L_null_2", "R_null_1", null)
+.baselineValues("A", "L_A_1", "R_A_1", "A")
+.baselineValues("A", "L_A_2", "R_A_1", "A")
+.baselineValues(null, "L_null_1", "R_null_2", null)
+.baselineValues(null, "L_null_2", "R_null_2", null)
+.baselineValues(null, "L_null_1", "R_null_3", null)
+.baselineValues(null, "L_null_2", "R_null_3", null)
+.go();
   }
 
+  @Test
+  public void withNullEqualAdditionFilter() throws Exception {
--- End diff --

Sure. I will update the patch with new tests.


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> 

[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228833#comment-15228833
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58757060
  
--- Diff: 
exec/java-exec/src/test/java/org/apache/drill/TestJoinNullable.java ---
@@ -407,11 +342,94 @@ public void 
testMergeLOJNullableBothInputsOrderedDescNullsLastVsAscNullsLast() t
 + " ORDER BY 1 ASC NULLS LAST  ) t2 "
 + "USING ( key )",
 TEST_RES_PATH, TEST_RES_PATH);
-final int expectedRecordCount = 6;
+testHelper(query, 6, false, true);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key IS NOT DISTINCT FROM t2.key AND t1.data is NOT 
null";
+nullEqualJoinHelper(query);
+  }
+
+  @Test
+  public void withDistinctFromJoinConditionMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key IS NOT DISTINCT FROM t2.key";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  @Test
+  public void withNullEqualHashJoin() throws Exception {
+final String query = "SELECT * FROM " +
+"cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+"cp.`jsonInput/nullableOrdered2.json` t2 " +
+"ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+nullEqualJoinHelper(query);
+  }
 
-enableJoin(false, true);
-final int actualRecordCount = testSql(query);
-assertEquals("Number of output rows", expectedRecordCount, 
actualRecordCount);
+  @Test
+  public void withNullEqualMergeJoin() throws Exception {
+try {
+  test("alter session set `planner.enable_hashjoin` = false");
+  final String query = "SELECT * FROM " +
+  "cp.`jsonInput/nullableOrdered1.json` t1 JOIN " +
+  "cp.`jsonInput/nullableOrdered2.json` t2 " +
+  "ON t1.key = t2.key OR (t1.key IS NULL AND t2.key IS NULL)";
+  nullEqualJoinHelper(query);
+} finally {
+  test("alter session set `planner.enable_hashjoin` = true");
+}
+  }
+
+  public void nullEqualJoinHelper(final String query) throws Exception {
+testBuilder()
+.sqlQuery(query)
+.unOrdered()
+.baselineColumns("key", "data", "data0", "key0")
+.baselineValues(null, "L_null_1", "R_null_1", null)
+.baselineValues(null, "L_null_2", "R_null_1", null)
+.baselineValues("A", "L_A_1", "R_A_1", "A")
+.baselineValues("A", "L_A_2", "R_A_1", "A")
+.baselineValues(null, "L_null_1", "R_null_2", null)
+.baselineValues(null, "L_null_2", "R_null_2", null)
+.baselineValues(null, "L_null_1", "R_null_3", null)
+.baselineValues(null, "L_null_2", "R_null_3", null)
+.go();
   }
 
+  @Test
+  public void withNullEqualAdditionFilter() throws Exception {
--- End diff --

Could you also do similar test with the join condition in the WHERE clause 
instead of ON clause ?  i.e something like:  SELECT * FROM t1, t2 WHERE t1.a = 
t2.a OR (t1.a is null and t2.a is null)
For such cases, Calcite filter pushdown into join needs to be applied 
first. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` 

[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228757#comment-15228757
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58751970
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

I added tests for each category of template code path (primitive type, 
decimal type and interval type) in TestIsDistinctFromFunctions.java


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228746#comment-15228746
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58751332
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

I am not opposed to having a native implementation of IS [NOT] DISTINCT 
FROM...clearly the generated code is more compact; however adding this new 
functions means we would need proper functional test coverage for various data 
types.  Any thoughts regarding that ? 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228651#comment-15228651
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58742916
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

If adding new functions is a concern, I can make the 
```RelOptUtil#splitJoinCondition``` to identify rewritten ```IS NOT DISTINCT 
FROM``` functions also.


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228646#comment-15228646
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58742546
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
--- End diff --

I will followup with a JIRA on Calcite project to see if we can push this 
change to Calcite. 

The function ```RelOptUtil#splitJoinCondition``` in the current form itself 
seems to have a problem/limitation. Currently it just returns the left and 
right join key indices, but doesn't return whether the condition is ```EQUAL``` 
or ```IS NOT DISTINCT FROM``` (it adds the key pair if they have either of 
these function in comparison). 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228628#comment-15228628
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user vkorukanti commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58741416
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

I am not sure if there is way to differentiate between the function in join 
condition vs. function in project expr. I don't see any context info in 
DrillConvertletTable.get() method call. Also the generated code in rewritten 
case is too much. For following query:
```SELECT INT_col is not distinct from BIGINT_col as col, 
int_distinct_result FROM cp.`functions/distinct_from.json```

Without rewrite: 
https://gist.github.com/vkorukanti/e981058f985ed24e6c4ef6b47d670e0f
With rewrite: 
https://gist.github.com/vkorukanti/d80aa2ba40c65c9215c38ed18b20a685

Sizes may differ after scalar replacement is done, but it is still too much 
code to simple ```is not distinct from``` function. 




> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228503#comment-15228503
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58730151
  
--- Diff: 
exec/java-exec/src/main/codegen/templates/ComparisonFunctions.java ---
@@ -215,6 +192,36 @@ public void eval() {
 }
   }
 
+  <#-- IS_DISTINCT_FROM function -->
+  @FunctionTemplate(names = {"is_distinct_from", "is distinct from" },
--- End diff --

@vkorukanti, I want to clarify...if the query only had a join condition 
with IS_NOT_DISTINCT_FROM, I would think it should work just with your 
convertlet changes, since both HashJoin and MergeJoin handle this type of join 
condition.  Is the reason you had to implement the full comparator codegen to 
handle more general types of comparisons ?  e.g in the SELECT list if I say  
'SELECT  a IS NOT DISTINCT FROM b'  ?Suppose we had a convertlet that only 
preserved the IS (NOT) DISTINCT FROM join condition, and defaulted to the 
Calcite rewrite using the CASE expression, then we would not have to implement 
the full comparator. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228484#comment-15228484
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58728331
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
--- End diff --

Agree that we ideally should leverage the Calcite code..especially since 
this method is pretty heavily used and modified periodically so keeping Drill's 
version of this method in sync will be difficult. 


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227610#comment-15227610
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user jacques-n commented on a diff in the pull request:

https://github.com/apache/drill/pull/462#discussion_r58645138
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/planner/common/DrillRelOptUtil.java
 ---
@@ -169,4 +176,223 @@ private static boolean containIdentity(List exps,
 }
 return true;
   }
+
+  /**
+   * Copied from {@link RelOptUtil#splitJoinCondition(RelNode, RelNode, 
RexNode, List, List)}. Modified to rewrite
--- End diff --

Is there a way to make this not copied? @julianhyde, any ideas for how to 
refactor common stuff out?


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15227606#comment-15227606
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

Github user jacques-n commented on the pull request:

https://github.com/apache/drill/pull/462#issuecomment-206092089
  
@amansinha100, can you take a look?


> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-05 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226910#comment-15226910
 ] 

Venki Korukanti commented on DRILL-4539:


RelOptUtil.splitJoinCondition() is copied from Calcite and modified to:
1) All null equality conditions
2) Also output the operator (EQUALS, IS NOT DISTINCT FROM) to use between left 
and right keys. We need this info to reconstruct the join condition.

If the changes look ok, I will check with Calcite community to see if we can 
push this change to Calcite. Seems like a generic case.

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15226892#comment-15226892
 ] 

ASF GitHub Bot commented on DRILL-4539:
---

GitHub user vkorukanti opened a pull request:

https://github.com/apache/drill/pull/462

DRILL-4539: Add support for Null Equality Joins

@amansinha100 @jacques-n Could you please review the patches?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/vkorukanti/drill null_rewrite

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/462.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #462


commit 3a1db78e24fe0b956659365a1267d9aee79a2955
Author: vkorukanti 
Date:   2016-04-01T23:44:24Z

DRILL-4539 (part 1): Use custom convertlet table for IS [NOT] DISTINCT FROM 
functions

Test changes:
+ Add join (hash and merge) tests that have IS NOT DISTINCT FROM in the 
join condition
+ Refactor/fix tests in TestJoinNullable class to reset options properly at 
the end of the test.

commit d9a149007e58c4219b7d0d1d9647558e27a401b8
Author: vkorukanti 
Date:   2016-04-04T23:25:10Z

DRILL-4539 (part 2): Implement IS_DISTINCT_FROM and IS_NOT_DISTINCT_FROM 
functions

commit af40ee7095d02ec10bf500c24f04433e02ccb007
Author: vkorukanti 
Date:   2016-04-05T18:19:07Z

DRILL-4539 (part 3): Rewrite null equal join condition as IS NOT DISTINCT 
FROM




> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-04 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224702#comment-15224702
 ] 

Venki Korukanti commented on DRILL-4539:


The patch I posted is not complete yet. Currently Drill doesn't have 
{{IS_DISTINCT_FROM}} or {{IS_NOT_DISTINCT_FROM}} implementations. If we don't 
let Calcite rewrite these functions anymore, then we need to provided 
implementations in Drill. 

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-04 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15224693#comment-15224693
 ] 

Venki Korukanti commented on DRILL-4539:


When the query {{IS_NOT_DISTINCT_FROM}} function, it is rewritten in SQL-to-Rel 
conversion phase. In DRILL-2092, AggregateExpandDistinctAggregatesRule expands 
aggregates as {{IS_NOT_DISTINCT_FROM}} function. Once rewritten, it doesn't go 
through the convertlet table conversion.

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-01 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222726#comment-15222726
 ] 

Aman Sinha commented on DRILL-4539:
---

Interesting that Calcite's default behavior is to do the conversion of  
IS_NOT_DISTINCT_FROM to CASE if it was specified in the original query 
considering that Calcite itself internally generates the HashJoin plan with 
IS_NOT_DISTINCT_FROM join condition (see DRILL-2092 for an example).  

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-04-01 Thread Venki Korukanti (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15222555#comment-15222555
 ] 

Venki Korukanti commented on DRILL-4539:


Calcite's StandardConvertletTable is converting the {{IS_NOT_DISTINCT_FROM}} 
function into a {{CASE}} expression. Overriding it with a custom convertlet 
table, makes the query execute (query that has {{IS_NOT_DISTINCT_FROM}}) for 
both merge join and hash join. Changes are 
[here|https://github.com/vkorukanti/drill/commits/DRILL-4539].

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-25 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212645#comment-15212645
 ] 

Aman Sinha commented on DRILL-4539:
---

Actually, it looks like the 'condition' is TRUE when checkCartesianJoin() is 
run, so the join filter pushdown never happened from above the join.  I just 
realized the plan does not even have the IS NOT DISTINCT FROM filter...it is 
getting converted to the more complex condition involving CASE statement which 
did not get pushed to the join.  You may want to check why the original filter 
got changed..

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-25 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212566#comment-15212566
 ] 

Aman Sinha commented on DRILL-4539:
---

That's odd..we should not need to do NestedLoop join for this.  Both HashJoin 
and MergeJoin support the 'is not distinct from' join condition.  For an 
example of hashjoin where this gets used, see DRILL-2092.  I think the 
JoinUtils.checkCartesianJoin() is incorrectly reporting that this is a 
non-equality join.   Some changes may be needed there based on output of 
RelOptUtil.splitJoinCondition().  

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-25 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212256#comment-15212256
 ] 

Jacques Nadeau commented on DRILL-4539:
---

Quick note, that rewrite is not sufficient to correctly plan the query without 
cartesian join. To get it to run, we still have to do:

{code}
SET `planner.enable_nljoin_for_scalar_only` = false;
{code}

{code}
explain plan for select 
  t1.id, t1.name, t2.id, t2.name 
  from dfs.`/opt/data/example.json` t1
  join dfs.`/opt/data/example.json` t2
  on t1.id is not distinct from t2.id
{code}

{code}
00-00Screen
00-01  Project(id=[$0], name=[$1], id0=[$2], name0=[$3])
00-02Project(id=[$0], name=[$1], id0=[$2], name0=[$3])
00-03  SelectionVectorRemover
00-04Filter(condition=[CAST(CASE(IS NULL($0), IS NULL($2), IS 
NULL($2), IS NULL($0), =($0, $2))):BOOLEAN NOT NULL])
00-05  NestedLoopJoin(condition=[true], joinType=[inner])
00-07Project(id=[$1], name=[$0])
00-09  Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/opt/data/example.json, numFiles=1, columns=[`id`, `name`], 
files=[file:/opt/data/example.json]]])
00-06Project(id0=[$0], name0=[$1])
00-08  Project(id=[$1], name=[$0])
00-10Scan(groupscan=[EasyGroupScan 
[selectionRoot=file:/opt/data/example.json, numFiles=1, columns=[`id`, `name`], 
files=[file:/opt/data/example.json]]])
{code}

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-25 Thread Zelaine Fong (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15212245#comment-15212245
 ] 

Zelaine Fong commented on DRILL-4539:
-

Just for clarity, the proposed rewritten syntax using IS NOT DISTINCT FROM 
would be:

Original Query:
select ... FROM t1, t2 WHERE t1.c1 = t2.c2 OR (t1.c1 IS NULL AND t2.c2 IS NULL)

Rewritten Query:
select ... FROM t1, t2 WHERE t1.c1 IS NOT DISTINCT FROM t2.c2

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-24 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211156#comment-15211156
 ] 

Aman Sinha commented on DRILL-4539:
---

The execution side is already available in Drill .. we added this a while ago 
(may have been pre 1.0). 

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-24 Thread Jacques Nadeau (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211052#comment-15211052
 ] 

Jacques Nadeau commented on DRILL-4539:
---

Thanks for these details. Does this mean the execution is already available or 
is my initial description still accurate?

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-24 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211045#comment-15211045
 ] 

Aman Sinha commented on DRILL-4539:
---

See here for the  Drill support:
 
https://github.com/apache/drill/blob/master/exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/JoinUtils.java#L74

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (DRILL-4539) Add support for Null Equality Joins

2016-03-24 Thread Aman Sinha (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15211041#comment-15211041
 ] 

Aman Sinha commented on DRILL-4539:
---

Yes, we should do this optimization.  We should leverage the IS NOT DISTINCT 
FROM comparison which is SQL standard and supported by Calcite as well as 
Drill.   This allows nulls to be compared as equal.  

> Add support for Null Equality Joins
> ---
>
> Key: DRILL-4539
> URL: https://issues.apache.org/jira/browse/DRILL-4539
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Jacques Nadeau
>Assignee: Venki Korukanti
>
> Tableau frequently generates queries similar to this:
> {code}
> SELECT `t0`.`city` AS `city`,
>   `t2`.`X_measure__B` AS `max_Calculation_DFIDBHHAIIECCJFDAG_ok`,
>   `t0`.`state` AS `state`,
>   `t0`.`sum_stars_ok` AS `sum_stars_ok`
> FROM (
>   SELECT `business`.`city` AS `city`,
> `business`.`state` AS `state`,
> SUM(`business`.`stars`) AS `sum_stars_ok`
>   FROM `mongo.academic`.`business` `business`
>   GROUP BY `business`.`city`,
> `business`.`state`
> ) `t0`
>   INNER JOIN (
>   SELECT MAX(`t1`.`X_measure__A`) AS `X_measure__B`,
> `t1`.`city` AS `city`,
> `t1`.`state` AS `state`
>   FROM (
> SELECT `business`.`city` AS `city`,
>   `business`.`state` AS `state`,
>   `business`.`business_id` AS `business_id`,
>   SUM(`business`.`stars`) AS `X_measure__A`
> FROM `mongo.academic`.`business` `business`
> GROUP BY `business`.`city`,
>   `business`.`state`,
>   `business`.`business_id`
>   ) `t1`
>   GROUP BY `t1`.`city`,
> `t1`.`state`
> ) `t2` ON (((`t0`.`city` = `t2`.`city`) OR ((`t0`.`city` IS NULL) AND 
> (`t2`.`city` IS NULL))) AND ((`t0`.`state` = `t2`.`state`) OR ((`t0`.`state` 
> IS NULL) AND (`t2`.`state` IS NULL
> {code}
> If you look at the join condition, you'll note that the join condition is an 
> equality condition which also allows null=null. We should add a planning 
> rewrite rule and execution join option to allow null equality so that we 
> don't treat this as a cartesian join.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)