[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-24 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/16467


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-24 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r97667507
  
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2,b2
+   from   t2)
+-- !query 2 schema
+struct<a1:int,b1:int>
+-- !query 2 output
+2  1
+
--- End diff --

OK, yeah, you are right. I was confusing this with the OR rules.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-05 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r9475
  
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2,b2
+   from   t2)
+-- !query 2 schema
+struct<a1:int,b1:int>
+-- !query 2 output
+2  1
+
--- End diff --

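Let's consider this:

(null, 2) NOT IN { (1, 1), (null, 3), (1, null) }

which is equal to

 ... AND (null <> 1 OR 2 <> null)
 => ... AND (unknown OR unknown)
 => ... AND unknown
 => unknown

The last conjunct comes from the comparison with (1, null); the elided conjuncts for (1, 1) and (null, 3) are both true because 2 <> 1 and 2 <> 3. Therefore (null, 2) is not part of the result set.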
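
The evaluation above can be checked with a small model of SQL three-valued logic. This is a minimal sketch in plain Python, not Spark code: `None` stands in for NULL/unknown, and the helper names `ne`, `or3`, `and3` and `not_in` are made up for illustration.

```python
# Three-valued logic: True, False, or None (SQL "unknown").

def ne(x, y):
    """SQL `x <> y`: unknown if either side is NULL."""
    if x is None or y is None:
        return None
    return x != y

def or3(p, q):
    """Three-valued OR: true wins, then unknown, else false."""
    if p is True or q is True:
        return True
    if p is None or q is None:
        return None
    return False

def and3(p, q):
    """Three-valued AND: false wins, then unknown, else true."""
    if p is False or q is False:
        return False
    if p is None or q is None:
        return None
    return True

def not_in(row, subquery_rows):
    """(a, b) NOT IN {...}: AND over all tuples of (a <> a2 OR b <> b2)."""
    result = True  # NOT IN over an empty set is true
    for other in subquery_rows:
        differs = False
        for x, y in zip(row, other):
            differs = or3(differs, ne(x, y))
        result = and3(result, differs)
    return result

t2 = [(1, 1), (None, 3), (1, None)]
print(not_in((None, 2), t2))  # None, i.e. unknown, so the row is filtered out
print(not_in((2, 1), t2))     # True, so the row is kept
```

A WHERE clause keeps a row only when the predicate is true, so an unknown result drops `(null, 2)` just as a false one would.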


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-05 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94888382
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -263,12 +263,12 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
   Row(1, 2.0) :: Row(1, 2.0) :: Nil)
 
 checkAnswer(
-  sql("select * from l where a not in (select c from t where b < d)"),
-  Row(1, 2.0) :: Row(1, 2.0) :: Row(3, 3.0) :: Nil)
+  sql("select * from l where (a, b) not in (select c, d from t) and a < 4"),
+  Row(1, 2.0) :: Row(1, 2.0) :: Row(2, 1.0) :: Row(2, 1.0) :: Row(3, 3.0) :: Nil)
 
 // Empty sub-query
 checkAnswer(
-  sql("select * from l where a not in (select c from r where c > 10 and b < d)"),
+  sql("select * from l where (a, b) not in (select c, d from r where c > 10)"),
--- End diff --

This test case effectively covers the empty-subquery case.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-05 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94887839
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -68,8 +68,15 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   // Note that will almost certainly be planned as a Broadcast Nested Loop join.
   // Use EXISTS if performance matters to you.
   val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p)
-  val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or)
-  Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get)))
+  // Expand the NOT IN expression with the NULL-aware semantic
+  // to its full form. That is from:
+  //   (a1,b1,...) = (a2,b2,...)
+  // to
+  //   (a1=a2 OR isnull(a1=a2)) AND (b1=b2 OR isnull(b1=b2)) AND ...
+  val joinConds = splitConjunctivePredicates(joinCond.get)
+  val isNulls = joinConds.map(IsNull)
--- End diff --

Thank you. I have changed the code.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-05 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94871003
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -263,12 +263,12 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
   Row(1, 2.0) :: Row(1, 2.0) :: Nil)
 
 checkAnswer(
-  sql("select * from l where a not in (select c from t where b < d)"),
-  Row(1, 2.0) :: Row(1, 2.0) :: Row(3, 3.0) :: Nil)
+  sql("select * from l where (a, b) not in (select c, d from t) and a < 4"),
+  Row(1, 2.0) :: Row(1, 2.0) :: Row(2, 1.0) :: Row(2, 1.0) :: Row(3, 3.0) :: Nil)
 
 // Empty sub-query
 checkAnswer(
-  sql("select * from l where a not in (select c from r where c > 10 and b < d)"),
+  sql("select * from l where (a, b) not in (select c, d from r where c > 10)"),
--- End diff --

Then we also should test an empty subquery :)


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-05 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94871021
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/subquery.scala ---
@@ -68,8 +68,15 @@ object RewritePredicateSubquery extends Rule[LogicalPlan] with PredicateHelper {
   // Note that will almost certainly be planned as a Broadcast Nested Loop join.
   // Use EXISTS if performance matters to you.
   val (joinCond, outerPlan) = rewriteExistentialExpr(conditions, p)
-  val anyNull = splitConjunctivePredicates(joinCond.get).map(IsNull).reduceLeft(Or)
-  Join(outerPlan, sub, LeftAnti, Option(Or(anyNull, joinCond.get)))
+  // Expand the NOT IN expression with the NULL-aware semantic
+  // to its full form. That is from:
+  //   (a1,b1,...) = (a2,b2,...)
+  // to
+  //   (a1=a2 OR isnull(a1=a2)) AND (b1=b2 OR isnull(b1=b2)) AND ...
+  val joinConds = splitConjunctivePredicates(joinCond.get)
+  val isNulls = joinConds.map(IsNull)
--- End diff --

Minor - We can write this more directly:
```scala
val pairs = joinConds.map(c => Or(c, IsNull(c))).reduceLeft(And)
```
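
To see what that one-liner builds, here is a rough Python sketch with toy expression classes. `Eq`, `IsNull`, `Or` and `And` below are stand-ins written for this example, not Catalyst's classes, and `functools.reduce` plays the role of `reduceLeft`.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass(frozen=True)
class Eq:
    left: str
    right: str
    def __str__(self):
        return f"({self.left} = {self.right})"

@dataclass(frozen=True)
class IsNull:
    child: object
    def __str__(self):
        return f"isnull({self.child})"

@dataclass(frozen=True)
class Or:
    left: object
    right: object
    def __str__(self):
        return f"({self.left} || {self.right})"

@dataclass(frozen=True)
class And:
    left: object
    right: object
    def __str__(self):
        return f"({self.left} && {self.right})"

# One equality per column pair, as if the join condition had been split
# into its conjuncts.
join_conds = [Eq("a1", "a2"), Eq("b1", "b2")]

# Wrap each conjunct as (c OR isnull(c)), then fold the results with AND.
pairs = reduce(And, (Or(c, IsNull(c)) for c in join_conds))
print(pairs)
# (((a1 = a2) || isnull((a1 = a2))) && ((b1 = b2) || isnull((b1 = b2))))
```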


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-05 Thread hvanhovell

Github user hvanhovell commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94874117
  
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2,b2
+   from   t2)
+-- !query 2 schema
+struct<a1:int,b1:int>
+-- !query 2 output
+2  1
+
--- End diff --

Why is `(null, 2)` missing? There is no tuple in t2 for which b2=2.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94518758
  
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2,b2
+   from   t2)
+-- !query 2 schema
+struct<a1:int,b1:int>
+-- !query 2 output
+2  1
+
+
+-- !query 3
+select a1,b1
+from   t1
+where  (a1-1,b1) not in (select a2,b2
+ from   t2)
+-- !query 3 schema
+struct<a1:int,b1:int>
+-- !query 3 output
+1  1
+
+
+-- !query 4
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2+1,b2
+   from   t2)
+-- !query 4 schema
+struct<a1:int,b1:int>
+-- !query 4 output
+1  1
--- End diff --

It returns an empty set without this fix.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94519176
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -263,12 +263,12 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
   Row(1, 2.0) :: Row(1, 2.0) :: Nil)
 
 checkAnswer(
-  sql("select * from l where a not in (select c from t where b < d)"),
-  Row(1, 2.0) :: Row(1, 2.0) :: Row(3, 3.0) :: Nil)
+  sql("select * from l where (a, b) not in (select c, d from t) and a < 4"),
+  Row(1, 2.0) :: Row(1, 2.0) :: Row(2, 1.0) :: Row(2, 1.0) :: Row(3, 3.0) :: Nil)
--- End diff --

A query with correlated predicates in a NOT IN subquery can generate 
incorrect results (tracked by SPARK-18966), and this fix exposes that 
problem. I have therefore modified the test case to cover the code path for 
multiple columns instead.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94518742
  
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2,b2
+   from   t2)
+-- !query 2 schema
+struct<a1:int,b1:int>
+-- !query 2 output
+2  1
+
--- End diff --

It returns an empty set without this fix.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94519010
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQueryTestSuite.scala ---
@@ -163,7 +163,12 @@ class SQLQueryTestSuite extends QueryTest with SharedSQLContext {
 s"-- Number of queries: ${outputs.size}\n\n\n" +
 outputs.zipWithIndex.map{case (qr, i) => qr.toString(i)}.mkString("\n\n\n") + "\n"
   }
-  stringToFile(new File(testCase.resultFile), goldenOutput)
+  val resultFile = new File(testCase.resultFile);
+  val parent = resultFile.getParentFile();
+  if (!parent.exists()) {
+    assert(parent.mkdirs(), "Could not create directory: " + parent)
+  }
+  stringToFile(resultFile, goldenOutput)
--- End diff --

This newly added code addresses an issue that arises when test files are 
located in a hierarchy of sub-directories: at the time the golden result 
files are generated, those sub-directories may not yet exist. The code 
creates the required sub-directories.
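
A rough equivalent of the same guard, sketched in Python rather than Scala so it can be tried without a Spark build. `write_golden_file` is a hypothetical helper name, and the path and file content used below are only examples.

```python
import tempfile
from pathlib import Path

def write_golden_file(result_file: Path, golden_output: str) -> None:
    """Write the golden output, creating missing parent directories first."""
    parent = result_file.parent
    # Like mkdirs(): create every missing level; do nothing if it already exists.
    parent.mkdir(parents=True, exist_ok=True)
    result_file.write_text(golden_output)

# Example: the nested result directory does not exist before the call.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "subquery" / "in-subquery" / "example.sql.out"
    write_golden_file(target, "-- Number of queries: 0\n")
    print(target.exists())  # True
```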


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94518750
  
--- Diff: sql/core/src/test/resources/sql-tests/results/subquery/in-subquery/not-in-multiple-columns.sql.out ---
@@ -0,0 +1,59 @@
+-- Automatically generated by SQLQueryTestSuite
+-- Number of queries: 5
+
+
+-- !query 0
+create temporary view t1 as select * from values
+  (1, 1), (2, 1), (null, 1),
+  (1, 3), (null, 3),
+  (1, null), (null, 2)
+as t1(a1, b1)
+-- !query 0 schema
+struct<>
+-- !query 0 output
+
+
+
+-- !query 1
+create temporary view t2 as select * from values
+  (1, 1),
+  (null, 3),
+  (1, null)
+as t2(a2, b2)
+-- !query 1 schema
+struct<>
+-- !query 1 output
+
+
+
+-- !query 2
+select a1,b1
+from   t1
+where  (a1,b1) not in (select a2,b2
+   from   t2)
+-- !query 2 schema
+struct<a1:int,b1:int>
+-- !query 2 output
+2  1
+
+
+-- !query 3
+select a1,b1
+from   t1
+where  (a1-1,b1) not in (select a2,b2
+ from   t2)
+-- !query 3 schema
+struct<a1:int,b1:int>
+-- !query 3 output
+1  1
+
--- End diff --

It returns an empty set without this fix.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

Github user nsyca commented on a diff in the pull request:

https://github.com/apache/spark/pull/16467#discussion_r94519375
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SubquerySuite.scala ---
@@ -263,12 +263,12 @@ class SubquerySuite extends QueryTest with SharedSQLContext {
   Row(1, 2.0) :: Row(1, 2.0) :: Nil)
 
 checkAnswer(
-  sql("select * from l where a not in (select c from t where b < d)"),
-  Row(1, 2.0) :: Row(1, 2.0) :: Row(3, 3.0) :: Nil)
+  sql("select * from l where (a, b) not in (select c, d from t) and a < 4"),
+  Row(1, 2.0) :: Row(1, 2.0) :: Row(2, 1.0) :: Row(2, 1.0) :: Row(3, 3.0) :: Nil)
 
 // Empty sub-query
 checkAnswer(
-  sql("select * from l where a not in (select c from r where c > 10 and b < d)"),
+  sql("select * from l where (a, b) not in (select c, d from r where c > 10)"),
--- End diff --

The predicate `c > 10` filters out every row in the subquery, which masks 
the correlated-predicate problem. Instead of removing the test case 
completely, I have modified it to cover multiple columns.


[GitHub] spark pull request #16467: [SPARK-19017][SQL] NOT IN subquery with more than...

2017-01-03 Thread nsyca

GitHub user nsyca opened a pull request:

https://github.com/apache/spark/pull/16467

[SPARK-19017][SQL] NOT IN subquery with more than one column may return 
incorrect results

## What changes were proposed in this pull request?

This PR fixes the code in the Optimizer phase where the NULL-aware expression 
of a NOT IN query is expanded in rule `RewritePredicateSubquery`.

Example:
The query

 select a1,b1
 from   t1
 where  (a1,b1) not in (select a2,b2
from   t2);

has the condition (a1, b1) = (a2, b2) rewritten from (before this fix):

Join LeftAnti, ((isnull((_1#2 = a2#16)) || isnull((_2#3 = b2#17))) || ((_1#2 = a2#16) && (_2#3 = b2#17)))

to (after this fix):

Join LeftAnti, (((_1#2 = a2#16) || isnull((_1#2 = a2#16))) && ((_2#3 = b2#17) || isnull((_2#3 = b2#17))))
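
The difference between the two conditions can be reproduced outside Spark. The following is a minimal Python sketch under stated assumptions: `None` models NULL, a left anti join keeps an outer row only when no subquery row makes the condition true, and the function names are illustrative. The data is the `t1`/`t2` pair from the new test file `not-in-multiple-columns.sql.out`.

```python
def eq(x, y):
    """SQL `x = y`: None (unknown) when either side is NULL."""
    return None if x is None or y is None else x == y

def old_condition(row, other):
    # Before the fix: isnull(a1 = a2) OR isnull(b1 = b2) OR (a1 = a2 AND b1 = b2)
    eqs = [eq(x, y) for x, y in zip(row, other)]
    return any(e is None for e in eqs) or all(e is True for e in eqs)

def new_condition(row, other):
    # After the fix: (a1 = a2 OR isnull(a1 = a2)) AND (b1 = b2 OR isnull(b1 = b2))
    eqs = [eq(x, y) for x, y in zip(row, other)]
    return all(e is True or e is None for e in eqs)

def left_anti(outer, sub, condition):
    # Keep an outer row only if no subquery row satisfies the join condition.
    return [r for r in outer if not any(condition(r, s) for s in sub)]

t1 = [(1, 1), (2, 1), (None, 1), (1, 3), (None, 3), (1, None), (None, 2)]
t2 = [(1, 1), (None, 3), (1, None)]

print(left_anti(t1, t2, old_condition))  # []
print(left_anti(t1, t2, new_condition))  # [(2, 1)]
```

In this model the old condition is true as soon as any single column comparison is unknown, so every `t1` row matches a `t2` row that contains a NULL and the anti join returns nothing. The new condition requires every column pair to be equal or unknown, so `(2, 1)` survives.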

## How was this patch tested?

sql/test, catalyst/test and new test cases in SQLQueryTestSuite.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/nsyca/spark 19017

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16467.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16467


commit b98865127a39bde885f9b1680cfe608629d59d51
Author: Nattavut Sutyanyong 
Date:   2016-07-29T21:43:56Z

[SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect 
results

## What changes were proposed in this pull request?

This patch fixes the incorrect results in the rule ResolveSubquery in 
Catalyst's Analysis phase.

## How was this patch tested?
./dev/run-tests
a new unit test on the problematic pattern.

commit 069ed8f8e5f14dca7a15701945d42fc27fe82f3c
Author: Nattavut Sutyanyong 
Date:   2016-07-29T21:50:02Z

[SPARK-16804][SQL] Correlated subqueries containing LIMIT return incorrect 
results

## What changes were proposed in this pull request?

This patch fixes the incorrect results in the rule ResolveSubquery in 
Catalyst's Analysis phase.

## How was this patch tested?
./dev/run-tests
a new unit test on the problematic pattern.

commit edca333c081e6d4e53a91b496fba4a3ef4ee89ac
Author: Nattavut Sutyanyong 
Date:   2016-07-30T00:28:15Z

New positive test cases

commit 64184fdb77c1a305bb2932e82582da28bb4c0e53
Author: Nattavut Sutyanyong 
Date:   2016-08-01T13:20:09Z

Fix unit test case failure

commit 29f82b05c9e40e7934397257c674b260a8e8a996
Author: Nattavut Sutyanyong 
Date:   2016-08-05T17:42:01Z

blocking TABLESAMPLE

commit ac43ab47907a1ccd6d22f920415fbb4de93d4720
Author: Nattavut Sutyanyong 
Date:   2016-08-05T21:10:19Z

Fixing code styling

commit 631d396031e8bf627eb1f4872a4d3a17c144536c
Author: Nattavut Sutyanyong 
Date:   2016-08-07T18:39:44Z

Correcting Scala test style

commit 7eb9b2dbba3633a1958e38e0019e3ce816300514
Author: Nattavut Sutyanyong 
Date:   2016-08-08T02:31:09Z

One (last) attempt to correct the Scala style tests

commit 1387cf51541408ac20048064fa5e559836af932c
Author: Nattavut Sutyanyong 
Date:   2016-08-12T20:11:50Z

Merge remote-tracking branch 'upstream/master'

commit 3faa2d5edc030495f8b870d2c017cb714c17b6a7
Author: Nattavut Sutyanyong 
Date:   2016-12-14T16:35:52Z

Merge remote-tracking branch 'upstream/master'

commit a30863457ef49f99aff001b1987da75093c20f86
Author: Nattavut Sutyanyong 
Date:   2016-12-30T17:18:18Z

Merge remote-tracking branch 'upstream/master'

commit 473c81bacda2b12e6b85fe3f609ba334460bf0fe
Author: Nattavut Sutyanyong 
Date:   2017-01-01T16:15:07Z

first try on the fix

commit 278ebaea9ab52bc141e85e578416203107d38eda
Author: Nattavut Sutyanyong 
Date:   2017-01-03T22:07:35Z

add/update test cases

commit f1524b99aff70e688e4763db7898da53286a321e
Author: Nattavut Sutyanyong 
Date:   2017-01-03T22:08:03Z

Merge remote-tracking branch 'upstream/master'

commit 9e1b29e99f33a5f78f1edca80495ab33b2389d2a
Author: Nattavut Sutyanyong 
Date:   2017-01-03T22:09:26Z

Merge branch 'master' into 19017

commit de655d0d00693a2bc98fddad7be6f55fb2690555
Author: Nattavut Sutyanyong 
Date:   2017-01-04T01:26:45Z

Add descriptive comment



