[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread viirya
Github user viirya closed the pull request at:

https://github.com/apache/spark/pull/11235


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-194049145
  
@davies yea, looks like it is doing the same. Let me close this now. Thanks 
for reviewing this anyway!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55457081
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -146,6 +147,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of a EquiJoin to filter out rows with null 
keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddNullFilterForEquiJoin extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case ExtractEquiJoinKeys(Inner, leftKeys, rightKeys, condition, left, 
right) =>
+  val leftConditions = leftKeys.distinct.map { l =>
+IsNotNull(l)
+  }.reduceLeft(And)
+
+  val rightConditions = rightKeys.distinct.map { r =>
+IsNotNull(r)
+  }.reduceLeft(And)
+
+  val keysConditions = ExpressionSet(leftKeys.zip(rightKeys).map { lr 
=>
+EqualTo(lr._2, lr._1)
+  }).reduceLeft(And)
+
+  val finalConditions = if (condition.isDefined) {
+And(keysConditions, condition.get)
+  } else {
+keysConditions
+  }
+
+  val leftHasAllNotNullPredicate = left.constraints.nonEmpty &&
+left.constraints.filter(_.isInstanceOf[IsNotNull])
+  .forall(expr => 
leftConditions.references.intersect(expr.references).nonEmpty)
+
+  val rightHasAllNotNullPredicate = right.constraints.nonEmpty &&
+right.constraints.filter(_.isInstanceOf[IsNotNull])
+  .forall(expr => 
rightConditions.references.intersect(expr.references).nonEmpty)
--- End diff --

Because we want to eliminate reading nulls in advance of join operator. If 
we simply add IsNotNull to the left/right to the conditions, the nulls are 
still read during performing join.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread davies
Github user davies commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-194041946
  
I did not realized that we had merged 
https://github.com/apache/spark/pull/11372/files, do we still need this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55403683
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -146,6 +147,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of a EquiJoin to filter out rows with null 
keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddNullFilterForEquiJoin extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case ExtractEquiJoinKeys(Inner, leftKeys, rightKeys, condition, left, 
right) =>
+  val leftConditions = leftKeys.distinct.map { l =>
+IsNotNull(l)
+  }.reduceLeft(And)
+
+  val rightConditions = rightKeys.distinct.map { r =>
+IsNotNull(r)
+  }.reduceLeft(And)
+
+  val keysConditions = ExpressionSet(leftKeys.zip(rightKeys).map { lr 
=>
+EqualTo(lr._2, lr._1)
+  }).reduceLeft(And)
+
+  val finalConditions = if (condition.isDefined) {
+And(keysConditions, condition.get)
+  } else {
+keysConditions
+  }
+
+  val leftHasAllNotNullPredicate = left.constraints.nonEmpty &&
+left.constraints.filter(_.isInstanceOf[IsNotNull])
+  .forall(expr => 
leftConditions.references.intersect(expr.references).nonEmpty)
+
+  val rightHasAllNotNullPredicate = right.constraints.nonEmpty &&
+right.constraints.filter(_.isInstanceOf[IsNotNull])
+  .forall(expr => 
rightConditions.references.intersect(expr.references).nonEmpty)
--- End diff --

This seems complicated and definitely needs more comments.  Why are we not 
just adding IsNotNull to the left/right when it doesn't already exist in the 
conditions?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55398759
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -146,6 +147,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of a EquiJoin to filter out rows with null 
keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddNullFilterForEquiJoin extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case ExtractEquiJoinKeys(Inner, leftKeys, rightKeys, condition, left, 
right) =>
+  val leftConditions = leftKeys.distinct.map { l =>
+IsNotNull(l)
+  }.reduceLeft(And)
+
+  val rightConditions = rightKeys.distinct.map { r =>
+IsNotNull(r)
+  }.reduceLeft(And)
+
+  val keysConditions = ExpressionSet(leftKeys.zip(rightKeys).map { lr 
=>
+EqualTo(lr._2, lr._1)
+  }).reduceLeft(And)
+
+  val finalConditions = if (condition.isDefined) {
+And(keysConditions, condition.get)
+  } else {
+keysConditions
+  }
+
+  val leftHasAllNotNullPredicate = left.constraints.nonEmpty &&
+left.constraints.filter(_.isInstanceOf[IsNotNull])
+  .forall(expr => 
leftConditions.references.intersect(expr.references).nonEmpty)
+
+  val rightHasAllNotNullPredicate = right.constraints.nonEmpty &&
+right.constraints.filter(_.isInstanceOf[IsNotNull])
+  .forall(expr => 
rightConditions.references.intersect(expr.references).nonEmpty)
+
+  val newLeft = if (leftHasAllNotNullPredicate) {
+left
+  } else {
+Filter(leftConditions, left)
+  }
+
+  val newRight = if (rightHasAllNotNullPredicate) {
+right
+  } else {
+Filter(rightConditions, right)
+  }
+
+  Join(newLeft, newRight, Inner, Some(finalConditions))
--- End diff --

If no filter added, we should return the original join.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55398641
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -146,6 +147,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of a EquiJoin to filter out rows with null 
keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddNullFilterForEquiJoin extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case ExtractEquiJoinKeys(Inner, leftKeys, rightKeys, condition, left, 
right) =>
+  val leftConditions = leftKeys.distinct.map { l =>
+IsNotNull(l)
+  }.reduceLeft(And)
+
+  val rightConditions = rightKeys.distinct.map { r =>
+IsNotNull(r)
+  }.reduceLeft(And)
+
+  val keysConditions = ExpressionSet(leftKeys.zip(rightKeys).map { lr 
=>
+EqualTo(lr._2, lr._1)
+  }).reduceLeft(And)
+
+  val finalConditions = if (condition.isDefined) {
--- End diff --

It's better to use the original condition


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55398500
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -146,6 +147,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of a EquiJoin to filter out rows with null 
keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddNullFilterForEquiJoin extends Rule[LogicalPlan] with 
PredicateHelper {
+  def apply(plan: LogicalPlan): LogicalPlan = plan transform {
+case ExtractEquiJoinKeys(Inner, leftKeys, rightKeys, condition, left, 
right) =>
+  val leftConditions = leftKeys.distinct.map { l =>
--- End diff --

We should only add predicate if the key is nullable and there is no 
IsNotNull constraints on the key.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-08 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193720918
  
Comments addressed, please check if the change is good to merge now. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193579525
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193579527
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52617/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193578999
  
**[Test build #52617 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52617/consoleFull)**
 for PR 11235 at commit 
[`312cb32`](https://github.com/apache/spark/commit/312cb326922624e95528b7f2dc92129c59b3b524).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55309414
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -144,6 +146,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of an inner Join to filter out rows with 
null keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddFilterOfNullForInnerJoin extends Rule[LogicalPlan] with 
PredicateHelper {
--- End diff --

Renamed. Will do part of semi join and outer join in separate PR once this 
getting merged.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193541806
  
**[Test build #52617 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52617/consoleFull)**
 for PR 11235 at commit 
[`312cb32`](https://github.com/apache/spark/commit/312cb326922624e95528b7f2dc92129c59b3b524).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55247463
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -144,6 +146,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of an inner Join to filter out rows with 
null keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddFilterOfNullForInnerJoin extends Rule[LogicalPlan] with 
PredicateHelper {
--- End diff --

Not commenting at the name at all (haven't looked at the rest of the pr), 
but AddFilterOfNullForEquiJoin can be shortened slightly to 
AddNullFilterForEquiJoin


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55245770
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -144,6 +146,56 @@ object EliminateSerialization extends 
Rule[LogicalPlan] {
 }
 
 /**
+ * Add Filter to left and right of an inner Join to filter out rows with 
null keys.
+ * So we may not need to check nullability of keys while joining. Besides, 
by filtering
+ * out keys with null, we can also reduce data size in Join.
+ */
+object AddFilterOfNullForInnerJoin extends Rule[LogicalPlan] with 
PredicateHelper {
--- End diff --

We could also do this for semi join and outer join, should we call it 
AddFilterOfNullForEquiJoin ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55245694
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -57,6 +57,8 @@ abstract class Optimizer extends 
RuleExecutor[LogicalPlan] {
   ReplaceDistinctWithAggregate) ::
 Batch("Aggregate", FixedPoint(100),
   RemoveLiteralFromGroupExpressions) ::
+Batch("Join", FixedPoint(100),
--- End diff --

If we can make this rule idempotent, we don't need to put this as separate 
group.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-07 Thread davies
Github user davies commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r55244765
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -57,6 +57,8 @@ abstract class Optimizer extends 
RuleExecutor[LogicalPlan] {
   ReplaceDistinctWithAggregate) ::
 Batch("Aggregate", FixedPoint(100),
   RemoveLiteralFromGroupExpressions) ::
+Batch("Join", FixedPoint(100),
--- End diff --

We may have more InnerJoin from OuterJoinElimination, should we move this 
rule after that?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-06 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-193064804
  
ping @marmbrus @rxin @davies @liancheng Is this ready to go? Or you have 
other comments? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-05 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-192639157
  
@liancheng Can you review this too? I think I've addressed previous 
comments. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-03-01 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-191033735
  
@marmbrus @davies @rxin any comments for this? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-29 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-190200841
  
ping @marmbrus @davies @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-26 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189309998
  
@marmbrus I've addressed the comments. Please see if this change is 
appropriate. Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189170441
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52028/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189170436
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-26 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189169285
  
**[Test build #52028 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52028/consoleFull)**
 for PR 11235 at commit 
[`bf4777c`](https://github.com/apache/spark/commit/bf4777c00fad6cf143f351e1e5b4d8cbb2b5e230).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189140564
  
**[Test build #52028 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52028/consoleFull)**
 for PR 11235 at commit 
[`bf4777c`](https://github.com/apache/spark/commit/bf4777c00fad6cf143f351e1e5b4d8cbb2b5e230).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189122781
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189122783
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/52021/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189122519
  
**[Test build #52021 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52021/consoleFull)**
 for PR 11235 at commit 
[`8291831`](https://github.com/apache/spark/commit/8291831ffd934f06c91d6388c92e8ede66c2fdc7).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r54203802
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala
 ---
@@ -123,6 +123,9 @@ object ExtractEquiJoinKeys extends Logging with 
PredicateHelper {
 case EqualTo(l, r) =>
   canEvaluate(l, left) && canEvaluate(r, right) ||
 canEvaluate(l, right) && canEvaluate(r, left)
+case EqualNullSafe(l, r) =>
+  canEvaluate(l, left) && canEvaluate(r, right) ||
+canEvaluate(l, right) && canEvaluate(r, left)
--- End diff --

Since `EqualNullSafe` is added into left/right keys in 
`ExtractEquiJoinKeys`, we should also exclude it from the returned `condition`. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-189103359
  
**[Test build #52021 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/52021/consoleFull)**
 for PR 11235 at commit 
[`8291831`](https://github.com/apache/spark/commit/8291831ffd934f06c91d6388c92e8ede66c2fdc7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188752803
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51960/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188752798
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188751471
  
**[Test build #51960 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51960/consoleFull)**
 for PR 11235 at commit 
[`88f5020`](https://github.com/apache/spark/commit/88f5020a95ab7c8895c90fb947bddc0ff71fa2cc).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188750192
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188750194
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51959/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188749094
  
**[Test build #51959 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51959/consoleFull)**
 for PR 11235 at commit 
[`687e948`](https://github.com/apache/spark/commit/687e94851c17e8473880bc1568d512535c155f12).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188702764
  
**[Test build #51960 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51960/consoleFull)**
 for PR 11235 at commit 
[`88f5020`](https://github.com/apache/spark/commit/88f5020a95ab7c8895c90fb947bddc0ff71fa2cc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r54071987
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
 ---
@@ -274,7 +274,7 @@ case class BroadcastHashJoin(
  |// generate join key for stream side
  |${keyEv.code}
  |// find matches from HashRelation
- |$bufferType $matches = $anyNull ? null : 
($bufferType)$relationTerm.get(${keyEv.value});
+ |$bufferType $matches = 
($bufferType)$relationTerm.get(${keyEv.value});
--- End diff --

Will revert this and address that in another PR too.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r54071649
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -126,11 +126,11 @@ trait HashJoin {
   while (currentHashMatches == null && streamIter.hasNext) {
 currentStreamedRow = streamIter.next()
 val key = joinKeys(currentStreamedRow)
-if (!key.anyNull) {
-  currentHashMatches = hashedRelation.get(key)
-  if (currentHashMatches != null) {
-currentMatchPosition = 0
-  }
+// We do filtering null keys by inserting a Filter before 
inner join,
--- End diff --

OK. I will revert this part and try to address this in another PR. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-188698959
  
**[Test build #51959 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51959/consoleFull)**
 for PR 11235 at commit 
[`687e948`](https://github.com/apache/spark/commit/687e94851c17e8473880bc1568d512535c155f12).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53670406
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -126,11 +126,11 @@ trait HashJoin {
   while (currentHashMatches == null && streamIter.hasNext) {
 currentStreamedRow = streamIter.next()
 val key = joinKeys(currentStreamedRow)
-if (!key.anyNull) {
-  currentHashMatches = hashedRelation.get(key)
-  if (currentHashMatches != null) {
-currentMatchPosition = 0
-  }
+// We do filtering null keys by inserting a Filter before 
inner join,
--- End diff --

Even if we were going to make changes like this, they should be done in a 
different PR and come with some benchmarks to show that you are actually 
speeding things up.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53670314
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -126,11 +126,11 @@ trait HashJoin {
   while (currentHashMatches == null && streamIter.hasNext) {
 currentStreamedRow = streamIter.next()
 val key = joinKeys(currentStreamedRow)
-if (!key.anyNull) {
-  currentHashMatches = hashedRelation.get(key)
-  if (currentHashMatches != null) {
-currentMatchPosition = 0
-  }
+// We do filtering null keys by inserting a Filter before 
inner join,
--- End diff --

The execution engines correctness should not be predicated on specific 
optimizations.  That is *really* tight coupling.  If you want to optimize code 
generation for null handling you should do it based on signals like `nullable` 
and not on undocumented, implicit contracts between components.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53670042
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastHashJoin.scala
 ---
@@ -274,7 +274,7 @@ case class BroadcastHashJoin(
  |// generate join key for stream side
  |${keyEv.code}
  |// find matches from HashRelation
- |$bufferType $matches = $anyNull ? null : 
($bufferType)$relationTerm.get(${keyEv.value});
+ |$bufferType $matches = 
($bufferType)$relationTerm.get(${keyEv.value});
--- End diff --

Explain these changes please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53669872
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -57,6 +57,8 @@ abstract class Optimizer extends 
RuleExecutor[LogicalPlan] {
   ReplaceDistinctWithAggregate) ::
 Batch("Aggregate", FixedPoint(100),
   RemoveLiteralFromGroupExpressions) ::
+Batch("Join", Once,
+  AddFilterOfNullForInnerJoin) ::
--- End diff --

We should make this rule idempotent instead of hacking it and making it run 
only once.  You are loosing the benefits of emergent optimizations with this 
implementation.

I would directly construct the filter in the left/right child, but only 
when its not already present in the `constraints` of the child.  This is the 
whole reason we added the ability to reason about what `constraints` are 
already present on a subtree.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-21 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186975972
  
cc @davies @marmbrus @liancheng @rxin 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186835014
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186835016
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51631/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186834946
  
**[Test build #51631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51631/consoleFull)**
 for PR 11235 at commit 
[`aada320`](https://github.com/apache/spark/commit/aada32032b6b5a71c735ddb65c4852b40f6e31c6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class SubqueryExpression extends LeafExpression `
  * `case class ScalarSubquery(`
  * `case class Subquery(name: String, child: SparkPlan) extends UnaryNode `
  * `case class ScalarSubquery(`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-21 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186823215
  
**[Test build #51631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51631/consoleFull)**
 for PR 11235 at commit 
[`aada320`](https://github.com/apache/spark/commit/aada32032b6b5a71c735ddb65c4852b40f6e31c6).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-19 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53442868
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -126,11 +126,11 @@ trait HashJoin {
   while (currentHashMatches == null && streamIter.hasNext) {
 currentStreamedRow = streamIter.next()
 val key = joinKeys(currentStreamedRow)
-if (!key.anyNull) {
-  currentHashMatches = hashedRelation.get(key)
-  if (currentHashMatches != null) {
-currentMatchPosition = 0
-  }
+// We do filtering null keys by inserting a Filter before 
inner join,
--- End diff --

Understood.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-19 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186142858
  
ping @davies 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186116416
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51531/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186116414
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-19 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186116106
  
**[Test build #51531 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51531/consoleFull)**
 for PR 11235 at commit 
[`4f15778`](https://github.com/apache/spark/commit/4f15778f2cc26baf7b5b07313e6f50e458a400a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186095529
  
**[Test build #51531 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51531/consoleFull)**
 for PR 11235 at commit 
[`4f15778`](https://github.com/apache/spark/commit/4f15778f2cc26baf7b5b07313e6f50e458a400a3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-18 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-186092279
  
@maropu yeah, I think so.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185558200
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51468/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185558199
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185558006
  
**[Test build #51468 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51468/consoleFull)**
 for PR 11235 at commit 
[`0c14be5`](https://github.com/apache/spark/commit/0c14be50ad16f445b8cbaf0cd1df0eaf32a14dcd).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185538567
  
**[Test build #51468 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51468/consoleFull)**
 for PR 11235 at commit 
[`0c14be5`](https://github.com/apache/spark/commit/0c14be50ad16f445b8cbaf0cd1df0eaf32a14dcd).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread viirya
Github user viirya commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185535356
  
retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185531092
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51464/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185531090
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53267709
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -126,11 +126,11 @@ trait HashJoin {
   while (currentHashMatches == null && streamIter.hasNext) {
 currentStreamedRow = streamIter.next()
 val key = joinKeys(currentStreamedRow)
-if (!key.anyNull) {
-  currentHashMatches = hashedRelation.get(key)
-  if (currentHashMatches != null) {
-currentMatchPosition = 0
-  }
+// We do filtering null keys by inserting a Filter before 
inner join,
--- End diff --

As null are filters by added filter, I think we should be able to skip it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185161812
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51426/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185161806
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185161522
  
**[Test build #51426 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51426/consoleFull)**
 for PR 11235 at commit 
[`7766d17`](https://github.com/apache/spark/commit/7766d17d432285d8ca8cb4eb63e8e1d5717ac00d).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AddFilterOfNullForInnerJoinSuite extends PlanTest `


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185154496
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51425/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185154491
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185153934
  
**[Test build #51425 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51425/consoleFull)**
 for PR 11235 at commit 
[`f8505de`](https://github.com/apache/spark/commit/f8505de0bf89f008becb05eaa2e744ae7290c1ba).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread maropu
Github user maropu commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185149697
  
ISTM we can also add `NULL` filters in the one side of left/right-outer 
joins. Is it wrong?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185134057
  
**[Test build #51426 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51426/consoleFull)**
 for PR 11235 at commit 
[`7766d17`](https://github.com/apache/spark/commit/7766d17d432285d8ca8cb4eb63e8e1d5717ac00d).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53143407
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala ---
@@ -126,11 +126,11 @@ trait HashJoin {
   while (currentHashMatches == null && streamIter.hasNext) {
 currentStreamedRow = streamIter.next()
 val key = joinKeys(currentStreamedRow)
-if (!key.anyNull) {
-  currentHashMatches = hashedRelation.get(key)
-  if (currentHashMatches != null) {
-currentMatchPosition = 0
-  }
+// We do filtering null keys by inserting a Filter before 
inner join,
--- End diff --

We need assertion: !key.anyNull?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread maropu
Github user maropu commented on a diff in the pull request:

https://github.com/apache/spark/pull/11235#discussion_r53143261
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -23,7 +23,7 @@ import 
org.apache.spark.sql.catalyst.analysis.{CleanupAliases, EliminateSubQueri
 import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.expressions.Literal.{FalseLiteral, 
TrueLiteral}
 import org.apache.spark.sql.catalyst.expressions.aggregate._
-import 
org.apache.spark.sql.catalyst.planning.{ExtractFiltersAndInnerJoins, Unions}
+import org.apache.spark.sql.catalyst.planning._
--- End diff --

Nit: Is it better, `import 
org.apache.spark.sql.catalyst.planning.{ExtractEquiJoinKeys, 
ExtractFiltersAndInnerJoins, Unions}`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185123129
  
**[Test build #51425 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51425/consoleFull)**
 for PR 11235 at commit 
[`f8505de`](https://github.com/apache/spark/commit/f8505de0bf89f008becb05eaa2e744ae7290c1ba).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185082131
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185082137
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/51414/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185081565
  
**[Test build #51414 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51414/consoleFull)**
 for PR 11235 at commit 
[`132890b`](https://github.com/apache/spark/commit/132890bbcd013dd284dfb9d9fd48e8440343ddd5).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11235#issuecomment-185053702
  
**[Test build #51414 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/51414/consoleFull)**
 for PR 11235 at commit 
[`132890b`](https://github.com/apache/spark/commit/132890bbcd013dd284dfb9d9fd48e8440343ddd5).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-13249][SQL] Add Filter checking nullabi...

2016-02-16 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/11235

[SPARK-13249][SQL] Add Filter checking nullability of keys for inner join

JIRA: https://issues.apache.org/jira/browse/SPARK-13249

For inner join, the join key with null in it will not match each other, so 
we could insert a Filter before inner join (could be pushed down), then we 
don't need to check nullability of keys while joining.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 add-filter-for-innerjoin

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11235.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11235


commit 216305fef918f46d50af7107c7ea3182ad91afcf
Author: Liang-Chi Hsieh 
Date:   2016-02-17T06:31:03Z

Add Filter checking nullability of keys for inner join.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org