[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-09 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45526760
  
Thanks!  I've merged this into 1.0 and master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-09 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/837




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45431143
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15540/




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45431142
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-07 Thread adrian-wang
Github user adrian-wang commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13522951
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -119,6 +119,11 @@ object HashFilteredJoin extends Logging with PredicateHelper {
     case FilteredOperation(predicates, join @ Join(left, right, Inner, condition)) =>
       logger.debug(s"Considering hash inner join on: ${predicates ++ condition}")
       splitPredicates(predicates ++ condition, join)
+    // All predicates can be evaluated for left semi join (those that are in the WHERE
+    // clause can only from left table, so they can all be pushed down.)
--- End diff --

Yes, I think LEFT SEMI JOIN would not suffer from pushing down predicates.




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45429982
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45429745
  
 Merged build triggered. 




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45418856
  
(I deleted my earlier comment because I found a mistake)

I think this is looking pretty good, but we should add at least one test 
for the Broadcast Nested Loop version.  Here's a PR against your branch that 
does that: https://github.com/adrian-wang/spark/pull/1




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-07 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-45418467
  
I think this is looking pretty good.  One problem is that there are no 
tests for the nested loop version.  I tried adding this to SQLQuerySuite:

```scala
  test("left semi greater than predicate") {
    checkAnswer(
      sql("SELECT * FROM testData2 x JOIN testData2 y WHERE x.a >= y.a + 2"),
      Seq((3,1), (3,2))
    )
  }
```

However this points out that we need to fix the other join strategies to 
avoid matching semi joins:
```scala
[info] - left semi greater than predicate *** FAILED *** (174 milliseconds)
[info]   Results do not match for query:
...
[info] == Physical Plan ==
[info] Project [a#18:0,b#19:1,a#20:2,b#21:3]
[info]  Filter (a#18:0 >= (a#20:2 + 2))
[info]   CartesianProduct
[info]    ExistingRdd [a#18,b#19], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:174
[info]    ExistingRdd [a#20,b#21], MapPartitionsRDD[4] at mapPartitions at basicOperators.scala:174
[info]
[info] == Results ==
[info] !== Correct Answer - 2 ==   == Spark Answer - 4 ==
[info] !Vector(3, 1)               [3,1,1,1]
[info] !Vector(3, 2)               [3,1,1,2]
[info] !                           [3,2,1,1]
[info] !                           [3,2,1,2] (QueryTest.scala:54)
```
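
To make the mismatch in the output above concrete, here is a minimal sketch of the two semantics using plain Scala collections rather than Spark's physical operators. The contents of `testData2` (columns `a` in 1..3 and `b` in 1..2) are an assumption, and `LeftSemiSketch`, `leftSemiJoin`, and `innerJoin` are hypothetical names used only for illustration.

```scala
// Illustrative sketch only: plain collections, not Spark's join operators.
object LeftSemiSketch {
  type Row = (Int, Int) // (a, b)

  // Assumed contents of testData2: every (a, b) with a in 1..3 and b in 1..2.
  val testData2: Seq[Row] = for (a <- 1 to 3; b <- 1 to 2) yield (a, b)

  // Nested-loop left semi join: a left row is emitted at most once,
  // and only its own columns appear in the output.
  def leftSemiJoin(left: Seq[Row], right: Seq[Row])(cond: (Row, Row) => Boolean): Seq[Row] =
    left.filter(l => right.exists(r => cond(l, r)))

  // Inner join: one output row per matching pair, carrying columns from both sides.
  def innerJoin(left: Seq[Row], right: Seq[Row])(cond: (Row, Row) => Boolean): Seq[(Row, Row)] =
    for (l <- left; r <- right if cond(l, r)) yield (l, r)

  // The predicate from the test: x.a >= y.a + 2.
  val cond: (Row, Row) => Boolean = (x, y) => x._1 >= y._1 + 2

  val semi: Seq[Row] = leftSemiJoin(testData2, testData2)(cond)
  val inner: Seq[(Row, Row)] = innerJoin(testData2, testData2)(cond)
}
```

Under that assumption, `semi` holds the two rows the test expects, while `inner` holds four rows of four columns each, which is the shape of the Spark answer in the failure output.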




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-07 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/837#discussion_r13520365
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala ---
@@ -119,6 +119,11 @@ object HashFilteredJoin extends Logging with PredicateHelper {
     case FilteredOperation(predicates, join @ Join(left, right, Inner, condition)) =>
       logger.debug(s"Considering hash inner join on: ${predicates ++ condition}")
       splitPredicates(predicates ++ condition, join)
+    // All predicates can be evaluated for left semi join (those that are in the WHERE
+    // clause can only from left table, so they can all be pushed down.)
--- End diff --

I think in general we should avoid making too many assumptions in the 
planner about what optimizations have occurred.  For example, in the future we 
might avoid pushing down predicates that are very expensive to evaluate as it 
might be cheaper to run them on an already filtered set of data.  However, in 
the case of LEFT SEMI JOIN, I think it is actually okay to push all predicates 
into the join condition, even those that refer only to the left table.  Is that 
true?
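
The question can be illustrated with a small sketch on plain Scala collections (not Spark's planner or operators). It compares filtering the left side first against folding the same left-only predicate into the semi join condition. The data, `SemiJoinPredicateSketch`, `joinCond`, and `leftOnly` are all hypothetical and exist only for this example.

```scala
// Illustrative sketch only: checks the equivalence on one small, made-up data set.
object SemiJoinPredicateSketch {
  type Row = (Int, Int) // (a, b)

  val left: Seq[Row] = Seq((1, 1), (2, 1), (3, 1), (3, 2))
  val right: Seq[Row] = Seq((1, 0), (3, 0))

  // Left semi join: keep a left row if at least one right row satisfies the condition.
  def leftSemiJoin(l: Seq[Row], r: Seq[Row])(cond: (Row, Row) => Boolean): Seq[Row] =
    l.filter(x => r.exists(y => cond(x, y)))

  val joinCond: (Row, Row) => Boolean = (x, y) => x._1 == y._1 // x.a = y.a
  val leftOnly: Row => Boolean = x => x._2 == 1                // x.b = 1, left table only

  // Plan A: apply the left-only predicate before the join.
  val filteredFirst: Seq[Row] = leftSemiJoin(left.filter(leftOnly), right)(joinCond)

  // Plan B: evaluate the left-only predicate as part of the join condition.
  val foldedIntoCondition: Seq[Row] =
    leftSemiJoin(left, right)((x, y) => joinCond(x, y) && leftOnly(x))
}
```

Both plans keep the same left rows here, because a semi join emits a left row only when the whole condition holds for some right row, so a failing left-only predicate drops the row either way.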




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44927038
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15374/




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44927036
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-02 Thread adrian-wang
Github user adrian-wang commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44923039
  
Hi Michael, while I was adding the Scaladoc, I realized that if the join is 
not handled by LeftSemiJoinHash, it simply means there are no join keys 
for the left semi join. If I then push down those predicates and 
conditions (all of them can be pushed down), I only need to check whether the 
right table is empty to decide whether to output the left table. So 
LeftSemiJoinBNL would be largely unnecessary. Am I right?




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44922845
  
Merged build started. 




[GitHub] spark pull request: [SPARK-1495][SQL]add support for left semi joi...

2014-06-02 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/837#issuecomment-44922835
  
 Merged build triggered. 

