[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-20 Thread yjshen
Github user yjshen closed the pull request at:

https://github.com/apache/spark/pull/8742


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-20 Thread yjshen
Github user yjshen commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-141873047
  
Thanks @yhuai, I'll close this one.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8823





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-141555821
  
Merging to master and branch 1.5. Thanks @yjshen 





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-14163
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-14166
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42675/
Test PASSed.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-141555402
  
  [Test build #42675 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42675/console) for PR 8823 at commit [`1f56d2e`](https://github.com/apache/spark/commit/1f56d2e626ed75d4ce96ffd70929243eed4c8143).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-141525663
  
  [Test build #42675 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42675/consoleFull) for PR 8823 at commit [`1f56d2e`](https://github.com/apache/spark/commit/1f56d2e626ed75d4ce96ffd70929243eed4c8143).





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-141524234
  
Merged build started.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-141524170
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8823#issuecomment-141524014
  
The fix is good. I only added comments. Will merge it to both master and 
branch 1.5 once it passes jenkins.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-141523674
  
@yjshen I added the comments and created a new PR 
(https://github.com/apache/spark/pull/8823). Can you close this one?





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-18 Thread yhuai
GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/8823

[SPARK-10539][SQL]Project should not be pushed down through Intersect or 
Except #8742

Intersect and Except are both set operators, and they use all the 
columns to compare rows for equality. Pushing their Project parent 
down changes the relations they operate on, so it is not an equivalent 
transformation.
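
To make the non-equivalence concrete, here is a minimal sketch using plain Scala collections rather than Catalyst plans. The relations, the `Row` alias, and the `projectA` helper are hypothetical names chosen for illustration; `Set` is used so that `intersect` and `diff` follow set semantics.

```scala
// A plain-collections sketch, not Catalyst code. All names are illustrative.
object ProjectPushdownSketch {
  type Row = (Int, Int) // (a, b)

  val left: Set[Row]  = Set((1, 10), (2, 20))
  val right: Set[Row] = Set((1, 99), (2, 20))

  // Hypothetical projection: keep only column a.
  def projectA(rel: Set[Row]): Set[Int] = rel.map(_._1)

  // Project above the set operator: rows are compared on all columns first.
  val intersectThenProject: Set[Int] = projectA(left.intersect(right))
  val exceptThenProject: Set[Int]    = projectA(left.diff(right))

  // Project pushed below the set operator: rows are compared on column a only.
  val projectThenIntersect: Set[Int] = projectA(left).intersect(projectA(right))
  val projectThenExcept: Set[Int]    = projectA(left).diff(projectA(right))
}
```

With this data, projecting after the set operator gives `Set(2)` for Intersect and `Set(1)` for Except, while projecting first gives `Set(1, 2)` and an empty set, so the rewritten plan would return different rows.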

JIRA: https://issues.apache.org/jira/browse/SPARK-10539

I added some comments based on the fix of 
https://github.com/apache/spark/pull/8742.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark fix_set_optimization

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8823.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8823


commit a9aa64388bae6afffecb27c432d9cf3de77aeba0
Author: Yijie Shen 
Date:   2015-09-14T09:05:38Z

fix set optimization by eliminate empty project push down

commit f9a3b70c7f4c73c7f0572c31a90bb7f7b4698ef7
Author: Yijie Shen 
Date:   2015-09-14T16:30:38Z

Project should not be pushed down through Intersect or Except

commit 1f56d2e626ed75d4ce96ffd70929243eed4c8143
Author: Yin Huai 
Date:   2015-09-18T18:13:05Z

Add comments.







[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-17 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-141297061
  
@yjshen The fix is good. Can you address comments?





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-17 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8742#discussion_r39815622
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -136,26 +136,12 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
{
 Filter(condition, left),
 Filter(pushToRight(condition, rewrites), right))
 
-// Push down projection into intersect
-case Project(projectList, i @ Intersect(left, right)) =>
-  val rewrites = buildRewrites(i)
-  Intersect(
-Project(projectList, left),
-Project(projectList.map(pushToRight(_, rewrites)), right))
-
 // Push down filter into except
 case Filter(condition, e @ Except(left, right)) =>
   val rewrites = buildRewrites(e)
   Except(
 Filter(condition, left),
 Filter(pushToRight(condition, rewrites), right))
-
-// Push down projection into except
-case Project(projectList, e @ Except(left, right)) =>
-  val rewrites = buildRewrites(e)
-  Except(
-Project(projectList, left),
-Project(projectList.map(pushToRight(_, rewrites)), right))
--- End diff --

Can you add comments here explaining why we cannot push down projections 
and why we can push down filters?





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-16 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-140935988
  
ping :)





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-16 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/8742#discussion_r39702841
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala 
---
@@ -907,4 +907,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
   assert(row.getDouble(1) - row.getDouble(3) === 0.0 +- 0.001)
 }
   }
+
+  test("SPARK-10539: Project should not be pushed down through Intersect or Except") {
+val df1 = (1 to 100).map(Tuple1.apply).toDF("i")
--- End diff --

nit: `sqlContext.range(100)`





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread yjshen
Github user yjshen commented on a diff in the pull request:

https://github.com/apache/spark/pull/8742#discussion_r39478700
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -136,26 +136,12 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
{
 Filter(condition, left),
 Filter(pushToRight(condition, rewrites), right))
 
-// Push down projection into intersect
-case Project(projectList, i @ Intersect(left, right)) =>
-  val rewrites = buildRewrites(i)
-  Intersect(
-Project(projectList, left),
-Project(projectList.map(pushToRight(_, rewrites)), right))
-
 // Push down filter into except
 case Filter(condition, e @ Except(left, right)) =>
   val rewrites = buildRewrites(e)
   Except(
 Filter(condition, left),
 Filter(pushToRight(condition, rewrites), right))
-
-// Push down projection into except
-case Project(projectList, e @ Except(left, right)) =>
-  val rewrites = buildRewrites(e)
-  Except(
-Project(projectList, left),
-Project(projectList.map(pushToRight(_, rewrites)), right))
--- End diff --

@yhuai, thanks for your comment. I didn't consider the effect of 
non-deterministic filters on push down when I was doing this. I will think 
about it and add comments soon.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/8742#discussion_r39474132
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
 ---
@@ -136,26 +136,12 @@ object SetOperationPushDown extends Rule[LogicalPlan] 
{
 Filter(condition, left),
 Filter(pushToRight(condition, rewrites), right))
 
-// Push down projection into intersect
-case Project(projectList, i @ Intersect(left, right)) =>
-  val rewrites = buildRewrites(i)
-  Intersect(
-Project(projectList, left),
-Project(projectList.map(pushToRight(_, rewrites)), right))
-
 // Push down filter into except
 case Filter(condition, e @ Except(left, right)) =>
   val rewrites = buildRewrites(e)
   Except(
 Filter(condition, left),
 Filter(pushToRight(condition, rewrites), right))
-
-// Push down projection into except
-case Project(projectList, e @ Except(left, right)) =>
-  val rewrites = buildRewrites(e)
-  Except(
-Project(projectList, left),
-Project(projectList.map(pushToRight(_, rewrites)), right))
--- End diff --

Can we add comments in this class to explain why we cannot push down 
projections? For filter pushdown, if the condition has non-deterministic 
expressions, it is not safe to push down filters in some cases. However, that 
will not happen here because of https://github.com/apache/spark/pull/7446. It 
is still good to think about whether there is any case where filter pushdown 
is not safe. If we determine it is safe to do filter pushdown, let's add 
comments to explain the reason.
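
As a rough illustration of why a deterministic filter can move below these operators, the following plain-collections sketch (not Catalyst code) compares filtering above and below Intersect and Except. The data, the `Row` alias, and the predicate `pred` are hypothetical, and non-deterministic conditions are deliberately out of scope here.

```scala
// A plain-collections sketch, not Catalyst code. All names are illustrative.
object FilterPushdownSketch {
  type Row = (Int, Int) // (a, b)

  val left: Set[Row]  = Set((1, 10), (2, 20), (3, 30), (5, 50))
  val right: Set[Row] = Set((2, 20), (3, 30), (4, 40))

  // Deterministic predicate: the result depends only on the row's values.
  def pred(r: Row): Boolean = r._1 > 2

  // Filter applied above the set operator.
  val filterAboveIntersect: Set[Row] = left.intersect(right).filter(pred)
  val filterAboveExcept: Set[Row]    = left.diff(right).filter(pred)

  // Filter pushed into both children of the set operator.
  val filterBelowIntersect: Set[Row] = left.filter(pred).intersect(right.filter(pred))
  val filterBelowExcept: Set[Row]    = left.filter(pred).diff(right.filter(pred))
}
```

A filter keeps or drops whole rows without changing their columns, so the row-equality comparison made by the set operator sees the same values either way. For this data both placements give `Set((3, 30))` for Intersect and `Set((5, 50))` for Except.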





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-140184801
  
cc @yhuai for review.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-140165971
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-140165974
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42432/
Test PASSed.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-140165825
  
  [Test build #42432 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42432/console) for PR 8742 at commit [`ce6ed80`](https://github.com/apache/spark/commit/ce6ed80f1d3b7138664010a415a8501ea68dcd28).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10539][SQL]Project should not be pushed...

2015-09-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8742#issuecomment-140136499
  
  [Test build #42432 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42432/consoleFull) for PR 8742 at commit [`ce6ed80`](https://github.com/apache/spark/commit/ce6ed80f1d3b7138664010a415a8501ea68dcd28).

