[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22518
  
thanks, merging to master!


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98779/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98779 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98779/testReport)**
 for PR 22518 at commit 
[`52ae956`](https://github.com/apache/spark/commit/52ae9561e58d65f2c26a112ce78a78994e83f868).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98778/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98778 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98778/testReport)**
 for PR 22518 at commit 
[`56ed812`](https://github.com/apache/spark/commit/56ed8129d0fa045c1a28914182d79cb9fa9d6103).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98777/
Test FAILed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test FAILed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98777 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98777/testReport)**
 for PR 22518 at commit 
[`b414572`](https://github.com/apache/spark/commit/b4145721a30f83563dca264f838e042b2741d645).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98775/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98775 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98775/testReport)**
 for PR 22518 at commit 
[`da3843e`](https://github.com/apache/spark/commit/da3843ed4711fb0ea6103e91d08285594dba5696).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4988/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98779 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98779/testReport)**
 for PR 22518 at commit 
[`52ae956`](https://github.com/apache/spark/commit/52ae9561e58d65f2c26a112ce78a78994e83f868).


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4987/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98778 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98778/testReport)**
 for PR 22518 at commit 
[`56ed812`](https://github.com/apache/spark/commit/56ed8129d0fa045c1a28914182d79cb9fa9d6103).


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98777 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98777/testReport)**
 for PR 22518 at commit 
[`b414572`](https://github.com/apache/spark/commit/b4145721a30f83563dca264f838e042b2741d645).


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4986/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98775 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98775/testReport)**
 for PR 22518 at commit 
[`da3843e`](https://github.com/apache/spark/commit/da3843ed4711fb0ea6103e91d08285594dba5696).


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4984/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-13 Thread mgaido91
Github user mgaido91 commented on the issue:

https://github.com/apache/spark/pull/22518
  
@cloud-fan this is the benchmark:
```
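// Run in spark-shell, which already provides `spark`, `sql` and `spark.implicits._`.
// Write two small tables and register them as temp views (safe to re-run).
(1 to 100).toSeq.toDF("a").write.mode("overwrite").save("/tmp/t1")
spark.read.load("/tmp/t1").createOrReplaceTempView("t1")
(1 to 2000).toSeq.toDF("b").write.mode("overwrite").save("/tmp/t2")
spark.read.load("/tmp/t2").createOrReplaceTempView("t2")
// Filter on a scalar subquery; execution is deferred until `show` is called.
val plan = sql("select * from t2 where b > (select avg(a + 1) from t1)")
val t0 = System.nanoTime()
plan.show
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")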
```

the result is:

```
Before PR: Elapsed time: 862499689ns
After  PR: Elapsed time: 914728641ns
```
The difference is very small because all the subqueries run in parallel. 
The execution time would be affected much more if there were several subqueries 
(our thread pool has 16 threads, so a query like this one but with 9 subquery 
filters would see a big performance gain after this PR).
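
To try the multi-subquery case mentioned above, the query text could be generated rather than typed by hand. This is only a sketch: `multiSubqueryQuery` is a hypothetical helper, not part of the PR, and varying the constant in `avg(a + i)` is an assumption made here so that each filter carries a distinct subquery.

```scala
// Hypothetical helper: builds a variant of the benchmark query with n subquery filters.
// Each filter uses a different constant so that the subqueries are not identical.
def multiSubqueryQuery(n: Int): String = {
  val filters = (1 to n).map(i => s"b > (select avg(a + $i) from t1)")
  "select * from t2 where " + filters.mkString(" and ")
}

// Nine filters, matching the example given in the comment above.
val query = multiSubqueryQuery(9)
println(query)
```

In spark-shell, with `t1` and `t2` registered as in the benchmark, `sql(query).show` could then be timed with the same `System.nanoTime()` pattern. No timings are claimed here; the effect would depend on the Spark build and the size of the subquery thread pool.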


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22518
  
BTW can you include a simple benchmark to show this problem? e.g. just run 
a query in spark-shell, and post the result before and after this PR.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/22518
  
I'd like to merge this simple PR first, to address the performance problem 
(unnecessary subquery execution).

Let's create a new ticket for pushing subquery filters down to data sources, and 
get more people to join the discussion.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98734/
Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98734 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98734/testReport)**
 for PR 22518 at commit 
[`ef0a953`](https://github.com/apache/spark/commit/ef0a953f0c3fb6f5ba50e51668a3f0b6938b5416).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/22518
  
**[Test build #98734 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98734/testReport)**
 for PR 22518 at commit 
[`ef0a953`](https://github.com/apache/spark/commit/ef0a953f0c3fb6f5ba50e51668a3f0b6938b5416).


---




[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Merged build finished. Test PASSed.


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...

2018-11-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/22518
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4955/
Test PASSed.


---
