[GitHub] spark issue #22518: [SPARK-25482][SQL] Avoid pushdown of subqueries to data ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22518 thanks, merging to master! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98779/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98779 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98779/testReport)** for PR 22518 at commit [`52ae956`](https://github.com/apache/spark/commit/52ae9561e58d65f2c26a112ce78a78994e83f868). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98778/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98778 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98778/testReport)** for PR 22518 at commit [`56ed812`](https://github.com/apache/spark/commit/56ed8129d0fa045c1a28914182d79cb9fa9d6103). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98777/ Test FAILed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test FAILed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98777 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98777/testReport)** for PR 22518 at commit [`b414572`](https://github.com/apache/spark/commit/b4145721a30f83563dca264f838e042b2741d645). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98775/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98775 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98775/testReport)** for PR 22518 at commit [`da3843e`](https://github.com/apache/spark/commit/da3843ed4711fb0ea6103e91d08285594dba5696). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4988/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98779 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98779/testReport)** for PR 22518 at commit [`52ae956`](https://github.com/apache/spark/commit/52ae9561e58d65f2c26a112ce78a78994e83f868).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4987/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98778 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98778/testReport)** for PR 22518 at commit [`56ed812`](https://github.com/apache/spark/commit/56ed8129d0fa045c1a28914182d79cb9fa9d6103).
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98777 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98777/testReport)** for PR 22518 at commit [`b414572`](https://github.com/apache/spark/commit/b4145721a30f83563dca264f838e042b2741d645).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4986/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98775 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98775/testReport)** for PR 22518 at commit [`da3843e`](https://github.com/apache/spark/commit/da3843ed4711fb0ea6103e91d08285594dba5696).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4984/ Test PASSed.
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/22518

@cloud-fan this is the benchmark:

```
(1 to 100).toSeq.toDF("a").write.save("/tmp/t1")
spark.read.load("/tmp/t1").createTempView("t1")
(1 to 2000).toSeq.toDF("b").write.save("/tmp/t2")
spark.read.load("/tmp/t2").createTempView("t2")
val plan = sql("select * from t2 where b > (select avg(a + 1) from t1)")
val t0 = System.nanoTime()
plan.show
val t1 = System.nanoTime()
println("Elapsed time: " + (t1 - t0) + "ns")
```

The result is:

```
Before PR: Elapsed time: 862499689ns
After PR: Elapsed time: 914728641ns
```

The difference is very small because all the subqueries run in parallel. The execution time would be affected much more if there were several subqueries (our thread pool has 16 threads, so a query like this one but with 9 filters containing subqueries would see a big performance gain after this PR).
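To try the multi-subquery case described in the comment above, the query text can be generated rather than written by hand. This is only a minimal sketch, assuming the `t1` and `t2` temp views from the benchmark already exist; `SubqueryBench`, `subqueryFilterQuery`, and `timed` are hypothetical helper names introduced for illustration and are not part of Spark or of this PR.

```scala
object SubqueryBench {
  // Hypothetical helper: builds a query on t2 with n scalar-subquery filters.
  // Each filter uses a different constant so the subqueries are not identical.
  def subqueryFilterQuery(n: Int): String = {
    require(n >= 1, "need at least one filter")
    val filters = (1 to n).map(i => s"b > (select avg(a + $i) from t1)")
    "select * from t2 where " + filters.mkString(" and ")
  }

  // Hypothetical helper: evaluates a block and returns its result
  // together with the elapsed wall-clock time in nanoseconds.
  def timed[A](body: => A): (A, Long) = {
    val t0 = System.nanoTime()
    val result = body
    (result, System.nanoTime() - t0)
  }
}
```

In spark-shell this could be used as `val (_, ns) = SubqueryBench.timed(sql(SubqueryBench.subqueryFilterQuery(9)).show())`, run once before and once after the patch. A single wall-clock measurement is noisy, so several runs per build would be needed before reading much into the numbers.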
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22518 BTW can you include a simple benchmark to show this problem? e.g. just run a query in spark-shell, and post the result before and after this PR.
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/22518 I'd like to merge this simple PR first, to address the performance problem (unnecessary subquery execution). Let's create a new ticket for pushing subquery filters down to the data source, and have more people join the discussion.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98734/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98734 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98734/testReport)** for PR 22518 at commit [`ef0a953`](https://github.com/apache/spark/commit/ef0a953f0c3fb6f5ba50e51668a3f0b6938b5416). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22518 **[Test build #98734 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98734/testReport)** for PR 22518 at commit [`ef0a953`](https://github.com/apache/spark/commit/ef0a953f0c3fb6f5ba50e51668a3f0b6938b5416).
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22518 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4955/ Test PASSed.