[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 Thank you so much, @cloud-fan and @gatorsmile ! --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20265 Thanks, merging to master/2.3!
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86257/ Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86257/testReport)** for PR 20265 at commit [`eb7035d`](https://github.com/apache/spark/commit/eb7035defe225c53ac8e43d63d6e3e6a974f4b1c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86251/ Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86251/testReport)** for PR 20265 at commit [`a556169`](https://github.com/apache/spark/commit/a5561697314527938a0cff085be33b215a746c4a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 There might be many questions about ORC (or Parquet) performance benchmarks. We can address those later; we cannot enumerate all cases, and users can run benchmarks for their own workloads. In fact, Apache Spark didn't show this kind of benchmark when it turned on PPD for Parquet. If there were a benchmark for Parquet, this PR would be a piece of cake. I think this PR is enough to show the benefit of ORC PPD and to justify setting the config to true by default.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86257/testReport)** for PR 20265 at commit [`eb7035d`](https://github.com/apache/spark/commit/eb7035defe225c53ac8e43d63d6e3e6a974f4b1c).
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 @gatorsmile . The number of rows has also changed. Why do you think so?
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20265 ORC performs even better when the number of columns is small. Maybe also add the test cases back to show this observation?
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86251/testReport)** for PR 20265 at commit [`a556169`](https://github.com/apache/spark/commit/a5561697314527938a0cff085be33b215a746c4a).
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20265 LGTM except for one comment. Let's worry about row group/stripe size later; since both Parquet and ORC use their default settings, I think the comparison is still fair.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86200/ Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)** for PR 20265 at commit [`87af693`](https://github.com/apache/spark/commit/87af693a82f9591a256c55a5eca65041f330a225).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)** for PR 20265 at commit [`87af693`](https://github.com/apache/spark/commit/87af693a82f9591a256c55a5eca65041f330a225).
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 I updated the PR (except for the RowGroupSize/OrcStripeSize part).
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 I'll update the PR tomorrow.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86143/ Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86143/testReport)** for PR 20265 at commit [`440f76b`](https://github.com/apache/spark/commit/440f76bdbf4d720a361e0afde3599027ff6e7be2).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20265 I think we need to make sure the Parquet row group size and the ORC stripe size are the same, to make this benchmark fair.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86143/testReport)** for PR 20265 at commit [`440f76b`](https://github.com/apache/spark/commit/440f76bdbf4d720a361e0afde3599027ff6e7be2).
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 Hi, @cloud-fan and @gatorsmile . Your questions are valid for all PPD cases. According to the comments, I added the following expressions (positive and negative) for both ORC/Parquet.
```
+// Positive cases: Select one or no rows
+Seq("id = 0", "id == 0", "id <= 0", "id < 1", "id IS NULL").foreach { expr =>
+  filterPushDownBenchmark(1024 * 1024 * 1, 20, expr)
+}
+
+// Negative cases: Select all rows which means the predicate is always true.
+Seq("id > -1", "id != -1", "id IS NOT NULL").foreach { expr =>
+  filterPushDownBenchmark(1024 * 1024 * 1, 20, expr)
+}
```
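To illustrate why the positive and negative expressions above behave so differently under push-down, here is a minimal, Spark-free sketch of stripe-level pruning on min/max statistics. It is a simplified model, not Spark's or ORC's actual code. The names `StripeStats`, `Pred`, and `PushDownSketch` are hypothetical, and so is the 10,000-row stripe size. It also assumes the `id` column holds the non-null values 0 until numRows written in order, which the thread does not state.

```scala
// Simplified model of stripe-level predicate push-down (hypothetical, for illustration only).
// Assumption: ids are 0 until numRows, non-null, written in order into fixed-size stripes.
final case class StripeStats(min: Long, max: Long)

// A predicate is modeled as a test on stripe statistics:
// "could any row in a stripe with these min/max values satisfy the predicate?"
final case class Pred(sql: String, mightMatch: StripeStats => Boolean)

object PushDownSketch {
  // Build per-stripe min/max statistics for ids 0 until numRows.
  def stripes(numRows: Long, rowsPerStripe: Long): Seq[StripeStats] =
    (0L until numRows by rowsPerStripe).map { start =>
      StripeStats(start, math.min(start + rowsPerStripe, numRows) - 1)
    }

  // Positive cases from the comment above: select one or no rows.
  val positive: Seq[Pred] = Seq(
    Pred("id = 0", s => s.min <= 0 && 0 <= s.max),
    Pred("id == 0", s => s.min <= 0 && 0 <= s.max),
    Pred("id <= 0", s => s.min <= 0),
    Pred("id < 1", s => s.min < 1),
    Pred("id IS NULL", _ => false) // no nulls under the stated assumption
  )

  // Negative cases from the comment above: the predicate is always true.
  val negative: Seq[Pred] = Seq(
    Pred("id > -1", s => s.max > -1),
    Pred("id != -1", s => !(s.min == -1 && s.max == -1)),
    Pred("id IS NOT NULL", _ => true)
  )

  // Number of stripes that cannot be skipped for a given predicate.
  def stripesRead(p: Pred, ss: Seq[StripeStats]): Int = ss.count(p.mightMatch)

  def main(args: Array[String]): Unit = {
    val ss = stripes(numRows = 1024L * 1024, rowsPerStripe = 10000L)
    (positive ++ negative).foreach { p =>
      println(f"${p.sql}%-16s reads ${stripesRead(p, ss)} of ${ss.size} stripes")
    }
  }
}
```

Under these assumptions, each positive predicate needs at most one stripe, while every negative predicate has to read all of them, so push-down cannot help in the negative cases and only its overhead is measured.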
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86116/ Test PASSed.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86116/testReport)** for PR 20265 at commit [`dda5bdf`](https://github.com/apache/spark/commit/dda5bdf6865018613eeb98c6acbdd39ab2459c87).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86116/testReport)** for PR 20265 at commit [`dda5bdf`](https://github.com/apache/spark/commit/dda5bdf6865018613eeb98c6acbdd39ab2459c87).
[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 cc @cloud-fan , @gatorsmile .