[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 Thank you so much, @cloud-fan and @gatorsmile !

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20265 thanks, merging to master/2.3!

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86257/ Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86257 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86257/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86251/ Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86251 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86251/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 There might be many questions about ORC (or Parquet) performance benchmarks. We can do that later. We cannot enumerate all cases. Also, users can do that for their own workload. In fact,

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86257 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86257/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 @gatorsmile, the number of rows is also changed. Why do you think so?

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20265 ORC performs even better when the number of columns is small. Maybe also add test cases back to show this observation?

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-17 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86251/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20265 LGTM except one comment. Let's worry about row group/stripe size later; since both Parquet and ORC use default settings, I think it's still fair.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86200/ Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86200 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86200/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 I updated the PR (except one RowGroupSize/OrcStripeSize part).

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 I'll update the PR tomorrow.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86143/ Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86143 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86143/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20265 I think we need to make sure the Parquet row group size and the ORC stripe size are the same, to make this benchmark fair.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86143 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86143/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 Hi, @cloud-fan and @gatorsmile . Your questions are valid for all PPD cases. According to the comments, I added the following expressions (positive and negative) for both ORC/Parquet.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Merged build finished. Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20265 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86116/ Test PASSed.

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86116 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86116/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20265 **[Test build #86116 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86116/testReport)** for PR 20265 at commit

[GitHub] spark issue #20265: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20265 cc @cloud-fan , @gatorsmile .