[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 I see. I opened a new PR, #20265. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-13 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 Yes. Please work on the perf tests and the benchmark test suite. I think the priority of this PR is much higher than the test-only PR you are working on

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18991 conventionally rc1 would fail so we still have time :)

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Oh, do we have time for 2.3?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-12 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18991 I think we still have time for 2.3? I'm not worried about correctness, but we should show people how much it improves.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Er, it's not record-level filtering. Maybe it's because I explained it too abstractly [here](https://github.com/apache/spark/pull/19943#discussion_r160251456). It's stripe-level. So,
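
The distinction between stripe-level and record-level filtering can be illustrated with a toy model. This is only a sketch under stated assumptions, not ORC's or Spark's implementation: the `Stripe` class and `scan_greater_than` function are hypothetical names, and a stripe is modeled as a plain list of integers whose min/max stand in for stripe statistics. The point is that pushdown decides per stripe whether anything can match, while the row-by-row predicate is still applied to whatever is read.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stripe:
    """Hypothetical stand-in for an ORC stripe: rows plus min/max statistics."""
    values: List[int]

    @property
    def min(self) -> int:
        return min(self.values)

    @property
    def max(self) -> int:
        return max(self.values)


def scan_greater_than(stripes: List[Stripe], threshold: int, pushdown: bool) -> List[int]:
    """Return every value > threshold, optionally skipping stripes by statistics."""
    out: List[int] = []
    for stripe in stripes:
        # Stripe-level decision: skip the whole stripe when its max cannot match.
        if pushdown and stripe.max <= threshold:
            continue
        # A surviving stripe may still hold non-matching rows, so the
        # row-by-row filter is applied to everything that was read.
        out.extend(v for v in stripe.values if v > threshold)
    return out


stripes = [Stripe([1, 2, 3]), Stripe([4, 5, 6]), Stripe([2, 9, 5])]
result = scan_greater_than(stripes, 4, pushdown=True)
print(result)
```

In this model the first stripe is skipped entirely, while the third is read in full even though only some of its rows match.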

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18991 Yea, let's add some. I'm curious to see how well PPD works in ORC, since for Parquet PPD doesn't work well and we disable record-level filtering for Parquet.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 @gatorsmile . I don't have any numbers for PPD=false.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-10 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 What is the performance number when turning it on, compared with the off mode?
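
One way to think about an on/off comparison, before any real benchmark exists, is to count how many rows a scan has to read in each mode. The sketch below is a toy model under assumptions of my own, not a Spark benchmark: `rows_read` is a hypothetical helper, stripes are plain lists, and skipping is decided from each stripe's max only. It suggests that the gain depends heavily on data layout, which is one reason measured numbers matter.

```python
from typing import List, Tuple


def rows_read(stripes: List[List[int]], threshold: int, pushdown: bool) -> Tuple[int, int]:
    """Return (rows_read, rows_matched) for the predicate value > threshold."""
    read = matched = 0
    for stripe in stripes:
        if pushdown and max(stripe) <= threshold:
            continue  # whole stripe skipped from statistics alone
        read += len(stripe)
        matched += sum(1 for v in stripe if v > threshold)
    return read, matched


# Sorted data: each stripe covers a disjoint range, the best case for skipping.
sorted_stripes = [list(range(i, i + 100)) for i in range(0, 1000, 100)]
# Interleaved data: every stripe spans nearly the whole range, so nothing is skipped.
mixed_stripes = [list(range(i, 1000, 10)) for i in range(10)]

on_sorted = rows_read(sorted_stripes, 899, pushdown=True)
off_sorted = rows_read(sorted_stripes, 899, pushdown=False)
on_mixed = rows_read(mixed_stripes, 899, pushdown=True)
print(on_sorted, off_sorted, on_mixed)
```

With sorted data the toy scan reads one stripe instead of ten; with interleaved data pushdown reads everything, so it only adds the cost of checking statistics.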

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Yes, @cloud-fan . I added the same test coverage for ORC in Apache Spark. Sorry, @gatorsmile . I always turned on PPD, so there is no perf number for PPD=false.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18991 I'd expect ORC to have the same test coverage as Parquet. Is that true?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 Any perf numbers?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2018-01-09 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 I expect we can port more test cases to Spark, instead of relying on the external data sources.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-06 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 This seems to be out of scope for Apache Spark 2.3.0 because that release is the debut of Apache ORC 1.4.0. I'll close this PR. Thank you all for the advice on this PR.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Of course, I want to proceed in any part of ORC! As you know, I made many attempts to get reviews. Some PRs got them, but other ORC PRs like #18953 didn't get any feedback

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 If you want to hold, we can wait for the completion of data source API v2. Otherwise, we can start it now and change it if needed. Conceptually, the test coverage improvement should be

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Is the plan aligned with the ongoing [Data Source V2](#19136)?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 To avoid duplicating the efforts, we should have a unified testing framework for covering the PPD of all the sources. Parquet and ORC should be part of it. In the future, when we add the other

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 To be clearer, I mean the existing Parquet coverage in the Apache Spark code base.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 The test case coverage parity between Parquet and ORC should be the criterion for this, right?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 Left a few comments in another PR: https://github.com/apache/spark/pull/19060#discussion_r137060959. I think it is the right time to improve the test case coverage before turning

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @liancheng , @gatorsmile , @cloud-fan , @rxin , and @omalley . I think you are the best people to ask about the ORC predicate pushdown issue. Could you review this PR to turn on ORC PPD by

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-09-01 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @gatorsmile , @cloud-fan, @rxin , and @omalley . #19060 shows that the behavior of Apache ORC 1.4.0 predicate push-down is correct. #19060 will add more test cases for data source

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-27 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 orc_create.q and orc_people_create.txt are from Hive. Writing test cases is pretty time consuming. I still hope we can get the test cases from the other open source projects. Instead,

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @gatorsmile . I made #19060 to tap on the direction. Could you review that? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 @gatorsmile . [orc_create.q](https://github.com/apache/spark/blob/master/sql/hive/src/test/resources/ql/src/test/queries/clientpositive/orc_create.q) and

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 +1, I couldn't agree more.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 Yes. The commercial DBMS products have very good, comprehensive test coverage. So far, it is missing in Apache Spark. Basically, we simply trust the underlying data sources, which are

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Wow. It's a real commercial spec. Thank you! I understand.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 Since I saw you are also working on the enhancement of the ORC reader/writer, we need to check all the limits (value ranges). I am not sure how well Apache ORC/Parquet did in their test case design.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Thank you for the comments and directions. Definitely, I'll try! Since we depend on Apache ORC 1.4.0, I think I can add a raw-level test case somewhere for evaluation purposes only.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18991 If ORC incorrectly filters out the extra rows, we might get incorrect results. In addition, we do not know whether the push down could get the performance gain. We saw the performance regression
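
The correctness concern above can be framed as a differential check: a scan with pushdown must return exactly the rows that the same scan returns without it. The sketch below is a toy model, not Spark's or ORC's test suite; `scan`, `check_pushdown`, and the `may_match` callbacks are hypothetical names, and stripes are plain lists of 32-bit integers that deliberately include the extreme values. A deliberately wrong skip rule is included to show the kind of bug such a check would catch.

```python
import random
from typing import Callable, List

INT_MIN, INT_MAX = -2**31, 2**31 - 1


def scan(stripes: List[List[int]],
         predicate: Callable[[int], bool],
         may_match: Callable[[int, int], bool],
         pushdown: bool) -> List[int]:
    """Filter rows, optionally skipping stripes whose (min, max) cannot match."""
    out: List[int] = []
    for stripe in stripes:
        if pushdown and not may_match(min(stripe), max(stripe)):
            continue
        out.extend(v for v in stripe if predicate(v))
    return out


def check_pushdown(stripes, predicate, may_match) -> bool:
    """Pushdown is safe only if it never changes the result."""
    with_ppd = scan(stripes, predicate, may_match, pushdown=True)
    without_ppd = scan(stripes, predicate, may_match, pushdown=False)
    return with_ppd == without_ppd


rng = random.Random(0)  # fixed seed so the check is reproducible
stripes = [[rng.randint(INT_MIN, INT_MAX) for _ in range(50)] for _ in range(20)]
stripes.append([INT_MIN, 0, INT_MAX])  # boundary values in a single stripe

thresholds = [INT_MIN, -1, 0, INT_MAX - 1, INT_MAX]

# Correct rule for "v > t": a stripe can match only if its max exceeds t.
correct_ok = all(
    check_pushdown(stripes, lambda v, t=t: v > t, lambda lo, hi, t=t: hi > t)
    for t in thresholds
)
# Buggy rule: testing the min instead of the max drops stripes that still match.
buggy_ok = all(
    check_pushdown(stripes, lambda v, t=t: v > t, lambda lo, hi, t=t: lo > t)
    for t in thresholds
)
print(correct_ok, buggy_ok)
```

The same harness shape could be reused across formats, since it only compares results with the optimization on and off.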

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @gatorsmile . Could you review this ORC PPD default configuration? Our data source doesn't trust any data sources including Parquet/ORC. I think ORC PPD does no harm to Spark.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18991 Merged build finished. Test PASSed.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81114/ Test PASSed.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18991 **[Test build #81114 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81114/testReport)** for PR 18991 at commit

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @cloud-fan, @gatorsmile , @sameeragarwal , @rxin , @mridulm . Could you review this one-liner PR about ORC PPD configuration?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-24 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18991 **[Test build #81114 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81114/testReport)** for PR 18991 at commit

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-24 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Retest this please.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-23 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @cloud-fan, @gatorsmile , @sameeragarwal , @rxin , @mridulm . Could you review this ORC predicate pushdown configuration PR when you have some time? Thank you in advance!

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80934/ Test PASSed.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18991 Merged build finished. Test PASSed.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18991 **[Test build #80934 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80934/testReport)** for PR 18991 at commit

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18991 **[Test build #80934 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80934/testReport)** for PR 18991 at commit

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Retest this please.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-21 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @cloud-fan, @gatorsmile , @sameeragarwal , @rxin , @mridulm . Could you review this?

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-18 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18991 Hi, @cloud-fan, @gatorsmile , @sameeragarwal , @rxin . Could you review this? This will greatly help our ORC transition over the next 3 months. If you don't want this, you can turn off

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18991 Merged build finished. Test PASSed.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18991 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80834/ Test PASSed.

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18991 **[Test build #80834 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80834/testReport)** for PR 18991 at commit

[GitHub] spark issue #18991: [SPARK-21783][SQL] Turn on ORC filter push-down by defau...

2017-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18991 **[Test build #80834 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80834/testReport)** for PR 18991 at commit