[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-16 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you, @gatorsmile !!! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-16 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18640 Thanks! Merging to master.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you so much, @rxin , @cloud-fan , @sameeragarwal , @mridulm , @viirya !

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 lgtm

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @cloud-fan , @rxin , @sameeragarwal and @mridulm . Could you merge this PR?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you again, @viirya .

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18640 LGTM

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-15 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18640 LGTM besides some minor questions, @rxin any more comments on this?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @sameeragarwal and @mridulm . I cannot see any clear reason for the objection here. Also, there is positive feedback from @ash211 on dev@spark, too. This PR will bring an

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80576/ Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80576/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80576/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-12 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @mridulm, @sameeragarwal , and @rxin . Please let me know if there is something for me to do here.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-10 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you so much, @sameeragarwal .

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-10 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/18640 LGTM; unless @rxin still has some strong objections?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80466/ Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80466/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80466/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-09 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 @rxin . Could you make a decision on this PR? Do we still need to put this into `sql/hive` for some reason?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Sure. Thank you so much, @omalley !

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-08 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/18640 I would also comment that in the long term, Spark should move to using the vectorized reader in ORC's core. That would remove the dependence on ORC's mapreduce module, which provides row by row

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-08 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you again for coming and reviewing this PR, @rxin , @kiszk , @mridulm , @omalley . So far, we have discussed the following. 1. `Why are we adding this to core? Why not just the

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 @rxin . How can I proceed with this PR now? Could you give me some advice again?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-07 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you, @omalley . @rxin . I think we had better depend on Apache ORC libraries as is in this PR.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-07 Thread omalley
Github user omalley commented on the issue: https://github.com/apache/spark/pull/18640 @rxin The ORC core library's dependency tree is aggressively kept as small as possible. I've gone through and excluded unnecessary jars from our dependencies. I also kick back pull requests that

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-05 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @rxin Since ORC 1.4.0, the ORC community has provided small shaded jar files to improve general-purpose usability. This PR uses the following. - orc-core-1.4.0-nohive.jar

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 I just checked the dependency size. It looks pretty reasonable, roughly 2 MB in total (although I do worry whether ORC would bring in a lot more jars in the future). cc @omalley any
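For readers who want to repeat a size check like the one mentioned above, here is a minimal sketch. It is not how the check in the thread was done (the thread does not say). The helper name `total_jar_size_mb`, the default `orc-*.jar` pattern, and the directory you pass in are all assumptions for illustration.

```python
from pathlib import Path


def total_jar_size_mb(jar_dir: str, pattern: str = "orc-*.jar") -> float:
    """Return the combined size, in MiB, of files in jar_dir matching pattern.

    Hypothetical helper: the name and the default pattern are illustrative
    and do not come from the Spark or ORC builds.
    """
    total_bytes = sum(
        p.stat().st_size
        for p in Path(jar_dir).glob(pattern)  # non-recursive match
        if p.is_file()
    )
    return total_bytes / (1024 * 1024)
```

Calling `total_jar_size_mb("path/to/jars")` on a directory of resolved dependency jars gives the figure to compare against the "roughly 2 MB" estimate. Passing `"*.jar"` as the pattern measures every jar in the directory instead.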

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Until now, I think ORC is the same as most other data sources (CSV, JDBC, JSON, PARQUET, TEXT), which live inside `sql/core` now. If that is an architectural plan of Apache Spark 2.3, I

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why don't we then create a separate orc module? Just copy a few of the files over?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 I agree with the following, but this does not block those users. This is only better than putting the dependency on Hive because it also supports the other users who are using ML and

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 To the best of my knowledge almost everybody runs with Hive anyway and the vast majority of users that run ORC are Hive users. In hindsight we probably should have put most of the data source

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/18640 LGTM, great to see progress on ORC support.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you for the review, @kiszk . The examples may be #17980, #17924, and #17943. If possible, in this PR, I want to focus only on the `Dependency on ORC` issue.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Thank you for the review, @rxin . We can use ORC like Parquet now. Parquet is inside `sql/core`, not `sql/hive`.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/18640 Can we add any smaller code to use this, too?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/18640 Why are we adding this to core? Why not just the hive module?

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-04 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @liancheng , @zhzhan , @rxin , @marmbrus . I'm pinging you since you worked on #6194 before.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80221/ Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80221 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80221/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-08-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80055/ Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80055 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80055/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-30 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #80055 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80055/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-30 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-25 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79951/ Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79951 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79951/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-25 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79951/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-25 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Retest this please.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-15 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 Hi, @rxin , @srowen , @sameeragarwal , @cloud-fan , @hvanhovell , @gatorsmile , @ueshin , @viirya , @kiszk . Could you review this small PR about a dependency change? This is a

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Merged build finished. Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79627/ Test PASSed.

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79627 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18640 **[Test build #79627 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79627/testReport)** for PR 18640 at commit

[GitHub] spark issue #18640: [SPARK-21422][BUILD] Depend on Apache ORC 1.4.0

2017-07-14 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18640 This aims to reduce the review scope for #17980 . cc @kiszk .