[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > @mallman I will try to go through this again. Do you think this can be generalized to data source v2 API? I'm not familiar with that API. I'm reluctant to generalize this

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya I've rebased to resolve conflicts. All tests are passing. Can you take another look and sign off? Cheers

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @DaimonPl I'm going to resolve the merge conflicts shortly. Otherwise, I have no intention of making further modifications to this PR outside of further review

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya I believe I have addressed all of your comments to date. Assuming the tests pass on the latest commit, the only item I'm still working on is getting to the root of the problem that requires

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > For the problem #16578 (comment) identified by @snir, I've submitted a fix at VideoAmp#8. I think it can solve this problem. Can you review it? Thanks. I've pushed your fix to this

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r142140412 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,130

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r142119164 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -77,20 +77,21 @@ trait

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r142117188 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -63,9 +74,22 @@ private[parquet

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I've been on vacation and traveling since the 26th, hence my sluggishness in responding. I will take some time to work on this tonight. I'll start by rebasing

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > @mallman how about adding comment explaining why such workaround was done + bug number in parquet-mr ? So in future once that bug is fixed, code can be cleaned. It will take me more t

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-22 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140611282 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,130

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Oh crap. I know what happened here. I've been updating some of the unit tests locally, and I've been running changes against these modified tests. I'll fix this and push a com

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 >Test build #82051 has finished for PR 16578 at commit 00ab80c. > > * This patch fails Spark unit tests. > * This patch merges cleanly. > * This patch adds no

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140368054 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -77,20 +77,21 @@ trait

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140366890 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/AggregateFieldExtractionPushdown.scala --- @@ -0,0 +1,77

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140358379 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -63,9 +74,22 @@ private[parquet

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140357267 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/JoinFieldExtractionPushdown.scala --- @@ -0,0 +1,66

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140351256 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -63,9 +74,22 @@ private[parquet

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140338751 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -77,20 +77,21 @@ trait

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140338601 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -77,20 +77,21 @@ trait

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r140226905 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,130

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Dammit. I forgot to commit a couple of build fixes. Fixed commit coming momentarily... --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I have some commits to address some of your comments, @viirya. However, I'm going to push a rebase first. I've validated that all catalyst, sql and hive project tests pass locally. Hopefully

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > * There is a minor change needed though, "parquetFormat: ParquetFileFormat" should be replaced by "fileFormat: FileFormat" as there is no dependency on the actual ParquetF

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > NPE is a problem though. any luck isolating that further? I can push a commit that prevents it, I just can't say for certain it's a proper fix versus a mere workaround. I'd l

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > @mallman (just pure curiosity :) ) how is it possible that this NPE was not found by automated tests? There's no single test case in entire spark suite which verifies scenarios l

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @snir It's hard to pin down exactly where the problem is and how to fix it. We removed the `NullIntolerant` trait from the implementations of the `ExtractValue` trait

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @snir I think I know the issue you're running into. Please retry your query with codegen disabled. That is, run ``` spark.sql("set spark.sql.codegen.wholeStage=

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @kamalptw Thanks for pinging Sean. I was going to do that myself soon. He's a helpful guy.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-28 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 We'll need a review from a committer. @cloud-fan @ericl do you have time to review this? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @rxin Any thoughts on a review of this PR?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @Gauravshah That's very encouraging to see. I suspect this patch will require some pretty heavy scrutiny. Hopefully it will make it into 2.3. If not, I hope that interested users

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I just pushed a revision of `SelectedField.scala`. Let's see what Jenkins says. I expect it to pass, and assuming it does I will return the ball to the reviewers' court.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Let me clean up `SelectedField.scala` before we proceed further.

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 @cloud-fan Can you back port this PR to 2.1 and 2.2, please? I think the patch should apply cleanly.

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126592666 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126592639 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126592491 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +591,67 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126580406 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126580371 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126561248 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126561277 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,43 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-07-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @nfx I'll need help finishing this. Can you review the current patchset?

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126023821 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126023747 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126023525 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126022861 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r126018188 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r125999437 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuites.scala --- @@ -0,0 +1,29 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r125996128 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-07-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r125995831 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,40 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-07-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 @cloud-fan ping

[GitHub] spark issue #18269: [SPARK-21056][SQL] Use at most one spark job to list fil...

2017-06-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/18269 Hi @bbossy > Does it match your scenario? It does not match my scenario. I'm reading files from HDFS. In your test, you're reading files from the local filesystem. Can you

[GitHub] spark issue #18269: [SPARK-21056][SQL] Use at most one spark job to list fil...

2017-06-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/18269 @bbossy I've built and deployed a branch of Spark 2.2 with your patch and compared its behavior to the same branch of Spark 2.2 without your patch. I'm seeing different behavior, but not what I

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-06-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r121533941 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-06-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 @cloud-fan Somehow I didn't notice that my earlier commit was failing scalastyle. Anyway, I've pushed a new version that passes all tests. Can you continue your review, please?

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-06-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Haven't been able to reproduce. I'm rebasing to see if that "fixes" the problem.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-06-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Hmmm... this failed again for the same reason. I'll see if I can reproduce locally.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-06-06 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Hi Guys, Can someone with super-Jenkins-powers please kick off a retest?

[GitHub] spark issue #18181: [SPARK-20958][SQL] Roll back parquet-mr 1.8.2 to 1.8.1

2017-06-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/18181 I can't speak to Parquet 1.8.x anymore. We use Parquet 1.9.0 plus a patch for https://issues.apache.org/jira/browse/PARQUET-783 and have had no problems.

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > @mallman yea that sounds good to me Okay. The testing diff is significantly simpler now.

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r119174513 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -43,19 +47,159 @@ class HiveClientSuite extends

[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/18112 CC @cloud-fan @ericl

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Also, I'm confused about something: who has jenkins retest privileges? And can I get them?

[GitHub] spark issue #18112: [SPARK-20888][SQL][DOCS] Document change of default sett...

2017-05-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/18112 @budde Can you please review (urgently) for inclusion as a migration note for 2.2?

[GitHub] spark pull request #18112: [SPARK-20888][SQL][DOCS] Document change of defau...

2017-05-25 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/18112 [SPARK-20888][SQL][DOCS] Document change of default setting of spark.sql.hive.caseSensitiveInferenceMode (Link to Jira: https://issues.apache.org/jira/browse/SPARK-20888) ## What changes

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Looks like this build was `kill -9`ed by something: ``` [error] running /home/jenkins/workspace/SparkPullRequestBuilder@2/build/sbt -Phadoop-2.6 -Phive-thriftserver -Phive

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 Sorry about test failures. Will fix tomorrow.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @saulshanabrook I've pushed my latest work (rebased off the latest master). I'm not sure it's quite complete, but feel free to review and comment. FYI, the one piece of code that's given me

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-05-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @saulshanabrook I'm sorry this effort has stalled. I actually have a lot of new code for this PR in my own clone, including *a lot* of new tests, but I haven't convinced myself that the code

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > it's really hard to review the tests... Can we just add some simple tests and refactor the test suites in a follow-up PR? It's a big, complicated diff, and I sympathize. Basically

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r118318733 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r118314820 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,39 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r118314685 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,39 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-23 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 Rebased to resolve merge conflicts.

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 I think this build was aborted because of the emergency jenkins restart, as reported on the spark dev mailing list. Retest, please?

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-22 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r117834494 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -17,45 +17,656 @@ package

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > You can add the test cases to VersionsSuite for verifying the behaviors of each Hive meta-store client version. @gatorsmile, I've refactored Hive version-specific testing to m

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 Hey guys. Just a quick update. I made good progress on implementing multi-version testing today, however it's not quite ready. I'm going to be on leave from tomorrow through the rest of next week

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r115825881 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,39 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-09 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r115614088 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -43,19 +47,159 @@ class HiveClientSuite extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > Could you check whether there exists any limit on predicate we can pass to Hive? There are, and I found something in the way of documentation or a grammar a while back that specif

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 I've pushed a new commit removing the logic for handling "foldables", since these are evaluated earlier in planning. I've also removed the modifications I made to `FiltersS

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-05-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r114878091 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-05-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 Hi Guys. Sorry for the lack of updates on this. I've been held up with other responsibilities the past week. I'm planning to push a new commit today or tomorrow. --- If your project is set up

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-04-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r113591167 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-04-26 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r113590135 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-04-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 @cloud-fan @ericl Hi guys. Care to review?

[GitHub] spark issue #17749: [SPARK-20450] [SQL] Unexpected first-query schema infere...

2017-04-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17749 LGTM

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-19 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 > I think we want to bring GraphFrames to feature/performance parity with GraphX - @mallman would love to understand the challenges you have run into. Better yet, would be great to get some iss

[GitHub] spark pull request #15125: [SPARK-5484][GraphX] Periodically do checkpoint i...

2017-04-19 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15125#discussion_r112311178 --- Diff: core/src/main/scala/org/apache/spark/util/PeriodicCheckpointer.scala --- @@ -128,6 +128,16 @@ private[mllib] abstract class PeriodicCheckpointer

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 > Yeah, catch-22. I'd also like to split out graphx, but I sorta think that's what's already happened with GraphFrames. I don't feel strongly enough to campaign for it, but think graphx sho

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 > I feel like i really don't know anything about graphx and can't evaluate this. It seems reasonable. I don't know if graphx is really active at this stage? Understood. Let me resp

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @srowen Can you merge this PR, please? It's been over a month since we've heard from any of the reviewers.

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-04-17 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/17633#discussion_r111773633 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -589,18 +590,34 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #17633: [SPARK-20331][SQL] Enhanced Hive partition pruning predi...

2017-04-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17633 > Does this work for non-Hive tables? This is geared towards Hive partitioned tables. If we have another system that prunes table partitions based on a string-ified pruning predicate

[GitHub] spark pull request #17633: [SPARK-20331][SQL] Enhanced Hive partition prunin...

2017-04-13 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/17633 [SPARK-20331][SQL] Enhanced Hive partition pruning predicate pushdown (Link to Jira: https://issues.apache.org/jira/browse/SPARK-20331) ## What changes were proposed in this pull request

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-04-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > can I do something to help this pull request? Hi @Gauravshah. Thanks for asking. Right now I need to fix a broken piece of the code, or reimplement it. At the moment this is something

[GitHub] spark issue #15125: [SPARK-5484][GraphX] Periodically do checkpoint in Prege...

2017-04-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/15125 @felixcheung ping

[GitHub] spark issue #17295: [SPARK-19556][core] Do not encrypt block manager data in...

2017-04-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/17295 > LGTM, cc @mallman to check the unmap part LGTM, too. Sorry for the late reply... I've been away the past two weeks.
