[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-24 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 @mallman Glad to see this got merged in. Thanks for all of your work pushing through. I'm looking forward to the next phase. Please let me know if I can help again. I did notice that window

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-16 Thread ajacques
Github user ajacques closed the pull request at: https://github.com/apache/spark/pull/21889 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Thanks for the response all. @mallman If it's really your preference, I will create a PR against that branch and close this one. My intention was never to take away from your efforts, and I still

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-14 Thread ajacques
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r210170646 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,245

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 @mallman if you're planning on making more code changes, would you be willing to work on a shared branch or something? I've been working to incorporate the CR comments

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-13 Thread ajacques
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r209830673 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,200

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @gatorsmile Do you think there is a non-deterministic failure in this change that causes it to inconsistently fail

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 >> but @gatorsmile wants to review it in a follow-on PR. > Where did he say it after the comment above? It was my interpretation of this comment: https://github.com/apa

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @HyukjinKwon Looks like most of your comments have already been addressed, but I've gone ahead and made a few more tweaks to help this get merged. Please let me know if any blocking comments have

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Is there anything I can do to help with this PR?

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Jenkins build successful. Any PR comments/blockers to merge for phase 1? cc @HyukjinKwon, @gatorsmile, @cloud-fan

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Alright to make sure we're all on the same page, it sounds like we're ready to merge this PR pending: * Successful build by Jenkins * Any PR comments from a maintainer This feature

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman Is it related to [this revert in ParquetReadSupport](https://github.com/apache/spark/pull/21889/commits/0312a5188f0d6c9fc5304195dbdc703bf0aa3fb7#diff-245e70c1f41e353e34cf29bd00fd9029L86

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman `select id, name.middle, address from temp` - **Works** `select name.middle, address from temp` - **Fails

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 The tests as committed pass for me, but I removed the `order by id` and I got that error. Are you saying it works with the specific query in my comment

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman: I've rebased on top of your changes and pushed. I'm seeing the following: Given the following schema: ``` root |-- id: integer (nullable = true) |-- name

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-04 Thread ajacques
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r207718713 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman: [This one](https://github.com/apache/spark/pull/21889/files#diff-0c6c7481232e9637b91c179f1005426aR120)? I just enabled it on my branch and the test passed. Was it fixed by your latest

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Are there any other blockers to enabling this by default now that @mallman fixed the currently known broken queries

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Anybody else able to reproduce this failure? It succeeded on my developer machine.

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 These test failures are in Spark streaming. Is this just an intermittent test failure or actually caused by this PR

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-31 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 @mallman, sounds good I'll get this PR updated with your latest changes as soon as I can.

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-31 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21889 Where does that leave both of these PRs? Do we still want this one with the code refactoring or to go back to the original? Are there any comments for this PR that would block merging? I've set

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-26 Thread ajacques
GitHub user ajacques opened a pull request: https://github.com/apache/spark/pull/21889 [SPARK-4502][SQL] Parquet nested column pruning - foundation (2nd attempt) (Link to Jira: https://issues.apache.org/jira/browse/SPARK-4502) **This is a restart of apache/spark#21320. Most

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 To confirm, do we want to start a secondary PR based on my stylistic/minor fixes? As I get up to speed on this code, I won't be able to make heavy changes. I'll have some time tomorrow to take a look

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread ajacques
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205329769 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread ajacques
Github user ajacques commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205329633 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 @HyukjinKwon, I'm not totally familiar with Spark internals yet, so to be honest I don't feel confident making big changes and hopefully can keep them simple at first. I've gone through

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread ajacques
Github user ajacques commented on the issue: https://github.com/apache/spark/pull/21320 Hey @mallman, I want to thank you for your work on this so far. I've been watching this pull request hoping this would get merged into 2.4 since it would be a benefit to me, but can see how