[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r207719862 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-04 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r207718734 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Are there any other blockers to enabling this by default now that @mallman fixed the currently known broken queries? The functionality exercised by the ignored t

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Success! Now where do we stand, @gatorsmile @HyukjinKwon ? --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Oh dear. I don't know why we're getting all these sigkills. I think we're going to need another retest...

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Anybody else able to reproduce this failure? It succeeded on my developer machine. It worked for me, too. Let's see what a retest d

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > I was able to run the first failing test successfully. Can we get a retest, please? @ajacques I just rebased and pushed my branch off of master. Perhaps the easiest thing to do wo

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > These test failures are in Spark streaming. Is this just an intermittent test failure or actually caused by this PR? I was able to run the first failing test successfully. Can we

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-01 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > This patch fails Scala style tests. Hi @ajacques. I'm not sure if you're aware of this, but you can run the scalastyle checks locally with ``` sbt scalast

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Where does that leave both of these PRs? Do we still want this one with the code refactoring or to go back to the original? Are there any comments for this PR that would block merging? I've

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Hi @gatorsmile. Where do you see us at this point? Do you still want to get this into Spark 2.4?

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-28 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @gatorsmile @HyukjinKwon @ajacques I'm seeing incorrect results from the following simple projection query, given @jainaks test file: > >``` >select page.url, page fr

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-27 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman, if we are all happy here, mind taking a look https://github.com/apache/spark/pull/21320#issuecomment-408271470 and https://github.com/apache/spark/pull/21320#issuecomment-406765

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Thanks @jainaks for the sample file and instructions to reproduce the problem. I will investigate and reply. @gatorsmile @HyukjinKwon @ajacques I'm seeing incorrect results f

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Thanks @jainaks for the sample file and instructions to reproduce the problem. I will investigate and reply

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Sorry it's taken me a couple of days to respond. I needed the time to ruminate (and not). I could write voluminously, but I just want to reply to a couple of points and move on. > I th

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Few comments like #21320 (comment) or ^ are not minor or nits. I leave hard -1 if they are not addressed. I'm sorry to say I'm very close to hanging up this PR. I put a lot of care, t

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > gentle ping @mallman since the code freeze is close Outside of my primary occupation, my top priority on this PR right now is investigating https://github.com/apache/spark/pull/21

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-25 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Regarding #21320 (comment), can you at least set this enable by default and see if some existing tests are broken or not? I have no intention to at this po

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205022974 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,388 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205022799 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,156

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205022895 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,153

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021712 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,388 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021469 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,153

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021282 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,62 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205020970 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-25 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r205021140 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,156

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 >> Hi @jainaks. Thanks for your report. Do you have the same problem running your test with this PR? > @mallman Yes, the issue with window functions is reproducible even with

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r204209033 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1298,8 +1298,18 @@ object SQLConf { "issues.

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 The test failure is unrelated to this patch. Shall we retest?

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman I still think we need to split it to two PRs. To resolve the issues you mentioned above, how about creating a separate PR? Only 10 days left before the code freeze of Spark 2.4. We p

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r204208518 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,387 @@ +/* + * Licensed

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > BTW, I am trying to take a look closely. I would appreciate if there are some concrete examples so that I (or other reviewers) can double check along. Parquet is pretty core fix and let's be v

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > I think if the tests are few, you can make them ignored for now here, and make another PR enabling it back with the changes in ParquetReadSupport.scala. That's the approach I've ta

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-21 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r204206072 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,156

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-07-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Could we move the changes made in ParquetReadSupport.scala to a separate PR? Then, we can merge this PR very quickly. If I remove the changes to `ParquetReadSupport.scala`, then f

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r201863353 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,153

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r201863463 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/SelectedField.scala --- @@ -0,0 +1,134 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r201863251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private[parquet

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I understand you've put a lot of work into this implementation, however I think you should try a simpler approach before we consider something more complicated. I believe

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. Thanks for the information regarding disk use for your scenario. What do you think about my second point, using the `ContextCleaner

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199648692 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -182,18 +182,20 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199643803 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -71,9 +80,22 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-07-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r199631341 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -47,16 +47,25 @@ import

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-07-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I dug deeper and think I understand the problem better. To state the obvious, the periodic checkpointer deletes checkpoint files of RDDs that are potentially still accessible

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-06-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. Thanks for the kind reminder and thanks for your contribution. I'm sorry I did not respond sooner. I no longer work where I regularly used the checkpointing code with large

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile I've removed the changes to the files as you requested. This removes support for schema pruning on filters of queries. I've pushed the previous revision to a new branch in our `spark

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-19 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile The last build was killed by SIGKILL. Can you start a new build, please?

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-18 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman Yes, the issue with window functions is reproducible even with this PR. Can you attach a (small) parquet file I can use to test this scenario

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman It does work fine with "name.First". @jainaks What is the value of the Spark SQL configuration setting `spark.sql.caseSensitive` when you run this query? Also, are

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile The last couple of build test failures appear to be entirely unrelated to this PR. The error message in the one failed test reads `org.scalatest.exceptions.TestFailedException: Unable

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Hi @jainaks. Thanks for your report. Do you have the same problem running your test with this PR?

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Hi @jainaks. I can see why your query would not work. In the example you provide, if you refer to the column as `name.First`, does your query succeed

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-06-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile I've addressed many of your points in today's commits. Can you please take a look at what I've done so far? I'm still working on the PRs you requested

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190494243 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,432 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190493689 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,154

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190492424 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,154

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190491050 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetQuerySuite.scala --- @@ -879,6 +879,15 @@ class

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190486220 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1256,8 +1256,18 @@ object SQLConf { "issues.

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190485768 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -286,7 +286,19 @@ case class FileSourceScanExec

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190485713 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190484386 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/QueryPlanConstraints.scala --- @@ -99,27 +100,28 @@ trait

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-24 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r190479026 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -162,7 +162,9 @@ case class FilterExec(condition

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-05-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile I believe this is the PR you requested for review.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'm closing this PR in favor of #21320. That PR deals with simple projection and filter queries only. I will submit subsequent PRs for aggregation and join queries following the acceptance

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-14 Thread mallman
Github user mallman closed the pull request at: https://github.com/apache/spark/pull/16578

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-05-14 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/21320 [SPARK-4502][SQL] Parquet nested column pruning - foundation (Link to Jira: https://issues.apache.org/jira/browse/SPARK-4502) _N.B. This is a restart of PR #16578 which includes everything

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 BTW I’ve been and am currently traveling with a busy itinerary. I haven’t started work on this and probably won’t get to work on it until Monday at the very earliest. > On Ma

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-05-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > To ensure the PR and review quality, we normally avoid doing everything in a single huge PR. It would be much better if you can cut it to a few smaller PRs. I'll have a

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-04-15 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r181575614 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala --- @@ -151,6 +151,9 @@ abstract class Optimizer

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I'd just suggest trying it. Since this PR is a patch for master, please message me personally at m...@allman.ms to discuss progress and questions on a backport to 2.2. If we get it working

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2018-01-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > However, I am -1 on merging a change this large after branch cut. It's disappointing, but I agree we can't merge a change this large into a branch cut. It will have to wait for 2.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > But I think we still need other eyes on this too. Agreed. @rxin can you help rope anyone else in on this? It's a big PR with a bigger history, but absent some savaging by anot

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Can you give an example it would fail? We didn't change clipParquetSchema, so I think even when pruning happens, why we read a super set of the file's schema and cause the exception, accord

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r151477529 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -961,6 +961,15 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-16 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r151474342 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -961,6 +961,15 @@ object SQLConf { .booleanConf

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-14 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r151026919 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -961,6 +961,15 @@ object SQLConf { .booleanConf

[GitHub] spark issue #19682: [SPARK-22464] [SQL] No pushdown for Hive metastore parti...

2017-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19682 Thanks for the fix!

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 @viirya Can you please take a look at my latest revisions and replies to your comments? Cheers.

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148866084 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -63,9 +74,22 @@ private[parquet

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148863673 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -63,9 +74,22 @@ private[parquet

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I can't tell what's causing the build to fail: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83390/console Any ideas

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Yeah, I think with a config for this optimization is good. I added a config switch, `spark.sql.nestedSchemaPruning.enabled`, which disables the optimizations if set to `false`. By defa

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148731634 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,440 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148731266 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,440 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148725914 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -63,9 +74,22 @@ private[parquet

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148725482 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSchemaPruningTest.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148725325 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/FileSchemaPruningTest.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148725152 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,147

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148723190 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -0,0 +1,147

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148722822 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -127,8 +127,8 @@ private[parquet

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148719962 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148718059 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/ProjectionOverSchema.scala --- @@ -0,0 +1,61 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148717122 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/JoinFieldExtractionPushdown.scala --- @@ -0,0 +1,66

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148717085 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/FieldExtractionPushdown.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148716702 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/AggregateFieldExtractionPushdown.scala --- @@ -0,0 +1,77

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148716194 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/AggregateFieldExtractionPushdown.scala --- @@ -0,0 +1,77

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148687395 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/FieldExtractionPushdown.scala --- @@ -0,0 +1,53 @@ +/* + * Licensed

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > I'm reluctant to generalize this PR without practical experience applying it to other column-oriented file formats. The only format I'm familiar with and have production experie
