[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 I think I've made my case for this patch as best I can. It does not appear this PR has unanimous support, but I continue to believe we should merge it to master. So where do we take it from here

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-12-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Hi @dbtsai @HyukjinKwon @gatorsmile @viirya. Can we merge this to master? --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r233180347 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -130,8 +130,8 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r233179076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,8 +204,12 @@ private[parquet

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-13 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r233175025 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 @gatorsmile How do you feel about merging this in? Anyone else I should ping for review?

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Can someone with Jenkins retest privileges please kick off a retest?

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 Jenkins retest please.

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r231249401 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-06 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r231243760 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22880: [SPARK-25407][SQL] Ensure we pass a compatible pruned sc...

2018-11-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22880 cc @HyukjinKwon Would you like to review this PR? It's a bug fix.

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-11-05 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r230916138 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-05 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230914377 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-11-02 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 > @mallman Could you run the EXPLAIN with this new changes and post it in the PR description? D

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230442852 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230433746 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-02 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230432336 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230128199 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/ColumnarFileFormat.scala --- @@ -0,0 +1,32 @@ +/* + * Licensed

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-11-01 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r230125133 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229743035 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229739407 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229738879 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 > is there anything blocked by this? I agree this is a good feature, but it asks the data source to provide a new ability, which may become a problem when migrating file sources to data source

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22905#discussion_r229729687 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala --- @@ -306,7 +306,15 @@ case class FileSourceScanExec

[GitHub] spark issue #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type which r...

2018-10-31 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22905 @gatorsmile @viirya @cloud-fan @dbtsai your thoughts? cc @dongjoon-hyun for ORC file format perspective

[GitHub] spark pull request #22905: [SPARK-25894][SQL] Add a ColumnarFileFormat type ...

2018-10-31 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22905 [SPARK-25894][SQL] Add a ColumnarFileFormat type which returns the column count for a given schema (link to Jira: https://issues.apache.org/jira/browse/SPARK-25894) ## What changes were
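The PR title says the new type returns the column count for a given schema. The sketch below is a rough, dependency-free illustration of what such a count could mean for a nested columnar format. It counts the primitive leaves of a schema tree, because Parquet stores each leaf field as its own column. The `DataType`, `StructType`, `ArrayType`, and `Field` types are a hypothetical miniature model, not Spark's classes, and `leafCount` is not the PR's actual method.

```scala
object ColumnCountDemo {
  // Hypothetical miniature schema model; not Spark's org.apache.spark.sql.types.
  sealed trait DataType
  case object IntType extends DataType
  case object StringType extends DataType
  final case class Field(name: String, dataType: DataType)
  final case class StructType(fields: Seq[Field]) extends DataType
  final case class ArrayType(element: DataType) extends DataType

  // Count the primitive leaves of a schema tree. In a columnar layout such as
  // Parquet, each primitive leaf is stored as a separate column, so this is
  // the number of physical columns a reader would touch for `dt`.
  def leafCount(dt: DataType): Int = dt match {
    case StructType(fields) => fields.map(f => leafCount(f.dataType)).sum
    case ArrayType(element) => leafCount(element)
    case _                  => 1
  }

  val fullSchema: StructType = StructType(Seq(
    Field("id", IntType),
    Field("name", StructType(Seq(Field("first", StringType), Field("last", StringType)))),
    Field("phones", ArrayType(StringType))))

  // What a query selecting only `id` and `name.first` might read after pruning.
  val prunedSchema: StructType = StructType(Seq(
    Field("id", IntType),
    Field("name", StructType(Seq(Field("first", StringType))))))

  def main(args: Array[String]): Unit = {
    println(leafCount(fullSchema))   // 4
    println(leafCount(prunedSchema)) // 2
  }
}
```

Showing both numbers side by side, for example in EXPLAIN output, is one way a reviewer might confirm that nested schema pruning actually reduced the columns read.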

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-31 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229654276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229451788 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229451108 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -182,18 +182,20 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229450720 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -93,13 +141,14 @@ private[parquet

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-30 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22880#discussion_r229449812 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetReadSupport.scala --- @@ -49,34 +49,82 @@ import

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-10-30 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > https://issues.apache.org/jira/browse/SPARK-25879 > > If we select a nested field and a top level field, the schema pruning will fail. Here is the reproducible test, > ...

[GitHub] spark pull request #22880: [SPARK-25407][SQL] Ensure we pass a compatible pr...

2018-10-29 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22880 [SPARK-25407][SQL] Ensure we pass a compatible pruned schema to ParquetRowConverter ## What changes were proposed in this pull request? (Link to Jira issue: https://issues.apache.org/jira

[GitHub] spark issue #19410: [SPARK-22184][CORE][GRAPHX] GraphX fails in case of insu...

2018-10-29 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/19410 Hi @szhem. I'm sorry I haven't been more responsive here. I can relate to your frustration, and I do want to help you make progress on this PR and merge it in. I have indeed been busy

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223441744 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -79,12 +82,30 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223425011 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -79,12 +82,30 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223424625 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/HiveClientSuite.scala --- @@ -79,12 +82,30 @@ class HiveClientSuite(version: String

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223422030 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,45 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] Implement a new config to cont...

2018-10-08 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r223415835 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -754,26 +755,38 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r222348674 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends

[GitHub] spark pull request #22614: [SPARK-25561][SQL] HiveClient.getPartitionsByFilt...

2018-10-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22614#discussion_r222345462 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -746,34 +746,20 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-09-26 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Hi @Gauravshah. That branch has diverged substantially from what’s in master. Right now I’m preparing a PR to address a problem with the current implementation in master, but I’m on holiday

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 Retest this please.

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-12 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 > Hey @mallman, let's just target to fix the problem in the JIRA without other refactorings. The refactorings I've made address the problem directly. Hopefully that will be clearer with

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r217055207 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-12 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22394#discussion_r217052036 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -245,28 +249,32 @@ class

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > FYI, @mallman I'm working on having ParquetFilter to support IsNotNull(employer.id) to be pushed into parquet reader. That would be pretty c

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 And FYI this is the Jira issue I promised in https://github.com/apache/spark/pull/22357#issuecomment-419940228 yesterday: https://issues.apache.org/jira/browse/SPARK-25407

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 FYI @viirya @dbtsai @gatorsmile @HyukjinKwon Can I get someone's review of this PR please? The unmasked failures appear to be false positives, so no changes to the tested code

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 This LGTM. I'm not going to submit a PR for my approach to this problem. Thanks @viirya!

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216714387 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark issue #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, ...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22394 I'm working on fixing these test failures now. Hopefully I'll have something pushed soon.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 FYI, the PR I previously mentioned about fixing the use of `withSQLConf` is #22394.

[GitHub] spark pull request #22394: [SPARK-25406][SQL] For ParquetSchemaPruningSuite....

2018-09-11 Thread mallman
GitHub user mallman opened a pull request: https://github.com/apache/spark/pull/22394 [SPARK-25406][SQL] For ParquetSchemaPruningSuite.scala, move calls to `withSQLConf` inside calls to `test` (Link to Jira: https://issues.apache.org/jira/browse/SPARK-25406) ## What
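The PR title names the fix: move calls to `withSQLConf` inside calls to `test`. This matters because `test` only registers a body that runs later. A config scope wrapped around the registration has therefore already been undone by the time the body executes. The dependency-free sketch below models this. `conf`, `withConf`, `test`, and `demo` are hypothetical stand-ins for Spark's SQL config, its `withSQLConf` test helper, and ScalaTest's `test`. They are not the suite's actual code.

```scala
import scala.collection.mutable

object ConfScopeDemo {
  // Hypothetical stand-in for a session config; not Spark's SQLConf.
  val conf: mutable.Map[String, String] = mutable.Map("vectorized" -> "false")

  // Set `key` only while `body` runs, then restore the previous value,
  // in the spirit of Spark's `withSQLConf` test helper.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally {
      previous match {
        case Some(v) => conf(key) = v
        case None    => conf.remove(key)
      }
    }
  }

  // Like ScalaTest's `test`: records the body now, runs it later.
  private val registered = mutable.LinkedHashMap.empty[String, () => String]

  def test(name: String)(body: => String): Unit =
    registered(name) = () => body

  // Register both variants, then run every body and collect what it observed.
  def demo(): Map[String, String] = {
    // Broken: the setting is active only while the body is being registered.
    withConf("vectorized", "true") {
      test("withConf outside test") { conf("vectorized") }
    }
    // Fixed: the setting is applied while the body actually executes.
    test("withConf inside test") {
      withConf("vectorized", "true") { conf("vectorized") }
    }
    registered.map { case (name, body) => name -> body() }.toMap
  }

  def main(args: Array[String]): Unit = {
    val results = demo()
    println(results("withConf outside test")) // false
    println(results("withConf inside test"))  // true
  }
}
```

Because the outer variant silently runs with the default setting, every test in that variant exercises the same code path. This is consistent with real failures staying hidden until the scoping was corrected.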

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 I have some bad news. The methods `testSchemaPruning` and `testMixedCasePruning` do not set the configuration settings as expected. Fixing that reveals 6 failing tests for the mixed case tests. One

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-11 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 @viirya Please amend https://github.com/apache/spark/blob/d684a0f30599d50061ef78ec62edcdd3b726e2d9/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216686762 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -155,6 +163,60 @@ class

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-11 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216683076 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark pull request #22357: [SPARK-25363][SQL] Fix schema pruning in where cl...

2018-09-10 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/22357#discussion_r216545091 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruning.scala --- @@ -110,7 +110,17 @@ private[sql

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > FYI, per further checking code and discussion with @dbtsai regarding with predicate pushdown, we know that predicate push down only works for primitive types on Parquet datasource. So b

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 > @mallman It will be great that we can have this fix in 2.4 release as this can dramatically reduce the data being read in many applications which is the purpose of the original work.

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 I have reconstructed my original patch for this issue, but I've discovered it will require more work to complete. However, as part of that reconstruction I've discovered a couple of cases where our

[GitHub] spark issue #22357: [SPARK-25363][SQL] Fix schema pruning in where clause by...

2018-09-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/22357 Hi @viirya, Thanks for this PR! I have an alternative implementation which I'd like to submit for comparison. My implementation was something I removed from my original patch. I

[GitHub] spark pull request #15673: [SPARK-17992][SQL] Return all partitions from Hiv...

2018-09-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/15673#discussion_r216037341 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/HiveShim.scala --- @@ -586,17 +587,31 @@ private[client] class Shim_v0_13 extends

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-24 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 Thanks everyone for your contributions, support and patience. It's been a journey and a half, and I'm excited for the future. I will open a follow-on PR to address the current known failure

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-23 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman Could you remove the changes made in ParquetRowConverter.scala and also turn off spark.sql.nestedSchemaPruning.enabled by default in this PR? D

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-23 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r212396370 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark pull request #21320: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-23 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21320#discussion_r212388958 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala --- @@ -202,11 +204,15 @@ private[parquet

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-22 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile Any concerns about merging this PR at this point?

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 @gatorsmile How does this look?

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-21 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Add some test cases when turning on spark.sql.caseSensitive? Will do.

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Try this when spark.sql.nestedSchemaPruning.enabled is on? I don't think this will be difficult to fix. I'm working on it now and will add relevant test cover

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-20 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Try this when spark.sql.nestedSchemaPruning.enabled is on? This is a case-sensitivity issue (obviously). I'll get to the root of it. Tha

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > I see no point of leaving this PR open. I don't agree with you on that point, and I've expressed my view in https://github.com/apache/spark/pull/21889#issuecomment-413655

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-16 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Essentially, this PR was created to take the management of #21320 out of my hands, with a view towards facilitating its incorporation into Spark 2.4. It was my suggestion, one based in frustration

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 >> Hello, we've been using your patch at Stripe and we've found something that looks like a new bug: > > Thank you for sharing this, @xinxin-stripe. This is very helpful. I will

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Hello, we've been using your patch at Stripe and we've found something that looks like a new bug: Thank you for sharing this, @xinxin-stripe. This is very helpful. I will investig

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Due to the urgency of the upcoming 2.4 code freeze, I'm going to open this PR to collect any feedback. This can be closed if you prefer to continue to the work in the original

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-15 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman if you're planning on making more code changes, would you be willing to work on a shared branch or something? I've been working to incorporate the CR comments. No, howe

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-14 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 >> the window of opportunity to review syntax and style in this PR closed long ago. > Why/when is this window closed? Who closed that? What I wrote above is a coarse approximat

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 @ajacques I added a commit to enable schema pruning by default. It's a little more complete than your commit to do the same. Please rebase off my branch and remove your commit

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-13 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > Then should we keep this one or #21889? shall we deduplicate the efforts? I requested to open that because this looks going to be inactive per your comments. As I stated before, I

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-10 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 >> @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? > I have the hack I used originally, but I have

[GitHub] spark issue #21320: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21320 > @mallman, can we close this PR? Are you willing to update here or not? I pushed an update less than a day ago, and I intend to continue pushing updates as nee

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman, while we wait for the go-no-go, do you have the changes for the next PR ready? Is there anything you need help with? I have the hack I used originally, but I haven't tr

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-09 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 @ajacques Please rebase off my branch. @gatorsmile I don't recall seeing that error before. Any idea for how I can reproduce and debug

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-08 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 Are we waiting for @gatorsmile's go-ahead and merge?

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > just for clarification, so now .. there no outstanding bugs, some tests are ignored per #21320 (comment) and left comments were mostly addressed. Did i understand correctly? The igno

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 See https://github.com/apache/spark/pull/21320#issuecomment-406353694 for @gatorsmile's request to move the changes to `ParquetReadSupport.scala` to another PR. There was another

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-07 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Assuming from #21889 (comment), we shouldn't have any identified bug here. What kind of bugs left to be fixed? That bug was addressed by b50ddb4. We still need to fix the bug underly

[GitHub] spark pull request #21889: [SPARK-4502][SQL] Parquet nested column pruning -...

2018-08-07 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/21889#discussion_r208446828 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaPruningSuite.scala --- @@ -0,0 +1,205

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Alright to make sure we're all on the same page, it sounds like we're ready to merge this PR pending: > > * Successful build by Jenkins > * Any PR comments from

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman Is it related to this revert in ParquetReadSupport.scala? I re-added this logic and all 32 tests in ParquetSchemaPruningSuite passed. Yes. That's what we need to w

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-05 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 I've pushed a commit to restore the original test coverage while also ensuring determinism of the output. Don't ask me how I did it. It's a secret! The test that was failing before

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > select id, name.middle, address from temp - Works > select name.middle, address from temp - Fails > select name.middle from temp - Works > select name.middle, id, addre

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > Test build #94228 has finished for PR 21889 at commit 92901da. The test failure appears to be unrelated to this PR. Is it just me or has the test suite become flakier in the p

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > The tests as committed pass for me, but I removed the order by id and I got that error. Are you saying it works with the specific query in my comment? @ajacques Please try this qu

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > The tests as committed pass for me, but I removed the order by id and I got that error. Are you saying it works with the specific query in my comment? Oh! I didn't notice you chan

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman: I've rebased on top of your changes and pushed. I'm seeing the following: That test passes for me locally. Also, I inspected your branch and could not find any err

[GitHub] spark issue #21889: [SPARK-4502][SQL] Parquet nested column pruning - founda...

2018-08-04 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/21889 > @mallman: I've rebased on top of your changes and pushed. I'm seeing the following That's the test case that I "unignored". It was passing. There must be some simple e
