Github user mallman commented on the issue:

https://github.com/apache/spark/pull/21320

> Could we move the changes made in ParquetReadSupport.scala to a separate PR? Then, we can merge this PR very quickly.

If I remove the changes to `ParquetReadSupport.scala`, then four tests fail in `ParquetSchemaPruningSuite.scala`.

I don't think we should/can proceed without addressing the issue of reading from two parquet files with identical column names and types but different ordering of those columns in their respective file schema. Personally, I think the fact that the Spark parquet reader appears to assume the same column order in otherwise compatible schema across files is a bug. I think column selection should be by name, not index. The parquet-mr reader behaves that way.

As a stop-gap alternative, I suppose we could disable the built-in reader if parquet schema pruning is turned on. But I think that would be a rather ugly, invasive and confusing hack.

Of course I'm open to other ideas as well.
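To make the ordering concern concrete, here is a minimal sketch in plain Python. It is not Spark or parquet-mr code: the two in-memory "files", the `requested` list, and both reader functions are hypothetical stand-ins, assumed only to show how positional selection and by-name selection can diverge when two files share column names and types but store them in a different order.

```python
# Hypothetical illustration only: plain Python, not Spark or parquet-mr code.

# Two "files" with identical column names and types but different column order.
file_a = {"schema": ["id", "name"], "rows": [(1, "alice")]}
file_b = {"schema": ["name", "id"], "rows": [("bob", 2)]}

# Columns the query asks for, in the order it expects them.
requested = ["id", "name"]


def read_by_index(data_file, requested):
    # Assumes every file stores the requested columns at the same positions,
    # so the file's own schema is never consulted.
    width = len(requested)
    return [tuple(row[i] for i in range(width)) for row in data_file["rows"]]


def read_by_name(data_file, requested):
    # Resolves each requested column against this file's own schema first.
    positions = [data_file["schema"].index(col) for col in requested]
    return [tuple(row[p] for p in positions) for row in data_file["rows"]]


by_index = read_by_index(file_a, requested) + read_by_index(file_b, requested)
by_name = read_by_name(file_a, requested) + read_by_name(file_b, requested)

print(by_index)  # second row has its values swapped relative to the request
print(by_name)   # both rows follow the requested (id, name) order
```

In this toy setup the positional reader silently returns the second file's values under the wrong column names, while the by-name reader returns consistent rows for both files.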