GitHub user mallman commented on the issue: https://github.com/apache/spark/pull/16578

> Can you give an example it would fail? We didn't change clipParquetSchema, so I think even when pruning happens, why we read a super set of the file's schema and cause the exception, according to the comment? We won't add new fields but remove existing from the file's schema, right?

(Oddly, GitHub won't let me reply to this comment inline.)

The situation we've run into is pruning a schema for a query over a partitioned Hive table backed by Parquet files, where some files are missing fields specified by the table schema. This can happen, e.g., through schema evolution, where fields are added to the table over time without rewriting existing partitions. In those cases, we've found that parquet-mr throws an exception if we try to read such a file with the table-pruned schema (a superset of that file's schema). Therefore, we further clip the pruned schema against each file's schema before attempting to read.
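For concreteness, here is a minimal sketch of what "clipping the pruned schema against a file's schema" means in this context. This is hypothetical illustration code (the object and method names are made up, and it ignores details like case-insensitive resolution that the real reader handles), not the PR's actual implementation:

```scala
import org.apache.spark.sql.types.{StructField, StructType}

object SchemaClipExample {
  // Drop any requested field that the given file does not contain, so the
  // schema handed to parquet-mr is never a superset of the file's schema.
  def clipToFileSchema(requested: StructType, fileSchema: StructType): StructType = {
    val clippedFields = requested.fields.flatMap { requestedField =>
      fileSchema.fields.find(_.name == requestedField.name).map { fileField =>
        (requestedField.dataType, fileField.dataType) match {
          // Recurse into nested structs so pruned nested fields are also
          // clipped against what the file actually stores.
          case (r: StructType, f: StructType) =>
            requestedField.copy(dataType = clipToFileSchema(r, f))
          case _ =>
            requestedField
        }
      }
    }
    StructType(clippedFields)
  }
}
```

Under this sketch, a pruned schema like `struct<name:struct<first:string,last:string>>` read against an older partition whose files only contain `struct<name:struct<first:string>>` would be clipped down to the latter before the read, rather than triggering a missing-column failure in parquet-mr.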