Sorry, but my original solution is incorrect.
1. Glue Crawlers are not supposed to set the spark.sql.sources.schema.*
properties; Spark SQL should. In Spark 2.4 the default value of
spark.sql.hive.caseSensitiveInferenceMode is INFER_AND_SAVE, which means that
Spark infers the case-sensitive schema from the underlying data files and
writes it back to the table properties.
This bug happens because the Glue table's properties (its TBLPROPERTIES) are
missing two important entries:
spark.sql.sources.schema.numParts
spark.sql.sources.schema.part.0
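Spark reads these entries back by concatenating part.0 through
part.{numParts - 1} into a single JSON string and parsing that as the
case-preserving table schema. When they are absent, it has to fall back to the
metastore schema (lowercase in Glue) or to inference, depending on
spark.sql.hive.caseSensitiveInferenceMode. Here is a minimal Python sketch of
that lookup. It assumes the table parameters are available as a plain dict.
The helper name schema_from_table_properties and the one-column schema split
into two parts are hypothetical and for illustration only; this is not
Spark's own code.

```python
import json
from typing import Optional


def schema_from_table_properties(params: dict) -> Optional[dict]:
    """Reassemble a Spark schema JSON that is split across table parameters.

    Hypothetical helper that mirrors the numParts/part.N layout; it is not
    Spark's implementation.
    """
    num_parts = params.get("spark.sql.sources.schema.numParts")
    if num_parts is None:
        # No case-preserving schema stored: Spark must use the metastore
        # schema or infer one from the data files instead.
        return None
    pieces = []
    for i in range(int(num_parts)):
        key = f"spark.sql.sources.schema.part.{i}"
        if key not in params:
            raise ValueError(f"missing table property {key}")
        pieces.append(params[key])
    # The parts are slices of one JSON string, so join before parsing.
    return json.loads("".join(pieces))


# Hypothetical example: one column whose schema JSON is split into two parts.
params = {
    "spark.sql.sources.schema.numParts": "2",
    "spark.sql.sources.schema.part.0": '{"type":"struct","fields":[{"name":"lastMo',
    "spark.sql.sources.schema.part.1": 'dified","type":"long","nullable":true,"metadata":{}}]}',
}
schema = schema_from_table_properties(params)
print([f["name"] for f in schema["fields"]])  # ['lastModified']
```

The column name keeps its original case only because it comes from the stored
JSON rather than from the metastore column list.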
To solve the problem, I had to add those two properties via the Glue console
(I couldn't do it with ALTER TABLE …)
I guess
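The values entered in the console can be generated rather than written by
hand. The Python sketch below builds both entries from a list of
(column name, Spark type) pairs, keeping the original column-name case. It
assumes simple atomic Spark type names such as "string" or "long". It also
assumes Spark's default spark.sql.sources.schemaStringLengthThreshold of 4000
characters per part, and a compact JSON layout that should match
StructType.json output. The helper name schema_table_properties and the
example columns are hypothetical. The types you pass must match the actual
Parquet columns.

```python
import json

# Assumed default of spark.sql.sources.schemaStringLengthThreshold.
DEFAULT_THRESHOLD = 4000


def schema_table_properties(fields, threshold=DEFAULT_THRESHOLD):
    """Build spark.sql.sources.schema.* entries for (name, spark_type) pairs.

    Hypothetical helper for illustration; only atomic type names such as
    "string" or "long" are handled, and every column is marked nullable.
    """
    schema = {
        "type": "struct",
        "fields": [
            {"name": name, "type": spark_type, "nullable": True, "metadata": {}}
            for name, spark_type in fields
        ],
    }
    # Compact separators: no spaces, as in Spark's own schema JSON.
    text = json.dumps(schema, separators=(",", ":"))
    # Split the JSON string into consecutive slices of at most `threshold` chars.
    parts = [text[i:i + threshold] for i in range(0, len(text), threshold)]
    props = {"spark.sql.sources.schema.numParts": str(len(parts))}
    for i, part in enumerate(parts):
        props[f"spark.sql.sources.schema.part.{i}"] = part
    return props


props = schema_table_properties([("id", "string"), ("lastModified", "long")])
for key, value in props.items():
    print(key, "=", value)
```

Each printed key/value pair corresponds to one table property in the Glue
console. The same dict could presumably be merged into the table's Parameters
through Glue's UpdateTable API (for example boto3's glue.update_table), but that
path is untested here.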
Spark version: 2.4.2 on Amazon EMR 5.24.0
I have a Glue Catalog table backed by an S3 Parquet directory. The Parquet
files have case-sensitive column names (like /lastModified/). No matter what
I do, I get lowercase column names (/lastmodified/) when reading the Glue
Catalog table with