Re: Cannot read case-sensitive Glue table backed by Parquet

2020-01-17 Thread oripwk
Sorry, but my original solution is incorrect. 1. Glue Crawlers are not supposed to set the spark.sql.sources.schema.* properties, but Spark SQL should. The default in Spark 2.4 for spark.sql.hive.caseSensitiveInferenceMode is INFER_AND_SAVE, which means that Spark infers the schema from the

Re: Cannot read case-sensitive Glue table backed by Parquet

2020-01-17 Thread oripwk
This bug happens because the Glue table's SERDEPROPERTIES is missing two important properties: spark.sql.sources.schema.numParts and spark.sql.sources.schema.part.0. To solve the problem, I had to add those two properties via the Glue console (couldn't do it with ALTER TABLE …). I guess
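
For illustration only, here is a rough sketch of what those two properties typically hold. It assumes Spark's convention of serializing the table schema as StructType JSON and splitting it into numbered parts, with a chunk size matching the default of spark.sql.sources.schemaStringLengthThreshold (4000). The helper name schema_properties, the extra column id, and the column types are hypothetical; only lastModified comes from the thread.

```python
import json

# Assumption: mirrors Spark's default spark.sql.sources.schemaStringLengthThreshold.
CHUNK_SIZE = 4000


def schema_properties(columns, chunk_size=CHUNK_SIZE):
    """Build spark.sql.sources.schema.* properties from (name, type) pairs.

    Hypothetical helper: the case-preserving column names end up inside
    the JSON stored in the part.N properties.
    """
    schema = {
        "type": "struct",
        "fields": [
            {"name": name, "type": dtype, "nullable": True, "metadata": {}}
            for name, dtype in columns
        ],
    }
    schema_json = json.dumps(schema, separators=(",", ":"))

    # Split the JSON string into fixed-size chunks, one property per chunk.
    parts = [
        schema_json[i:i + chunk_size]
        for i in range(0, len(schema_json), chunk_size)
    ]

    props = {"spark.sql.sources.schema.numParts": str(len(parts))}
    for index, part in enumerate(parts):
        props[f"spark.sql.sources.schema.part.{index}"] = part
    return props


# Column types here are illustrative assumptions, not taken from the thread.
props = schema_properties([("lastModified", "timestamp"), ("id", "long")])
for key, value in props.items():
    print(key, "=", value)
```

A small schema fits in a single part, which is why only part.0 is mentioned above; wider tables would need part.1, part.2, and so on.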

Cannot read case-sensitive Glue table backed by Parquet

2020-01-16 Thread oripwk
Spark version: 2.4.2 on Amazon EMR 5.24.0. I have a Glue Catalog table backed by an S3 Parquet directory. The Parquet files have case-sensitive column names (like /lastModified/). No matter what I do, I get lowercase column names (/lastmodified/) when reading the Glue Catalog table with