Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16797

@budde Spark does support mixed-case-schema tables, and it always has. That's because we write the table schema to the metastore case-preserving, via table properties. When we read a table, we get the schema from the metastore and assume it is the schema of the table's data files. So the data file schema must match the table schema, or Spark will fail; this has always been the case.

However, there is one exception. There are two kinds of tables in Spark: data source tables and hive serde tables (we have different SQL syntax to create them; see the sketch at the end of this comment). Data source tables are totally managed by Spark: we read/write the data files directly and only use the hive metastore as a persistence layer, which means data source tables are not compatible with hive, and hive can't read or write them. Hive serde tables, on the other hand, are meant to be hive-compatible, and we use hive APIs to read and write them. For any such table, as long as hive can read it, Spark can read it.

The exception is that for the parquet and orc formats we read the data files directly, as an optimization (reading through the hive API is slow). Before Spark 2.1 we saved the schema of hive serde tables to the hive metastore directly, which means the schema was lowercased. Given that, ideally we should not have supported mixed-case-schema parquet/orc data files for this kind of table, or the data schema would mismatch the table schema. But we did support it, at the cost of runtime schema inference. This problem was solved in Spark 2.1 by writing the table schema to the metastore case-preserving for hive serde tables as well. Now we can say that the data schema must match the table schema, or Spark should fail.

Then comes this problem: for parquet/orc hive serde tables created by Spark prior to 2.1, the data file schema may not match the table schema, but we still need to support them for compatibility. That's why I prefer the migration command approach: it keeps the concept clean, namely that the data schema must match the table schema.

Like you said, users can still create a hive table backed by mixed-case-schema parquet/orc files, via hive or other systems like presto. Such a table is readable by hive, and by Spark prior to 2.1 because of the runtime schema inference. But this was not intentional, and Spark should not support it, since the data file schema and the table schema mismatch. We can make the migration command cover this case too.
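For reference, a minimal sketch of the two kinds of tables discussed above, run in a `spark-shell` with Hive support enabled; the table names and columns here are just illustrative:

```scala
// Data source table: managed entirely by Spark. Spark reads/writes the parquet
// files directly and keeps the case-preserving schema in table properties, so
// Hive itself cannot read this table.
spark.sql("CREATE TABLE ds_table (userId INT, eventTime TIMESTAMP) USING parquet")

// Hive serde table: created with Hive-compatible DDL. The metastore lowercases
// column names, so case preservation relies on the table properties Spark writes
// since 2.1; before that, mixed-case parquet/orc files needed runtime inference.
spark.sql("CREATE TABLE serde_table (userId INT, eventTime TIMESTAMP) STORED AS PARQUET")

// Both show the mixed-case schema when read back through Spark.
spark.table("ds_table").printSchema()
spark.table("serde_table").printSchema()
```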