GitHub user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22343#discussion_r216204114

    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetOptions.scala ---
    @@ -69,12 +69,25 @@ class ParquetOptions(
         .get(MERGE_SCHEMA)
         .map(_.toBoolean)
         .getOrElse(sqlConf.isParquetSchemaMergingEnabled)
    +
    +  /**
    +   * How to resolve duplicated field names. By default, parquet data source fails when hitting
    +   * duplicated field names in case-insensitive mode. When converting hive parquet table to parquet
    +   * data source, we need to ask parquet data source to pick the first matched field - the same
    +   * behavior as hive parquet table - to keep behaviors consistent.
    +   */
    +  val duplicatedFieldsResolutionMode: String = {
    +    parameters.getOrElse(DUPLICATED_FIELDS_RESOLUTION_MODE,
    --- End diff --

    Whether or not we have a SQL config for it, we must define an option here. The conversion happens per query, so we need a per-query option to switch the behavior rather than a per-session SQL config.
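
To illustrate the point, here is a minimal, Spark-free sketch of why a per-query option is needed. Everything in it is hypothetical: `SessionConf`, `ModeKey`, `resolveMode`, and the mode values `FAIL` and `FIRST_MATCH` are made up for illustration and are not taken from the PR. It only models the lookup order (per-query parameter first, session-level value as fallback), so that two scans in the same session can resolve duplicated fields differently.

```scala
// Hypothetical sketch; none of these names come from Spark or from the PR.
object PerQueryOptionSketch {
  // Stand-in for a session-wide SQL config (assumed shape).
  final case class SessionConf(duplicatedFieldsMode: String)

  // Hypothetical option key; the real key is not shown in the diff.
  val ModeKey = "hypotheticalDuplicatedFieldsMode"

  // The per-query option wins; the session value is only a fallback.
  def resolveMode(parameters: Map[String, String], conf: SessionConf): String =
    parameters.getOrElse(ModeKey, conf.duplicatedFieldsMode)

  def main(args: Array[String]): Unit = {
    val conf = SessionConf("FAIL")

    // Plain data source read: no option passed, so the session default applies.
    println(resolveMode(Map.empty, conf))

    // Converted Hive table scan in the same session: the conversion step
    // would pass the option explicitly for this one query.
    println(resolveMode(Map(ModeKey -> "FIRST_MATCH"), conf))
  }
}
```

With only a session-level config, both calls would have to return the same mode. The explicit option is what lets the converted scan differ from an ordinary read.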