andygrove opened a new pull request, #3687: URL: https://github.com/apache/datafusion-comet/pull/3687
## Which issue does this PR close?

Related to #3682.

## Rationale for this change

When `spark.sql.caseSensitive=false` (the default) and a Parquet schema contains field names that collide after lowercasing (e.g., `Name` and `name`), DataFusion produces different error messages than Spark. This causes the `SPARK-25207: exception when duplicate fields in case-insensitive mode` spark-sql test to fail when using `native_datafusion`.

## What changes are included in this PR?

Adds a guard in `nativeDataFusionScan()` that detects duplicate field names (after lowercasing) in the required schema when case-insensitive analysis is enabled. When duplicates are found, the scan falls back to avoid incompatible error behavior.

## How are these changes tested?

Covered by the existing `SPARK-25207` test in the spark-sql test suite, which verifies the correct error behavior for duplicate fields in case-insensitive mode.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
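The duplicate-field check described in this PR can be illustrated with a minimal, language-neutral sketch. This is not the code from the PR: the function names `find_case_insensitive_duplicates` and `should_fall_back` are hypothetical, and the schema is reduced to a plain list of field names rather than a Spark `StructType`.

```python
from collections import defaultdict


def find_case_insensitive_duplicates(field_names):
    """Group field names by their lowercased form and keep only the collisions.

    Hypothetical helper for illustration; not part of Comet's API.
    """
    groups = defaultdict(list)
    for name in field_names:
        groups[name.lower()].append(name)
    # Only lowercased keys shared by more than one original name are duplicates.
    return {key: names for key, names in groups.items() if len(names) > 1}


def should_fall_back(field_names, case_sensitive):
    """Return True when the guard would reject the native scan.

    Assumption: the guard only applies in case-insensitive mode, mirroring
    spark.sql.caseSensitive=false as described in the PR text.
    """
    if case_sensitive:
        return False
    return bool(find_case_insensitive_duplicates(field_names))


if __name__ == "__main__":
    required_schema = ["Name", "name", "id"]
    print(find_case_insensitive_duplicates(required_schema))
    print(should_fall_back(required_schema, case_sensitive=False))
```

In case-sensitive mode `Name` and `name` are distinct fields, so the sketch never triggers a fallback there.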
