andygrove opened a new issue, #4316: URL: https://github.com/apache/datafusion-comet/issues/4316
## Describe the bug

When the `native_datafusion` scan adapter rejects an incompatible Parquet column read, the resulting `SparkError::ParquetSchemaConvert` carries an empty `file_path`. The JVM shim translates this into a `SparkException` whose message reads:

```
Parquet column cannot be converted in file . Column: [a], Expected: int, Found: BINARY.
```

(Note the empty path between `in file` and the period.)

Spark's vectorized reader populates this path via `FileScanRDD`'s catch block (`currentFile.urlEncodedPath`), so its message reads e.g. `... in file file:/tmp/.../part-00000.parquet. Column: ...`.

This blocks several Spark SQL tests that extract the path from the message and re-open the file (e.g. `ParquetSchemaSuite > schema mismatch failure error message for parquet vectorized reader`).

## Where the gap is

`SparkPhysicalExprAdapter::replace_with_spark_cast` and the deferred `RejectOnNonEmpty` expression build the error with `file_path: String::new()` because `PhysicalExprAdapterFactory::create` does not receive the file path. Fixing this likely requires either:

- capturing the file path when the per-file adapter is created (would need a DataFusion API extension), or
- catching `ParquetSchemaConvert` at a higher layer that has file context (e.g. the parquet `ScanExec`/`FileOpener` wrapper) and re-raising with the path filled in (see the sketch after the repro below).

## Repro

`./dev/diffs/3.4.3.diff` has the test currently tagged with `IgnoreCometNativeDataFusion` pointing at this issue. Drop the tag and run:

```
ENABLE_COMET=true ENABLE_COMET_ONHEAP=true build/sbt "sql/testOnly *ParquetSchemaSuite -- -z 'schema mismatch failure error message for parquet vectorized reader'"
```
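For what it's worth, here is a minimal self-contained sketch of the second option (catch and re-raise with file context). The variant fields (`column`, `expected`, `found`) and the `with_file_path` helper are hypothetical illustrations, not the actual shape of `SparkError` in datafusion-comet:

```rust
/// Hypothetical mirror of the relevant SparkError variant; the real enum
/// in datafusion-comet has more variants and may name its fields differently.
#[derive(Debug)]
enum SparkError {
    ParquetSchemaConvert {
        file_path: String,
        column: String,
        expected: String,
        found: String,
    },
    // ... other variants elided
}

impl SparkError {
    /// Fill in `file_path` if the error was raised without file context,
    /// leaving an already-populated path untouched.
    fn with_file_path(mut self, path: &str) -> Self {
        if let SparkError::ParquetSchemaConvert { file_path, .. } = &mut self {
            if file_path.is_empty() {
                *file_path = path.to_string();
            }
        }
        self
    }
}

fn main() {
    // The adapter currently produces the error with an empty path...
    let err = SparkError::ParquetSchemaConvert {
        file_path: String::new(),
        column: "[a]".to_string(),
        expected: "int".to_string(),
        found: "BINARY".to_string(),
    };
    // ...so a wrapper that does have file context would re-raise it
    // with the path filled in before it crosses into the JVM shim:
    let err = err.with_file_path("file:/tmp/part-00000.parquet");
    println!("{err:?}");
}
```

In the real code this enrichment would presumably sit wherever the per-file path is in scope (e.g. mapping the errors of the per-file record-batch stream produced by the parquet `FileOpener`), which would leave the adapter and `RejectOnNonEmpty` themselves unchanged.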
