andygrove opened a new issue, #3318: URL: https://github.com/apache/datafusion-comet/issues/3318
## Summary

~130 tests in `SchemaPruningSuite` (both "Spark vectorized reader" and "Non-vectorized reader" variants, with and without partition data columns) fail because `CometNativeScan` is not recognized as a file source scan node.

## Error Pattern

All failures have the same error:

```
0 did not equal 1 Found 0 file sources in dataframe, but expected ArraySeq(struct<...>)
```

The test infrastructure looks for `FileSourceScanExec` or `BatchScanExec` nodes in the query plan to verify schema pruning. `CometNativeScan` is neither of these, so the tests find 0 file sources and fail.

## Failing Tests

- All `SchemaPruningSuite` tests including: select complex fields, nested field pruning, correlated subqueries, case-insensitive schema, generator output, Expand/Sort/Window, etc.
- `Case-insensitive parser` variants from the same suite
- `SPARK-37450: Prunes unnecessary fields from Explode for count aggregation`

## Root Cause

`CometNativeScan` doesn't extend or isn't matched by the plan inspection utilities that look for file source scan nodes. The tests verify that schema pruning pushes the correct pruned schema down to the scan, but can't find the scan node to inspect.

This is both a test infrastructure issue (tests don't know about `CometNativeScan`) and potentially a functional concern (schema pruning may not be happening the same way in native_datafusion).

## Related

Discovered in CI for #3307 (enable native_datafusion in auto scan mode).
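
To illustrate the mismatch described under Root Cause, here is a minimal, self-contained sketch of a plan inspection helper that matches only one scan type. It is a simplified model, not Spark's or Comet's actual code: `PlanNode`, `FileSourceScan`, `NativeScan`, `Project`, and `ScanSchemaCheck` are hypothetical stand-ins for the real plan classes, and schemas are represented as plain strings.

```scala
// Hypothetical stand-ins for physical plan nodes; the real classes carry far more state.
sealed trait PlanNode { def children: Seq[PlanNode] }

case class FileSourceScan(requiredSchema: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

// Models a scan node that the test helper does not know about.
case class NativeScan(requiredSchema: String) extends PlanNode {
  def children: Seq[PlanNode] = Nil
}

case class Project(child: PlanNode) extends PlanNode {
  def children: Seq[PlanNode] = Seq(child)
}

object ScanSchemaCheck {
  // Pre-order traversal that applies a partial function to every node it matches.
  def collect[A](plan: PlanNode)(pf: PartialFunction[PlanNode, A]): Seq[A] =
    pf.lift(plan).toSeq ++ plan.children.flatMap(c => collect(c)(pf))

  // Matches only the known scan type, so a NativeScan is silently skipped.
  def strictSchemata(plan: PlanNode): Seq[String] =
    collect(plan) { case s: FileSourceScan => s.requiredSchema }

  // One possible adjustment (an assumption, not the project's chosen fix):
  // also match the native scan and read its pruned schema.
  def lenientSchemata(plan: PlanNode): Seq[String] =
    collect(plan) {
      case s: FileSourceScan => s.requiredSchema
      case s: NativeScan     => s.requiredSchema
    }
}
```

With a plan such as `Project(NativeScan("struct<name:struct<first:string>>"))`, `strictSchemata` returns an empty sequence, which corresponds to the "Found 0 file sources" symptom, while `lenientSchemata` returns the scan's schema. Whether the real fix belongs in the test helper or in how `CometNativeScan` exposes its schema is left open by the issue.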
