[PR] fix: [df52] schema pruning crash on complex nested types [datafusion-comet]

via GitHub Thu, 12 Feb 2026 08:23:22 -0800


andygrove opened a new pull request, #3500:
URL: https://github.com/apache/datafusion-comet/pull/3500


   ## Summary
   
   - Fix column index mismatch in `init_datasource_exec` when `data_schema` is 
provided but `projection_vector` is `None` (the NativeBatchReader / 
`native_iceberg_compat` path)
   - Previously, the pruned `required_schema` was used as the base schema, so 
DataFusion treated the file as containing only the pruned columns. This caused 
`PhysicalExprAdapter` to misalign physical and logical column indices (e.g., 
mapping logical "friends" at index 0 to physical "id" at index 0)
   - Now computes a projection vector by mapping required field names to their 
indices in the full `data_schema`, so DataFusion correctly knows the full file 
schema and selects only the needed columns
   - Hardens `wrap_all_type_mismatches` to look up physical fields by name 
instead of by a fragile positional index
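
The name-to-index mapping described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: schemas are modeled as plain slices of top-level field names rather than Arrow schemas, the helper names `projection_from_names` and `physical_index_by_name` are hypothetical, and exact (case-sensitive) name matching is an assumption.

```rust
/// Hypothetical helper: map each required field name to its index in the
/// full file schema, producing a projection vector.
/// Returns `None` if any required field is absent from the full schema.
/// Assumption: names are matched exactly (case-sensitive).
fn projection_from_names(data_schema: &[&str], required: &[&str]) -> Option<Vec<usize>> {
    required
        .iter()
        .map(|name| data_schema.iter().position(|field| field == name))
        .collect() // collecting into Option<Vec<_>> yields None on the first miss
}

/// Hypothetical helper: find a physical field by name instead of assuming
/// that it sits at the same position as the logical field.
fn physical_index_by_name(physical_schema: &[&str], logical_name: &str) -> Option<usize> {
    physical_schema.iter().position(|field| *field == logical_name)
}
```

With a full schema of `["id", "name", "friends"]` and a required schema of `["friends"]`, the projection is `[2]`. The logical "friends" column at index 0 then resolves to the physical column at index 2 rather than to "id".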
   
   ## Test plan
   
   - [ ] Verify ~44 schema pruning tests pass in `spark-sql-auto-sql_core-2` CI 
jobs
   - [ ] Key tests: `select a single complex field array and in clause`, 
`select nested field from a complex map key using map_keys`, `SPARK-34638: 
nested column prune on generator output`
   - [ ] Verify no regressions in other CI jobs
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

