adriangb opened a new pull request, #19389:
URL: https://github.com/apache/datafusion/pull/19389
## Summary
This PR extends `get_field` to accept multiple field name arguments for
nested struct/map access, enabling `get_field(col, 'a', 'b', 'c')` as
equivalent to `col['a']['b']['c']`.
**The primary motivation is to make it easier for downstream optimizations
to match on and optimize struct/map field access patterns.** By representing
`col['a']['b']['c']` as a single `get_field(col, 'a', 'b', 'c')` call rather
than nested `get_field(get_field(get_field(col, 'a'), 'b'), 'c')` calls,
optimization rules can more easily identify and transform field access patterns.
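
To illustrate the motivation, here is a minimal sketch of what a rule has to do in each representation. The `Expr` enum and the `match_flat` / `match_nested` helpers are hypothetical toy stand-ins, not DataFusion's actual types or APIs.

```rust
// Toy expression tree for illustration only; NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    // Models get_field(base, name_1, ..., name_n).
    GetField { base: Box<Expr>, path: Vec<String> },
}

/// Variadic form: a single non-recursive pattern yields the column and
/// the whole field path.
fn match_flat(expr: &Expr) -> Option<(&str, &[String])> {
    match expr {
        Expr::GetField { base, path } => match base.as_ref() {
            Expr::Column(col) => Some((col.as_str(), path.as_slice())),
            _ => None,
        },
        _ => None,
    }
}

/// Nested form: the rule must recurse through every level to rebuild
/// the same path.
fn match_nested(expr: &Expr) -> Option<(&str, Vec<&str>)> {
    match expr {
        Expr::Column(col) => Some((col.as_str(), Vec::new())),
        Expr::Literal(_) => None,
        Expr::GetField { base, path } => {
            let (col, mut fields) = match_nested(base)?;
            fields.extend(path.iter().map(String::as_str));
            Some((col, fields))
        }
    }
}
```

In this toy model the flat matcher rejects the nested shape outright, which is the kind of case a rule author would otherwise need to handle separately.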
## Changes
- **Variadic signature**: `get_field` now accepts 2+ arguments (base + one
or more field names)
- **Type validation at planning time**: Accessing a field on a
non-struct/map type (e.g., `get_field({a: 1}, 'a', 'b')`) fails during planning
with a clear error message indicating which argument position caused the failure
- **Bracket syntax optimization**: The `FieldAccessPlanner` now merges
consecutive bracket accesses into a single `get_field` call (e.g.,
`s['a']['b']` → `get_field(s, 'a', 'b')`)
- **Mixed access handling**: Array index access correctly breaks the
batching (e.g., `s['a'][0]['b']` →
`get_field(array_element(get_field(s, 'a'), 0), 'b')`)
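
The planning-time type validation described above could look roughly like the sketch below. It uses a hypothetical toy `DataType` with structs only (maps are omitted), and it assumes argument positions are 1-based with the base expression at position 1. The PR's actual numbering and error wording may differ.

```rust
// Toy type model for illustration only; NOT Arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Struct(Vec<(String, DataType)>),
}

/// Walk `path` through `base` and return the type of the final field.
/// Errors name the argument position that failed (assumed numbering:
/// base = 1, first field name = 2, and so on).
fn resolve_field_type(base: &DataType, path: &[&str]) -> Result<DataType, String> {
    if path.is_empty() {
        return Err("get_field requires a base and at least one field name".to_string());
    }
    let mut current = base;
    for (i, name) in path.iter().enumerate() {
        let position = i + 2;
        match current {
            DataType::Struct(fields) => {
                match fields.iter().find(|(n, _)| n.as_str() == *name) {
                    Some((_, ty)) => current = ty,
                    None => {
                        return Err(format!(
                            "field '{}' not found (argument {})",
                            name, position
                        ))
                    }
                }
            }
            other => {
                return Err(format!(
                    "cannot access field '{}' (argument {}): {:?} is not a struct",
                    name, position, other
                ))
            }
        }
    }
    Ok(current.clone())
}
```

Under these assumptions, the `get_field({a: 1}, 'a', 'b')` case from the list resolves `'a'` to an integer and then fails on `'b'`, the third argument.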
## Example
```sql
-- Direct function call with nested access
SELECT get_field(my_struct, 'outer', 'inner', 'value');
-- Equivalent bracket syntax (now optimized to single get_field)
SELECT my_struct['outer']['inner']['value'];
-- EXPLAIN shows single get_field call
EXPLAIN SELECT s['a']['b'] FROM t;
-- Projection: get_field(t.s, Utf8("a"), Utf8("b"))
```
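
The rewrite behind that EXPLAIN output, including the array-index case from the Changes list, can be sketched as follows. `PlanExpr`, `Access`, and `plan_accesses` are hypothetical names for a toy model. This is not the `FieldAccessPlanner` implementation.

```rust
// Toy planned-expression tree for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum PlanExpr {
    Column(String),
    // Models get_field(base, name_1, ..., name_n).
    GetField { base: Box<PlanExpr>, path: Vec<String> },
    // Models array_element(base, index).
    ArrayElement { base: Box<PlanExpr>, index: i64 },
}

/// One bracket access in source order: `['name']` or `[index]`.
enum Access {
    Field(String),
    Index(i64),
}

/// Fold a chain of bracket accesses onto `base`, merging consecutive
/// field accesses into a single variadic GetField node.
fn plan_accesses(base: PlanExpr, accesses: Vec<Access>) -> PlanExpr {
    let mut expr = base;
    for access in accesses {
        expr = match (expr, access) {
            // The previous step was a field access: extend its path.
            (PlanExpr::GetField { base, mut path }, Access::Field(name)) => {
                path.push(name);
                PlanExpr::GetField { base, path }
            }
            // Any other base starts a fresh get_field call.
            (other, Access::Field(name)) => PlanExpr::GetField {
                base: Box::new(other),
                path: vec![name],
            },
            // An index access wraps the expression and breaks the batch.
            (other, Access::Index(index)) => PlanExpr::ArrayElement {
                base: Box::new(other),
                index,
            },
        };
    }
    expr
}
```

Because an index access produces an `ArrayElement` node, the next field access no longer sees a `GetField` as its base and starts a new call, which matches the `s['a'][0]['b']` shape described earlier.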
## Backwards Compatibility
- The original 2-argument form `get_field(struct, 'field')` continues to
work unchanged
- Existing queries using bracket syntax will automatically benefit from the
optimization
## Test plan
- [x] Backwards compatibility test for 2-argument form
- [x] Multi-level get_field with 2, 3, and 5 levels of nesting
- [x] Type validation error tests at argument positions 2, 3, 4
- [x] Non-existent field error tests
- [x] Null handling (null at base, null in middle of chain)
- [x] Mixed array/struct access (verifies array index breaks batching)
- [x] Nullable parent propagation
- [x] EXPLAIN test verifying single get_field call for bracket syntax
- [x] Minimum argument validation (0 and 1 argument cases)
🤖 Generated with [Claude Code](https://claude.com/claude-code)