adriangb opened a new pull request, #19389:
URL: https://github.com/apache/datafusion/pull/19389

   ## Summary
   
   This PR extends `get_field` to accept multiple field name arguments for 
nested struct/map access, enabling `get_field(col, 'a', 'b', 'c')` as 
equivalent to `col['a']['b']['c']`.
   
   **The primary motivation is to make it easier for downstream optimizations 
to match on and optimize struct/map field access patterns.** By representing 
`col['a']['b']['c']` as a single `get_field(col, 'a', 'b', 'c')` call rather 
than nested `get_field(get_field(get_field(col, 'a'), 'b'), 'c')` calls, 
optimization rules can more easily identify and transform field access patterns.
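   
   To illustrate why the flat form is easier to match on, here is a minimal 
   sketch using a toy expression tree. The `Expr` enum and the `col`, 
   `get_field`, and `access_path` helpers are hypothetical names made up for 
   this example; they are not DataFusion's actual types or APIs.
   
   ```rust
   // Toy expression tree for illustration only; NOT DataFusion's real `Expr`.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       // Models get_field(base, name_1, ..., name_n).
       GetField { base: Box<Expr>, names: Vec<String> },
   }
   
   // Hypothetical helper: build a column reference.
   fn col(name: &str) -> Expr {
       Expr::Column(name.to_string())
   }
   
   // Hypothetical helper: build a get_field call with one or more names.
   fn get_field(base: Expr, names: &[&str]) -> Expr {
       Expr::GetField {
           base: Box::new(base),
           names: names.iter().map(|s| s.to_string()).collect(),
       }
   }
   
   // Recover (root expression, field path) from a field access.
   // With the nested form this has to recurse through every level;
   // with the flat form the whole path is available in a single node.
   fn access_path(expr: &Expr) -> (&Expr, Vec<&str>) {
       match expr {
           Expr::GetField { base, names } => {
               let (root, mut path) = access_path(base);
               path.extend(names.iter().map(|s| s.as_str()));
               (root, path)
           }
           other => (other, Vec::new()),
       }
   }
   ```
   
   In this toy model, a rule that only sees the flat form can read `names` 
   directly from one `GetField` node instead of walking a chain of nested calls.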
   
   ## Changes
   
   - **Variadic signature**: `get_field` now accepts two or more arguments (the 
base expression followed by one or more field names)
   - **Type validation at planning time**: Accessing a field on a 
non-struct/map type (e.g., `get_field({a: 1}, 'a', 'b')`) fails during planning 
with a clear error message indicating which argument position caused the failure
   - **Bracket syntax optimization**: The `FieldAccessPlanner` now merges 
consecutive bracket accesses into a single `get_field` call (e.g., 
`s['a']['b']` → `get_field(s, 'a', 'b')`)
   - **Mixed access handling**: Array index access correctly breaks the 
batching (e.g., `s['a'][0]['b']` → `get_field(array_element(get_field(s, 'a'), 
0), 'b')`)
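   
   The planning-time type validation described above can be sketched roughly as 
   follows. This is a simplified model under stated assumptions: the `DataType` 
   enum and `resolve_type` function are hypothetical, maps are omitted, and the 
   1-based argument numbering (base = argument 1) and the error wording are 
   guesses for illustration, not the messages this PR emits.
   
   ```rust
   // Toy type model for illustration only; NOT Arrow's real `DataType`.
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int64,
       Struct(Vec<(String, DataType)>),
   }
   
   // Hypothetical helper: walk the field names left to right and return the
   // type of the final field, or an error naming the offending argument.
   fn resolve_type(base: &DataType, names: &[&str]) -> Result<DataType, String> {
       let mut current = base;
       for (i, name) in names.iter().enumerate() {
           // Assumption: the base is argument 1, so the first name is argument 2.
           let position = i + 2;
           let fields = match current {
               DataType::Struct(fields) => fields,
               other => {
                   return Err(format!(
                       "get_field argument {position} ('{name}'): \
                        cannot access a field on non-struct type {other:?}"
                   ));
               }
           };
           current = match fields.iter().find(|(field_name, _)| field_name.as_str() == *name) {
               Some((_, data_type)) => data_type,
               None => {
                   return Err(format!(
                       "get_field argument {position}: no field named '{name}'"
                   ));
               }
           };
       }
       Ok(current.clone())
   }
   ```
   
   With a base of type `{a: Int64}`, resolving `['a']` succeeds in this sketch, 
   while `['a', 'b']` fails at the second field name because `a` is not a struct.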
   
   ## Example
   
   ```sql
   -- Direct function call with nested access
   SELECT get_field(my_struct, 'outer', 'inner', 'value');
   
   -- Equivalent bracket syntax (now optimized to single get_field)
   SELECT my_struct['outer']['inner']['value'];
   
   -- EXPLAIN shows single get_field call
   EXPLAIN SELECT s['a']['b'] FROM t;
   -- Projection: get_field(t.s, Utf8("a"), Utf8("b"))
   ```
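   
   The bracket merging and the way an array index breaks the batching can be 
   modeled with a small sketch. The `Expr` and `Access` enums and the 
   `plan_accesses` and `flush` functions below are hypothetical and only mirror 
   the behavior described in this PR; they are not the actual 
   `FieldAccessPlanner` implementation.
   
   ```rust
   // Toy expression tree for illustration only; NOT DataFusion's real `Expr`.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       GetField { base: Box<Expr>, names: Vec<String> },
       ArrayElement { base: Box<Expr>, index: i64 },
   }
   
   // One bracket access: s['name'] or s[0].
   #[derive(Debug, Clone, PartialEq)]
   enum Access {
       Field(String),
       Index(i64),
   }
   
   // Hypothetical helper: build a column reference.
   fn col(name: &str) -> Expr {
       Expr::Column(name.to_string())
   }
   
   // Wrap `base` in a single GetField holding all pending names, if any,
   // leaving `pending` empty afterwards.
   fn flush(base: Expr, pending: &mut Vec<String>) -> Expr {
       if pending.is_empty() {
           return base;
       }
       Expr::GetField { base: Box::new(base), names: std::mem::take(pending) }
   }
   
   // Merge consecutive field accesses into one GetField; an array index
   // flushes the current batch and starts a new one after it.
   fn plan_accesses(base: Expr, accesses: &[Access]) -> Expr {
       let mut expr = base;
       let mut pending: Vec<String> = Vec::new();
       for access in accesses {
           match access {
               Access::Field(name) => pending.push(name.clone()),
               Access::Index(index) => {
                   expr = flush(expr, &mut pending);
                   expr = Expr::ArrayElement { base: Box::new(expr), index: *index };
               }
           }
       }
       flush(expr, &mut pending)
   }
   ```
   
   In this model, `s['a']['b']` becomes one `GetField` with names `a` and `b`, 
   while `s['a'][0]['b']` becomes a `GetField` for `b` wrapped around an 
   `ArrayElement` wrapped around a `GetField` for `a`.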
   
   ## Backwards Compatibility
   
   - The original 2-argument form `get_field(struct, 'field')` continues to 
work unchanged
   - Existing queries that chain bracket accesses are automatically planned 
with the merged `get_field` form
   
   ## Test plan
   
   - [x] Backwards compatibility test for 2-argument form
   - [x] Multi-level get_field with 2, 3, and 5 levels of nesting
   - [x] Type validation error tests at argument positions 2, 3, 4
   - [x] Non-existent field error tests
   - [x] Null handling (null at base, null in middle of chain)
   - [x] Mixed array/struct access (verifies array index breaks batching)
   - [x] Nullable parent propagation
   - [x] EXPLAIN test verifying single get_field call for bracket syntax
   - [x] Minimum argument validation (0 and 1 argument cases)
   
   🤖 Generated with [Claude Code](https://claude.com/claude-code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

