evangelisilva opened a new pull request, #20376:
URL: https://github.com/apache/datafusion/pull/20376

   # UDTF Argument Coercion Suppression
   
   ## Which issue does this PR close?
   
   Closes #20293.
   
   ## Rationale for this change
   
   Currently, User-Defined Table Functions (UDTFs) in DataFusion automatically 
undergo argument coercion and simplification before being passed to the 
function creator. This process happens against an empty schema 
(`DFSchema::empty()`). 
   
   If a UDTF uses arguments that contain identifiers (e.g., 
`scan_with(index=['a', 'b'])`), the simplifier fails with a 
`Schema error: No field named index` because it attempts to resolve `index` 
as a column reference. This prevents UDTFs from implementing custom argument 
parsing logic that relies on identifiers or complex expressions.
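
The failure mode described above can be illustrated with a small standalone 
model. This is only a sketch: `Expr`, `Schema`, and `resolve` below are 
simplified, hypothetical stand-ins and are not DataFusion's real types or 
functions. It assumes the simplifier treats a bare identifier as a column 
lookup, which cannot succeed against a schema with no fields.

```rust
// Hypothetical, simplified stand-ins; NOT DataFusion's real `Expr` or `DFSchema`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// A bare identifier, which the planner treats as a column reference.
    Column(String),
    /// A literal value, which needs no schema to be understood.
    Literal(String),
}

struct Schema {
    fields: Vec<String>,
}

impl Schema {
    /// Mirrors the idea of an empty schema: there are no fields to resolve against.
    fn empty() -> Self {
        Schema { fields: Vec::new() }
    }
}

/// Resolves an expression against a schema (assumed behavior, for illustration).
/// Column references must name an existing field; literals pass through unchanged.
fn resolve(expr: &Expr, schema: &Schema) -> Result<Expr, String> {
    match expr {
        Expr::Column(name) if !schema.fields.contains(name) => {
            Err(format!("Schema error: No field named {}", name))
        }
        other => Ok(other.clone()),
    }
}
```

In this model, resolving `Expr::Column("index")` against `Schema::empty()` 
always returns an error, while a literal argument passes through untouched.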
   
   ## What changes are included in this PR?
   
   1.  **Modified `TableFunctionImpl` trait**: Added a new method 
`coerce_arguments(&self) -> bool` that defaults to `true`. This allows UDTF 
authors to opt out of automatic coercion.
   2.  **Updated `TableFunction` struct**: Exposed `coerce_arguments` on the 
wrapper struct.
   3.  **Updated `SessionContextProvider`**: Modified the SQL planner 
integration in `session_state.rs` to check the `coerce_arguments` flag. If 
`false`, the raw `Expr` arguments are passed directly to the UDTF creator 
without modification.
   4.  **Unit Tests**: Added tests in `session_state.rs` to verify both the 
default behavior (automatic coercion, which fails on identifiers) and the new 
suppressed behavior (identifiers are allowed).
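
The opt-out described in the list above can be sketched in isolation. The 
code below is a simplified model under stated assumptions, not the code in 
this PR: only the names `TableFunctionImpl` and `coerce_arguments` come from 
the description, while the `String` return type of `call`, the helper 
`simplify_against_empty_schema`, the function `plan_table_function`, and the 
two example UDTFs are hypothetical.

```rust
// Hypothetical stand-in for DataFusion's `Expr`; only two variants are modeled.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
}

/// Simplified model of the trait. The real `call` builds a table provider;
/// here it returns a `String` describing what it received.
trait TableFunctionImpl {
    fn call(&self, args: &[Expr]) -> Result<String, String>;

    /// Defaults to `true`, so implementers that do not override it keep
    /// the existing coercion behavior.
    fn coerce_arguments(&self) -> bool {
        true
    }
}

/// Assumed stand-in for coercion and simplification against an empty schema:
/// any column reference fails because there is no field to resolve it to.
fn simplify_against_empty_schema(args: &[Expr]) -> Result<Vec<Expr>, String> {
    args.iter()
        .map(|arg| match arg {
            Expr::Column(name) => Err(format!("Schema error: No field named {}", name)),
            other => Ok(other.clone()),
        })
        .collect()
}

/// Hypothetical planner hook: consults the flag before touching the arguments.
fn plan_table_function(func: &dyn TableFunctionImpl, args: Vec<Expr>) -> Result<String, String> {
    let args = if func.coerce_arguments() {
        simplify_against_empty_schema(&args)?
    } else {
        // Raw expressions are handed over without modification.
        args
    };
    func.call(&args)
}

/// Does not override `coerce_arguments`, so it gets the default (`true`).
struct DefaultUdtf;

impl TableFunctionImpl for DefaultUdtf {
    fn call(&self, args: &[Expr]) -> Result<String, String> {
        Ok(format!("{} args", args.len()))
    }
}

/// Opts out of coercion and inspects identifiers itself.
struct RawArgsUdtf;

impl TableFunctionImpl for RawArgsUdtf {
    fn call(&self, args: &[Expr]) -> Result<String, String> {
        let names: Vec<&str> = args
            .iter()
            .filter_map(|arg| match arg {
                Expr::Column(name) => Some(name.as_str()),
                _ => None,
            })
            .collect();
        Ok(format!("identifiers: {}", names.join(",")))
    }

    fn coerce_arguments(&self) -> bool {
        false
    }
}
```

In this sketch, `DefaultUdtf` still fails when given an identifier argument, 
while `RawArgsUdtf` receives the identifier and can interpret it however it 
likes.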
   
   ## Are these changes tested?
   
   Yes. I've added a new test module `udtf_tests` in 
`datafusion/core/src/execution/session_state.rs` containing:
   - `test_udtf_no_coercion`: Verifies that identifiers survive when coercion 
is disabled.
   - `test_udtf_default_coercion`: Verifies that the existing behavior (failing 
on identifiers) is preserved by default to ensure no regressions.
   
   ## Are there any user-facing changes?
   
   Yes. There is a new method on the `TableFunctionImpl` trait. However, 
because it has a default implementation that returns `true`, it is **backward 
compatible** and will not break existing UDTF implementations. UDTF authors 
who need the new behavior simply need to override this method.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

