linhr commented on PR #18921: URL: https://github.com/apache/datafusion/pull/18921#issuecomment-3668603319
This is an exciting initiative! I skimmed through the PR, and I have some thoughts regarding how this can fit into DataFusion's existing setup. I wonder if we can have `Expr::LambdaFunction(LambdaFunction)`, similar to `Expr::ScalarFunction(ScalarFunction)`? Here are my reasoning: 1. `LambdaFunction` can represent a "resolved" lambda function call in the logical plan. In contrast, `Expr::Lambda` and `Expr::LambdaColumn` are only fragments whose data types etc. are not well-defined which may be challenging to work with in other parts of the library (e.g. `ExprSchemable`). 2. I feel `ScalarUDF` may not fit cleanly for lambda functions, so we can then have a separate abstraction (e.g. `LambdaUDF`) inside `Expr::LambdaFunction`. The `ArrayTransform` example shows that we would have to use `ScalarFunctionArgs` which was not originally designed for the lambda use case. (IMO when the function is actually "invoked" with Arrow arrays, the lambda parameter should have been resolved and removed from the argument list.) 3. Once we have a self-contained logical expression, the entire query analysis flow might become easier to reason about (from `SqlToRel` to `Expr::LambdaFunction` to a dedicated `PhysicalExpr` and the corresponding physical planner). This is just my rough thoughts. Happy to discuss! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
