jorgecarleitao commented on pull request #7967:
URL: https://github.com/apache/arrow/pull/7967#issuecomment-678781518


   The code you pointed to reads `return_type: DataType`. I will assume you 
mean the return type declared in `Expr::ScalarFunction`.
   
   Great minds think alike: I was just trying to do exactly that in the codebase.
   
   Unfortunately, I do not think that is sufficient 😞: when a projection is 
declared, we need to resolve its schema (the type of each projected 
expression), which we do via `Expr::get_type`. If `Expr::ScalarFunction` does 
not carry the UDF's `return_type`, we cannot know the call's return type, which 
means we cannot even build the projection, before any optimization runs.
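   To make the dependency concrete, here is a minimal Rust sketch of a heavily 
simplified, hypothetical expression tree. `DataType`, `Expr`, 
`projection_schema` and the UDF name `my_sqrt` are illustrative stand-ins, not 
DataFusion's actual types. Resolving the projection's schema only works because 
the `ScalarFunction` node carries its own `return_type`.

```rust
// Hypothetical, simplified model of a logical expression tree. These are NOT
// DataFusion's real types; they only illustrate why the type must be known.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

#[allow(dead_code)]
#[derive(Debug, Clone)]
enum Expr {
    Column { name: String, data_type: DataType },
    // The return type is stored on the node itself, so no registry lookup is needed.
    ScalarFunction { name: String, args: Vec<Expr>, return_type: DataType },
}

impl Expr {
    /// Resolve the output type of this expression, as a projection must
    /// when it builds its output schema.
    fn get_type(&self) -> DataType {
        match self {
            Expr::Column { data_type, .. } => data_type.clone(),
            // If `return_type` were absent, there would be nothing to return here.
            Expr::ScalarFunction { return_type, .. } => return_type.clone(),
        }
    }
}

/// Build the (name, type) pairs of a projection's output schema.
fn projection_schema(exprs: &[(String, Expr)]) -> Vec<(String, DataType)> {
    exprs
        .iter()
        .map(|(alias, e)| (alias.clone(), e.get_type()))
        .collect()
}

fn main() {
    let call = Expr::ScalarFunction {
        name: "my_sqrt".to_string(),
        args: vec![Expr::Column { name: "c1".to_string(), data_type: DataType::Int64 }],
        return_type: DataType::Float64,
    };
    let schema = projection_schema(&[("r".to_string(), call)]);
    println!("{:?}", schema);
}
```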
   
   But to get the UDF's `DataType`, we need to access the UDF registry. What we 
currently do is let the user decide the `DataType` for us in the logical plan 
via the call `scalar_function("name", vec![args..], DATATYPE)`. Unfortunately, 
this means that the user needs to know the return type of the UDF; if they get 
it wrong, everything breaks during planning, because the physical plan then 
disagrees with the logical one. I would prefer that the user not carry this 
burden: they register a UDF together with its return type once, and then plan 
a call to it without restating that type.
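   A minimal sketch of that burden, again with hypothetical stand-in types: 
`Registry`, `ScalarCall` and `create_physical` are not DataFusion's API, and the 
three-argument `scalar_function` below only mirrors the call shape quoted above. 
The type the user declares in the logical plan and the type the UDF was 
registered with can disagree, and nothing notices until physical planning. In 
this sketch the mismatch is checked explicitly; in the real code it may surface 
differently.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

/// Hypothetical registry mapping a UDF's name to the return type it was registered with.
struct Registry {
    udfs: HashMap<String, DataType>,
}

/// Hypothetical logical call node: the caller states the return type.
#[allow(dead_code)]
struct ScalarCall {
    name: String,
    args: Vec<String>, // column names, for brevity
    return_type: DataType,
}

/// Mirrors the current call style: the user must repeat the UDF's return type.
fn scalar_function(name: &str, args: Vec<String>, return_type: DataType) -> ScalarCall {
    ScalarCall { name: name.to_string(), args, return_type }
}

/// Physical planning consults the registry; only here does a wrong
/// user-supplied type surface, after the logical plan was already built.
fn create_physical(call: &ScalarCall, registry: &Registry) -> Result<DataType, String> {
    match registry.udfs.get(&call.name) {
        None => Err(format!("unknown UDF '{}'", call.name)),
        Some(registered) if *registered != call.return_type => Err(format!(
            "UDF '{}' returns {:?}, but the logical plan declared {:?}",
            call.name, registered, call.return_type
        )),
        Some(registered) => Ok(registered.clone()),
    }
}

fn main() {
    let mut udfs = HashMap::new();
    udfs.insert("my_sqrt".to_string(), DataType::Float64);
    let registry = Registry { udfs };

    // The user guessed the wrong return type when building the logical plan.
    let call = scalar_function("my_sqrt", vec!["c1".to_string()], DataType::Int64);
    println!("{:?}", create_physical(&call, &registry));
}
```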
   
   I am formalizing a proposal to address this. The gist is that the logical 
plan cannot hold UDFs as mere references ("meta") by name: their return type 
must be known, which means that we need to access the registry during planning.
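   One possible shape for registry-aware planning, offered as a hedged sketch 
rather than as the proposal in the linked document: `Expr::ScalarFunction` keeps 
only the UDF's name and arguments, and `get_type` takes a registry so the return 
type is looked up during planning. The `FunctionRegistry` trait, `MapRegistry` 
and `udf_return_type` are hypothetical names chosen for illustration.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int64,
    Float64,
}

/// Hypothetical trait: anything that can resolve a registered UDF's return type.
trait FunctionRegistry {
    fn udf_return_type(&self, name: &str, arg_types: &[DataType]) -> Option<DataType>;
}

/// Simplest possible registry: a fixed return type per UDF name.
struct MapRegistry(HashMap<String, DataType>);

impl FunctionRegistry for MapRegistry {
    fn udf_return_type(&self, name: &str, _arg_types: &[DataType]) -> Option<DataType> {
        // This toy registry ignores argument types; a real one could use them.
        self.0.get(name).cloned()
    }
}

#[allow(dead_code)]
enum Expr {
    Column { name: String, data_type: DataType },
    // Only the name and arguments: the user no longer states a return type.
    ScalarFunction { name: String, args: Vec<Expr> },
}

impl Expr {
    /// Type resolution now needs the registry, i.e. it happens during planning.
    fn get_type(&self, registry: &dyn FunctionRegistry) -> Result<DataType, String> {
        match self {
            Expr::Column { data_type, .. } => Ok(data_type.clone()),
            Expr::ScalarFunction { name, args } => {
                // Resolve argument types first, propagating any error.
                let arg_types = args
                    .iter()
                    .map(|a| a.get_type(registry))
                    .collect::<Result<Vec<_>, _>>()?;
                registry
                    .udf_return_type(name, &arg_types)
                    .ok_or_else(|| format!("UDF '{}' is not registered", name))
            }
        }
    }
}

fn main() {
    let mut udfs = HashMap::new();
    udfs.insert("my_sqrt".to_string(), DataType::Float64);
    let registry = MapRegistry(udfs);

    let call = Expr::ScalarFunction {
        name: "my_sqrt".to_string(),
        args: vec![Expr::Column { name: "c1".to_string(), data_type: DataType::Int64 }],
    };
    println!("{:?}", call.get_type(&registry));
}
```

   Passing the argument types to the registry leaves room for UDFs whose return 
type depends on their inputs, although this toy registry ignores them. An 
unregistered UDF now fails while the logical plan is being built, instead of 
after it.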
   
   I am developing some ideas for this 
[here](https://docs.google.com/document/d/1Kzz642ScizeKXmVE1bBlbLvR663BKQaGqVIyy9cAscY/edit?usp=sharing).

