alamb commented on pull request #7967:
URL: https://github.com/apache/arrow/pull/7967#issuecomment-683411727
> With this said, as an exercise, let me try to write how I imagine an
> interface could look like for option 3, just to check if I have the same
> understanding as you do.

I think I had a slightly different idea. Here is one idea for an interface
for defining UDFs that I think covers all the cases you have in mind (though it
doesn't talk about implementation at all):
## UDF Registration:
```
trait UDF {
    // Return the name that the user refers to when invoking this function.
    fn name(&self) -> &str;

    // Return the desired argument types.
    // If the desired type is `None`, then no type coercion is done and any
    // number of arguments is accepted during logical planning.
    // If the desired type is a slice, the logical planner will require that
    // the function is called with exactly that number of arguments and will
    // attempt to coerce the arguments into these types. If any type is
    // `None`, then no coercion will be done on that argument.
    fn desired_argument_types(&self) -> Option<&[Option<DataType>]>;

    // Given the specified argument types, returns true if this function can
    // accept them.
    fn valid_argument_types(arg_types: &[DataType]) -> bool;

    // Create the appropriate PhysicalExpression.
    fn make_physical_expr(&self, arg_types: &[DataType]) -> Box<dyn PhysicalExpr>;
}
```
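To make the `desired_argument_types` semantics described in the comments above concrete, here is a minimal, self-contained sketch of how a logical planner might interpret the return value. Everything in it is hypothetical: `DataType` is a small stand-in enum rather than Arrow's type, and `can_coerce` and `plan_casts` are illustrative names with an assumed widening-only cast rule, not anything that exists in the PR.

```rust
// Stand-in for Arrow's DataType, reduced to what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float32,
    Float64,
    Utf8,
}

// Assumed cast rule (not from the thread): identity, plus widening to Float64.
fn can_coerce(from: DataType, to: DataType) -> bool {
    use DataType::*;
    from == to || matches!((from, to), (Int32, Float64) | (Float32, Float64))
}

// For each argument, returns Some(target) when a cast should be inserted
// and None when the argument is left untouched.
fn plan_casts(
    desired: Option<&[Option<DataType>]>,
    actual: &[DataType],
) -> Result<Vec<Option<DataType>>, String> {
    let desired = match desired {
        // No declared types: any number of arguments, no coercion at all.
        None => return Ok(vec![None; actual.len()]),
        Some(d) => d,
    };
    // A slice fixes the arity.
    if desired.len() != actual.len() {
        return Err(format!(
            "expected {} arguments, got {}",
            desired.len(),
            actual.len()
        ));
    }
    desired
        .iter()
        .zip(actual)
        .map(|(want, &have)| match *want {
            // `None` in one slot: do not coerce that argument.
            None => Ok(None),
            Some(t) if t == have => Ok(None),
            Some(t) if can_coerce(have, t) => Ok(Some(t)),
            Some(t) => Err(format!("cannot coerce {:?} to {:?}", have, t)),
        })
        .collect()
}
```

In this sketch, `valid_argument_types` would still run afterwards on the coerced types, so the two methods stay independent: one drives casting, the other has the final say.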
Here is a sketch of registering sqrt with both 32-bit and 64-bit variants:
```
struct sqrt_32 {}
impl UDF for sqrt_32 {
    fn name(&self) { "sqrt" }
    fn desired_argument_types(&self) { [Float32] }
    fn valid_argument_types(arg_types: &[DataType]) { arg_types == [Float32] }
    fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}

struct sqrt_64 {}
impl UDF for sqrt_64 {
    fn name(&self) { "sqrt" }
    fn desired_argument_types(&self) { [Float64] }
    fn valid_argument_types(arg_types: &[DataType]) { arg_types == [Float64] }
    fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}
```
The user would write `sqrt(c)`, and the type coercion logic would then
change that to `sqrt_64(cast c as Float64)` or perhaps `sqrt_32(c)` (if `c` was
Float32).
You can imagine the type coercion logic hitting a `sqrt` function, first
trying to coerce the arguments to Float32 to match the first registered variant
and, if that wasn't possible, trying to coerce them to Float64.
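The "try the first variant, then the next" resolution described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `DataType` is a stand-in enum, `Candidate`, `resolve`, and `sqrt_candidates` are hypothetical names, and the cast rule in `can_coerce` (identity plus widening to Float64) is assumed rather than taken from the thread.

```rust
// Stand-in for Arrow's DataType, reduced to what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float32,
    Float64,
    Utf8,
}

// Assumed cast rule (not from the thread): identity, plus widening to Float64.
fn can_coerce(from: DataType, to: DataType) -> bool {
    use DataType::*;
    from == to || matches!((from, to), (Int32, Float64) | (Float32, Float64))
}

// One registered variant: an internal name plus the single argument type it wants.
struct Candidate {
    internal_name: &'static str,
    desired: DataType,
}

// Both variants are registered under the user-facing name "sqrt";
// the order here is the order in which they are tried.
fn sqrt_candidates() -> Vec<Candidate> {
    vec![
        Candidate { internal_name: "sqrt_32", desired: DataType::Float32 },
        Candidate { internal_name: "sqrt_64", desired: DataType::Float64 },
    ]
}

// Pick the first candidate the argument can be coerced to.
// Returns the chosen variant and whether a cast has to be inserted.
fn resolve(candidates: &[Candidate], arg: DataType) -> Option<(&'static str, bool)> {
    candidates
        .iter()
        .find(|c| can_coerce(arg, c.desired))
        .map(|c| (c.internal_name, c.desired != arg))
}
```

With this rule, a Float32 column resolves to `sqrt_32` without a cast, while an Int32 column skips `sqrt_32` and resolves to `sqrt_64` with a cast. A different coercion table would change which variant wins, which is why registration order matters in this design.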
Here is an example of "concat", which takes exactly two arguments of
the same type:
```
struct concat {}
impl UDF for concat {
    fn name(&self) { "concat" }
    fn desired_argument_types(&self) { [None, None] }
    fn valid_argument_types(arg_types: &[DataType]) {
        arg_types.len() == 2 && arg_types[0] == arg_types[1]
    }
    fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}
```
Here is an example of an `array` function:
```
struct array {}
impl UDF for array {
    fn name(&self) { "array" }
    fn desired_argument_types(&self) { None }
    fn valid_argument_types(arg_types: &[DataType]) {
        ... custom logic to make sure all types are the same here ...
    }
    fn make_physical_expr(&self, arg_types: &[DataType]) {...}
}
```
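For reference, the two validation rules used in the `concat` and `array` sketches can be written as plain functions. This is a minimal sketch: `DataType` is again a stand-in enum, the function names are hypothetical, and rejecting an empty argument list for `array` is an assumption, since the comment above only says that all types must be the same.

```rust
// Stand-in for Arrow's DataType, reduced to what this sketch needs.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DataType {
    Int32,
    Float64,
    Utf8,
}

// `concat`: exactly two arguments, both of the same type.
fn concat_valid_argument_types(arg_types: &[DataType]) -> bool {
    arg_types.len() == 2 && arg_types[0] == arg_types[1]
}

// `array`: any number of arguments, as long as they all share one type.
// Rejecting zero arguments is an assumption made for this sketch.
fn array_valid_argument_types(arg_types: &[DataType]) -> bool {
    // Comparing each adjacent pair is enough to show all elements are equal.
    !arg_types.is_empty() && arg_types.windows(2).all(|pair| pair[0] == pair[1])
}
```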