[GitHub] [arrow] alamb commented on pull request #7967: ARROW-9751: [Rust] [DataFusion] Allow UDFs to accept multiple data types per argument

GitBox Sun, 30 Aug 2020 04:59:37 -0700


alamb commented on pull request #7967:
URL: https://github.com/apache/arrow/pull/7967#issuecomment-683411727



   > With this said, as an exercise, let me try to write how I imagine an 
interface could look like for option 3, just to check if I have the same 
understanding as you do.
   
   I think I had a slightly different idea. Here is one idea for an interface 
for defining UDFs that I think covers all the cases you have in mind (though it 
doesn't talk about implementation at all):
   
   ## UDF Registration:
   
   ```
   trait UDF {
     // Returns the name that users write to invoke this function.
     fn name(&self) -> &str;
   
     // Returns the desired argument types.
     // If the result is `None`, no type coercion is done and any number of
     // arguments is accepted during logical planning.
     // If the result is a slice, the logical planner requires that the function
     // is called with exactly that number of arguments and attempts to coerce
     // the arguments into these types. If an entry is `None`, no coercion is
     // done on that argument.
     fn desired_argument_types(&self) -> Option<&[Option<DataType>]>;
   
     // Given the specified argument types, returns true if this function can
     // accept them.
     fn valid_argument_types(arg_types: &[DataType]) -> bool;
   
     // Creates the appropriate PhysicalExpr.
     fn make_physical_expr(&self, arg_types: &[DataType]) -> Box<dyn PhysicalExpr>;
   }
   ```
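   
   To make the `desired_argument_types` contract concrete, here is a rough, self-contained sketch of how a planner might apply it. It is not DataFusion code: `DataType` is a small stand-in for arrow's enum, and `can_coerce` and `coerce_arguments` are hypothetical names with a made-up widening rule.
   
   ```rust
   // Stand-in for arrow's `DataType`, reduced to a few variants (assumption).
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int32,
       Float32,
       Float64,
       Utf8,
   }
   
   // Hypothetical coercion rule, not DataFusion's actual rules:
   // a type coerces to itself, and Int32/Float32 widen to Float64.
   fn can_coerce(from: &DataType, to: &DataType) -> bool {
       from == to
           || matches!(
               (from, to),
               (DataType::Int32, DataType::Float64) | (DataType::Float32, DataType::Float64)
           )
   }
   
   // Applies the `desired_argument_types` contract to the actual argument types.
   // On success, returns the type each argument has after coercion.
   fn coerce_arguments(
       desired: Option<&[Option<DataType>]>,
       actual: &[DataType],
   ) -> Result<Vec<DataType>, String> {
       // `None`: no coercion, any number of arguments is accepted.
       let desired = match desired {
           None => return Ok(actual.to_vec()),
           Some(d) => d,
       };
       // A slice fixes the number of arguments.
       if desired.len() != actual.len() {
           return Err(format!(
               "expected {} arguments, got {}",
               desired.len(),
               actual.len()
           ));
       }
       desired
           .iter()
           .zip(actual)
           .map(|(want, have)| match want {
               // `None` entry: leave this argument untouched.
               None => Ok(have.clone()),
               Some(target) if can_coerce(have, target) => Ok(target.clone()),
               Some(target) => Err(format!("cannot coerce {:?} to {:?}", have, target)),
           })
           .collect()
   }
   ```
   
   Under these assumptions, a desired type of `[Float64]` with an actual `Int32` argument yields `Float64` (a cast is needed), while `[None, None]` only checks that two arguments were passed.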
   
   Here is a sketch of registering sqrt with both 32-bit and 64-bit variants:
   ```
   struct sqrt_32 {}
   impl UDF for sqrt_32 {
     fn name(&self) { "sqrt"}
     fn desired_argument_types(&self) { [Float32] }
     fn valid_argument_types(arg_types: &[DataType]) { arg_types == [Float32] }
     fn make_physical_expr(&self, arg_types: &[DataType]) {...}
   }
   
   struct sqrt_64 {}
   impl UDF for sqrt_64 {
     fn name(&self) { "sqrt"}
     fn desired_argument_types(&self) { [Float64] }
     fn valid_argument_types(arg_types: &[DataType]) { arg_types == [Float64] }
     fn make_physical_expr(&self, arg_types: &[DataType]) {...}
   }
   
   ```
   
   The user would write `sqrt(c)`, and the type coercion logic would then
   change that to `sqrt_64(cast c as Float64)` or perhaps `sqrt_32(c)` (if `c`
   is Float32).
   
   You can imagine the type coercion logic hitting a `sqrt` function, first
   trying to coerce the arguments to Float32 to match the first registered
   variant, and, if that is not possible, trying to coerce them to Float64.
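   
   As a rough illustration of that lookup order, the sketch below resolves a call against several variants registered under the same name and picks the first one whose desired types the arguments can be coerced to. Everything in it is an assumption made for the example: `DataType` is a stand-in for arrow's enum, and `Variant`, `resolve`, `sqrt_registry`, and the widening rule in `can_coerce` are hypothetical.
   
   ```rust
   // Stand-in for arrow's `DataType`, reduced to a few variants (assumption).
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int32,
       Float32,
       Float64,
       Utf8,
   }
   
   // Hypothetical coercion rule, not DataFusion's actual rules:
   // a type coerces to itself, and Int32/Float32 widen to Float64.
   fn can_coerce(from: &DataType, to: &DataType) -> bool {
       from == to
           || matches!(
               (from, to),
               (DataType::Int32, DataType::Float64) | (DataType::Float32, DataType::Float64)
           )
   }
   
   // One registered implementation of a function (hypothetical type).
   struct Variant {
       name: &'static str,     // name the user writes, e.g. "sqrt"
       id: &'static str,       // label for the concrete variant, e.g. "sqrt_32"
       desired: Vec<DataType>, // desired argument types of this variant
   }
   
   // Returns the first variant, in registration order, whose desired types all
   // arguments can be coerced to, together with the post-coercion types.
   fn resolve(
       registry: &[Variant],
       name: &str,
       actual: &[DataType],
   ) -> Option<(&'static str, Vec<DataType>)> {
       registry
           .iter()
           .filter(|v| v.name == name && v.desired.len() == actual.len())
           .find(|v| {
               actual
                   .iter()
                   .zip(&v.desired)
                   .all(|(have, want)| can_coerce(have, want))
           })
           .map(|v| (v.id, v.desired.clone()))
   }
   
   // Registers the Float32 variant before the Float64 one.
   fn sqrt_registry() -> Vec<Variant> {
       vec![
           Variant { name: "sqrt", id: "sqrt_32", desired: vec![DataType::Float32] },
           Variant { name: "sqrt", id: "sqrt_64", desired: vec![DataType::Float64] },
       ]
   }
   ```
   
   With this made-up rule, a Float32 column resolves to `sqrt_32` with no cast, an Int32 column falls through to `sqrt_64` with a cast to Float64, and a Utf8 column matches neither variant.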
   
   Here is an example of "concat", which takes exactly two arguments of the
   same type:
   ```
   struct concat {}
   impl UDF for concat {
     fn name(&self) { "concat"}
     fn desired_argument_types(&self) { [None, None] }
     fn valid_argument_types(arg_types: &[DataType]) {
       arg_types.len() == 2 && arg_types[0] == arg_types[1]
     }
     fn make_physical_expr(&self, arg_types: &[DataType]) {...}
   }
   ```
   
   
   Here is an example of `array`, which takes any number of arguments of the
   same type:
   ```
   struct array {}
   impl UDF for array {
     fn name(&self) { "array"}
     fn desired_argument_types(&self) { None }
     fn valid_argument_types(arg_types: &[DataType]) {
       // ... custom logic to make sure all types are the same here ...
     }
     fn make_physical_expr(&self, arg_types: &[DataType]) {...}
   }
   ```
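   
   For functions that opt out of coercion, all checking falls to `valid_argument_types`. One possible shape for the "all types are the same" check is sketched below. `DataType` is again a stand-in for arrow's enum, and `all_same_type` and `valid_concat_types` are hypothetical helper names, not part of the proposal.
   
   ```rust
   // Stand-in for arrow's `DataType`, reduced to a few variants (assumption).
   #[derive(Debug, Clone, PartialEq)]
   enum DataType {
       Int32,
       Float64,
       Utf8,
   }
   
   // True if every adjacent pair of argument types is equal, which implies that
   // all of them are equal. An empty or single-element slice is accepted; whether
   // `array()` with no arguments should be valid is left open here.
   fn all_same_type(arg_types: &[DataType]) -> bool {
       arg_types.windows(2).all(|pair| pair[0] == pair[1])
   }
   
   // The `concat` rule expressed with the same helper:
   // exactly two arguments, both of the same type.
   fn valid_concat_types(arg_types: &[DataType]) -> bool {
       arg_types.len() == 2 && all_same_type(arg_types)
   }
   ```
   
   Comparing adjacent pairs avoids indexing into a slice whose length is not known in advance, which is the main difference from the two-argument `concat` check.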
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
