milenkovicm commented on code in PR #9304:
URL: https://github.com/apache/arrow-datafusion/pull/9304#discussion_r1511347050


##########
datafusion/core/tests/user_defined/user_defined_scalar_functions.rs:
##########
@@ -514,6 +519,97 @@ async fn deregister_udf() -> Result<()> {
     Ok(())
 }
 
+#[derive(Debug)]
+struct CastToI64UDF {
+    signature: Signature,
+}
+
+impl CastToI64UDF {
+    fn new() -> Self {
+        Self {
+            signature: Signature::any(1, Volatility::Immutable),
+        }
+    }
+}
+
+impl ScalarUDFImpl for CastToI64UDF {
+    fn as_any(&self) -> &dyn Any {
+        self
+    }
+    fn name(&self) -> &str {
+        "cast_to_i64"
+    }
+    fn signature(&self) -> &Signature {
+        &self.signature
+    }
+    fn return_type(&self, _args: &[DataType]) -> Result<DataType> {
+        Ok(DataType::Int64)
+    }
+    // Wrap with Expr::Cast() to Int64
+    fn simplify(
+        &self,
+        mut args: Vec<Expr>,
+        info: &dyn SimplifyInfo,
+    ) -> Result<ExprSimplifyResult> {
+        // Note that Expr::cast_to requires an ExprSchema but simplify gets a
+        // SimplifyInfo so we have to replicate some of the casting logic here.
+        let source_type = info.get_data_type(&args[0])?;
+        if source_type == DataType::Int64 {
+            Ok(ExprSimplifyResult::Original(args))
+        } else {
+            // DataFusion should have ensured the function is called with just a
+            // single argument
+            assert_eq!(args.len(), 1);
+            let e = args.pop().unwrap();
+            Ok(ExprSimplifyResult::Simplified(Expr::Cast(
+                datafusion_expr::Cast {
+                    expr: Box::new(e),
+                    data_type: DataType::Int64,
+                },
+            )))
+        }
+    }
+    // Casting should be done in `simplify`, so we just return the first
+    // argument
+    fn invoke(&self, args: &[ColumnarValue]) -> Result<ColumnarValue> {
+        assert_eq!(args.len(), 1);
+        Ok(args.first().unwrap().clone())

Review Comment:
   To give some context for my comment: suppose we define a function
   `f(INT, INT) = $1 + $2`. We could then eliminate the UDF call by rewriting it
   to `Alias($1 + $2, "f(a,b)")`, which yields a UDF-free plan that would be
   easier to distribute across a Ballista cluster.
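
   A minimal sketch of the rewrite described above, using a toy expression tree rather than DataFusion's actual `Expr` type. The `Expr` enum, `display`, and `inline_f` here are hypothetical names introduced only for illustration; the sketch handles a top-level call only and assumes the alias should match the display name of the original call so the output column name stays the same.

   ```rust
   // Toy expression tree. These types are illustrative and are NOT
   // DataFusion's `Expr`; they only model the shape of the rewrite.
   #[derive(Debug, Clone, PartialEq)]
   enum Expr {
       Column(String),
       Add(Box<Expr>, Box<Expr>),
       // A call to a user-defined function, identified by name.
       Udf { name: String, args: Vec<Expr> },
       // An expression with an explicit output name.
       Alias(Box<Expr>, String),
   }

   // Render an expression roughly the way a plan might name it.
   fn display(e: &Expr) -> String {
       match e {
           Expr::Column(c) => c.clone(),
           Expr::Add(l, r) => format!("{} + {}", display(l), display(r)),
           Expr::Udf { name, args } => {
               let rendered: Vec<String> = args.iter().map(display).collect();
               format!("{}({})", name, rendered.join(","))
           }
           Expr::Alias(_, name) => name.clone(),
       }
   }

   // Inline a top-level call to `f($1, $2) = $1 + $2`, keeping the original
   // call's display name as the alias. Anything else is returned unchanged.
   fn inline_f(e: Expr) -> Expr {
       let alias = display(&e);
       match e {
           Expr::Udf { name, mut args } if name == "f" && args.len() == 2 => {
               let rhs = args.pop().unwrap();
               let lhs = args.pop().unwrap();
               Expr::Alias(Box::new(Expr::Add(Box::new(lhs), Box::new(rhs))), alias)
           }
           other => other,
       }
   }

   fn main() {
       let call = Expr::Udf {
           name: "f".to_string(),
           args: vec![Expr::Column("a".to_string()), Expr::Column("b".to_string())],
       };
       // The rewritten expression no longer contains a UDF node.
       println!("{:?}", inline_f(call));
   }
   ```

   In this toy model the rewritten plan contains only built-in nodes, so a remote executor would not need the UDF registered in order to evaluate it.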



