paleolimbot commented on code in PR #4459:
URL: https://github.com/apache/datafusion-comet/pull/4459#discussion_r3314058328


##########
native/proto/src/proto/expr.proto:
##########
@@ -530,3 +531,22 @@ message JvmScalarUdf {
   // Whether the result column may contain nulls.
   bool return_nullable = 4;
 }
+
+// Call to a user-supplied Rust UDF loaded from a cdylib.
+//
+// The native side resolves (library_path, name) against its loaded-library
+// cache, looks up the kernel by name, and invokes it through whichever ABI
+// flavor (C ABI / datafusion-ffi) the cdylib registered the kernel under.
+message RustUdfCall {
+  // Function name as registered through CometRustUDF.register on the JVM
+  // side; matched against names exposed by the cdylib.
+  string name = 1;
+  // Filesystem path of the cdylib.
+  string library_path = 2;

Review Comment:
   I see the motivation for having this primarily live as shared objects! I am 
not sure if there's an opportunity / whether it's any easier to bundle these 
UDFs as .jars or Python packages but that could be implemented as a future 
field of this struct.



##########
native/proto/src/proto/expr.proto:
##########
@@ -530,3 +531,22 @@ message JvmScalarUdf {
   // Whether the result column may contain nulls.
   bool return_nullable = 4;
 }
+
+// Call to a user-supplied Rust UDF loaded from a cdylib.
+//
+// The native side resolves (library_path, name) against its loaded-library
+// cache, looks up the kernel by name, and invokes it through whichever ABI
+// flavor (C ABI / datafusion-ffi) the cdylib registered the kernel under.
+message RustUdfCall {
+  // Function name as registered through CometRustUDF.register on the JVM
+  // side; matched against names exposed by the cdylib.
+  string name = 1;
+  // Filesystem path of the cdylib.
+  string library_path = 2;
+  // Argument expressions, evaluated before invocation.
+  repeated Expr args = 3;
+  // Expected return type, declared at register time on the JVM side.
+  DataType return_type = 4;

Review Comment:
   Is a fixed return type a requirement? (DataFusion lets you compute this on 
demand but perhaps there's a limiation here)



##########
native/comet-udf-sdk/src/df_abi.rs:
##########
@@ -0,0 +1,239 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! datafusion-ffi flavor.
+//!
+//! Discovery returns a list of `FFI_ScalarUDF` values produced by
+//! `datafusion_ffi`. The host imports each via
+//! `ForeignScalarUDF::try_from`, yielding a `ScalarUDFImpl` it can plug
+//! straight into its existing planner — no further adaptation needed.

Review Comment:
   Also cool! (And hard to argue with the compact implementation!)



##########
native/comet-udf-sdk/src/c_abi.rs:
##########
@@ -0,0 +1,707 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+//! Pure C ABI flavor — sedona-style.
+//!
+//! The wire format is two `#[repr(C)]` structs of function pointers,
+//! parameterized only by Arrow's C Data Interface
+//! (`FFI_ArrowSchema` / `FFI_ArrowArray`). No DataFusion types appear in
+//! the FFI surface, so the user's cdylib only needs a matching `arrow`
+//! crate, not a matching `datafusion` version.
+//!
+//! # Authoring a UDF
+//!
+//! Implement [`CometCScalarUdf`] for a unit struct and use the
+//! [`comet_c_udf_export!`] macro to emit the discovery entry point:
+//!
+//! ```ignore
+//! use comet_udf_sdk::c_abi::*;
+//! use arrow::array::{ArrayRef, Int64Array};
+//! use arrow::datatypes::{DataType, Field};
+//! use std::sync::Arc;
+//!
+//! pub struct AddOne;
+//! impl CometCScalarUdf for AddOne {
+//!     fn name(&self) -> &str { "add_one_c" }
+//!     fn return_field(&self, args: &[Field]) -> Result<Field, String> {
+//!         if args.len() != 1 || args[0].data_type() != &DataType::Int64 {
+//!             return Err("expected (Int64) -> Int64".into());
+//!         }
+//!         Ok(Field::new("add_one_c", DataType::Int64, true))
+//!     }
+//!     fn invoke(&self, args: &[ArrayRef], _n: usize) -> Result<ArrayRef, 
String> {
+//!         let a = args[0].as_any().downcast_ref::<Int64Array>().unwrap();
+//!         Ok(Arc::new(a.iter().map(|v| v.map(|x| x + 
1)).collect::<Int64Array>()))
+//!     }
+//! }
+//!
+//! comet_udf_sdk::comet_c_udf_export!(AddOne);
+//! ```
+
+use std::ffi::{c_char, c_int, c_void};
+
+use arrow::ffi::{FFI_ArrowArray, FFI_ArrowSchema};
+
+/// Generic non-zero error code returned by `init` / `execute` to signal
+/// failure. The host treats any non-zero return as an error and calls
+/// `get_last_error` for the message; the specific code is informational.
+const C_ABI_ERR: c_int = 1;
+
+// -- factory struct --------------------------------------------------------
+
+/// Factory for [`CometCScalarKernelImpl`] instances.
+///
+/// Lives in a registry, may be cloned across an FFI boundary. Calls to
+/// `function_name` and `new_impl` must be thread-safe (the implementation
+/// is responsible for any internal synchronization).
+///
+/// `#[repr(C)]` layout, matched by the host loader. Adding new fields
+/// requires bumping `COMET_UDF_ABI_VERSION`.
+#[repr(C)]
+pub struct CometCScalarKernel {
+    /// Return the function name this kernel implements as a NUL-terminated
+    /// UTF-8 C string. The pointer must remain valid for the lifetime of
+    /// the [`CometCScalarKernel`].
+    ///
+    /// May be `None`, in which case the kernel is treated as anonymous and
+    /// won't be discoverable by name. (Comet always sets this; field is
+    /// optional for parity with sedona's design.)
+    pub function_name: Option<unsafe extern "C" fn(*const CometCScalarKernel) 
-> *const c_char>,
+
+    /// Initialize a new [`CometCScalarKernelImpl`] into `out`. Called once
+    /// per execution, on the thread that will then drive `init`/`execute`.
+    pub new_impl: Option<
+        unsafe extern "C" fn(*const CometCScalarKernel, out: *mut 
CometCScalarKernelImpl),
+    >,
+
+    /// Release this kernel. After release, all callbacks must be set to
+    /// `None`. Called when the host's `LoadedLibrary` is dropped.
+    pub release: Option<unsafe extern "C" fn(*mut CometCScalarKernel)>,
+
+    /// Implementation-private data, opaque to the host.
+    pub private_data: *mut c_void,
+}
+
+// SAFETY: `CometCScalarKernel` is a thin wrapper around C function
+// pointers with caller-defined synchronization semantics; the trait impls
+// are required so loaded kernels can be referenced from multi-threaded
+// host code. Implementations of the FFI must respect thread safety as
+// described in the doc comments.
+unsafe impl Send for CometCScalarKernel {}
+unsafe impl Sync for CometCScalarKernel {}
+
+impl Default for CometCScalarKernel {
+    fn default() -> Self {
+        Self {
+            function_name: None,
+            new_impl: None,
+            release: None,
+            private_data: std::ptr::null_mut(),
+        }
+    }
+}
+
+impl Drop for CometCScalarKernel {
+    fn drop(&mut self) {
+        if let Some(release) = self.release.take() {
+            // SAFETY: release is the FFI-defined cleanup callback;
+            // implementations must reset `release` to None per the contract.
+            unsafe { release(self) };
+        }
+    }
+}
+
+// -- per-execution instance struct ----------------------------------------
+
+/// Per-execution instance produced by [`CometCScalarKernel::new_impl`].
+///
+/// Not thread-safe; the caller must serialize access. Typically used on
+/// one thread for one batch then dropped.
+#[repr(C)]
+pub struct CometCScalarKernelImpl {

Review Comment:
   Cool!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to