Copilot commented on code in PR #19469: URL: https://github.com/apache/datafusion/pull/19469#discussion_r2822188524
########## datafusion/ffi/src/config/mod.rs: ########## @@ -0,0 +1,165 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +pub mod extension_options; + +use abi_stable::StableAbi; +use abi_stable::std_types::{RHashMap, RString}; +use datafusion_common::config::{ + ConfigExtension, ConfigOptions, ExtensionOptions, TableOptions, +}; +use datafusion_common::{DataFusionError, Result}; + +use crate::config::extension_options::FFI_ExtensionOptions; + +/// A stable struct for sharing [`ConfigOptions`] across FFI boundaries. +/// +/// Accessing FFI extension options require a slightly different pattern +/// than local extensions. The trait [`ExtensionOptionsFFIProvider`] can +/// be used to simplify accessing FFI extensions. +#[repr(C)] +#[derive(Debug, Clone, StableAbi)] +pub struct FFI_ConfigOptions { + base_options: RHashMap<RString, RString>, + + extensions: FFI_ExtensionOptions, +} + +impl From<&ConfigOptions> for FFI_ConfigOptions { + fn from(options: &ConfigOptions) -> Self { + let base_options: RHashMap<RString, RString> = options + .entries() + .into_iter() + .filter_map(|entry| entry.value.map(|value| (entry.key, value))) Review Comment: `FFI_ConfigOptions::from` builds `base_options` from `ConfigOptions::entries()`, but `entries()` includes extension entries whose keys are *not* namespaced (the `extensions_options!` macro emits keys like `baz_count`). Those keys will later be passed to `ConfigOptions::set` in `TryFrom<FFI_ConfigOptions>`, which requires a namespace and will error. Also, you separately serialize extensions into `extensions`, so extension values end up duplicated. Consider building `base_options` from only built-in `datafusion.*` settings (e.g. via `visit`), and serialize extensions exclusively into `extensions` with fully-qualified keys. ```suggestion .filter_map(|entry| { if entry.key.starts_with("datafusion.") { entry.value.map(|value| (entry.key, value)) } else { None } }) ``` ########## datafusion/ffi/src/config/mod.rs: ########## @@ -0,0 +1,165 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +pub mod extension_options; + +use abi_stable::StableAbi; +use abi_stable::std_types::{RHashMap, RString}; +use datafusion_common::config::{ + ConfigExtension, ConfigOptions, ExtensionOptions, TableOptions, +}; +use datafusion_common::{DataFusionError, Result}; + +use crate::config::extension_options::FFI_ExtensionOptions; + +/// A stable struct for sharing [`ConfigOptions`] across FFI boundaries. +/// +/// Accessing FFI extension options require a slightly different pattern +/// than local extensions. The trait [`ExtensionOptionsFFIProvider`] can +/// be used to simplify accessing FFI extensions. +#[repr(C)] +#[derive(Debug, Clone, StableAbi)] +pub struct FFI_ConfigOptions { + base_options: RHashMap<RString, RString>, + + extensions: FFI_ExtensionOptions, +} + +impl From<&ConfigOptions> for FFI_ConfigOptions { + fn from(options: &ConfigOptions) -> Self { + let base_options: RHashMap<RString, RString> = options + .entries() + .into_iter() + .filter_map(|entry| entry.value.map(|value| (entry.key, value))) + .map(|(key, value)| (key.into(), value.into())) + .collect(); + + let mut extensions = FFI_ExtensionOptions::default(); + for (extension_name, extension) in options.extensions.iter() { + for entry in extension.entries().iter() { + if let Some(value) = entry.value.as_ref() { + extensions + .set(format!("{extension_name}.{}", entry.key).as_str(), value) + .expect("FFI_ExtensionOptions set should always return Ok"); + } Review Comment: When serializing extensions, `format!("{extension_name}.{}", entry.key)` will double-prefix keys for an existing `FFI_ExtensionOptions` extension (prefix `datafusion_ffi`), producing keys like `datafusion_ffi.my_config.baz_count`. That prevents `to_extension::<MyConfig>()` from matching `MyConfig::PREFIX` and will effectively drop FFI extension values on round-trip. Handle the `datafusion_ffi` extension specially (e.g. downcast to `FFI_ExtensionOptions` and `merge`/copy keys without adding another prefix). ```suggestion // Avoid double-prefixing keys for the FFI extension itself. if extension_name == FFI_ExtensionOptions::PREFIX { for entry in extension.entries().iter() { if let Some(value) = entry.value.as_ref() { extensions .set(entry.key.as_str(), value) .expect("FFI_ExtensionOptions set should always return Ok"); } } } else { for entry in extension.entries().iter() { if let Some(value) = entry.value.as_ref() { extensions .set( format!("{extension_name}.{}", entry.key).as_str(), value, ) .expect("FFI_ExtensionOptions set should always return Ok"); } } ``` ########## datafusion/ffi/src/config/mod.rs: ########## @@ -0,0 +1,165 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +pub mod extension_options; + +use abi_stable::StableAbi; +use abi_stable::std_types::{RHashMap, RString}; +use datafusion_common::config::{ + ConfigExtension, ConfigOptions, ExtensionOptions, TableOptions, +}; +use datafusion_common::{DataFusionError, Result}; + +use crate::config::extension_options::FFI_ExtensionOptions; + +/// A stable struct for sharing [`ConfigOptions`] across FFI boundaries. +/// +/// Accessing FFI extension options require a slightly different pattern +/// than local extensions. The trait [`ExtensionOptionsFFIProvider`] can +/// be used to simplify accessing FFI extensions. +#[repr(C)] +#[derive(Debug, Clone, StableAbi)] +pub struct FFI_ConfigOptions { + base_options: RHashMap<RString, RString>, + + extensions: FFI_ExtensionOptions, +} + +impl From<&ConfigOptions> for FFI_ConfigOptions { + fn from(options: &ConfigOptions) -> Self { + let base_options: RHashMap<RString, RString> = options + .entries() + .into_iter() + .filter_map(|entry| entry.value.map(|value| (entry.key, value))) + .map(|(key, value)| (key.into(), value.into())) + .collect(); + + let mut extensions = FFI_ExtensionOptions::default(); + for (extension_name, extension) in options.extensions.iter() { + for entry in extension.entries().iter() { + if let Some(value) = entry.value.as_ref() { + extensions + .set(format!("{extension_name}.{}", entry.key).as_str(), value) + .expect("FFI_ExtensionOptions set should always return Ok"); + } + } + } + + Self { + base_options, + extensions, + } + } +} + +impl TryFrom<FFI_ConfigOptions> for ConfigOptions { + type Error = DataFusionError; + fn try_from(ffi_options: FFI_ConfigOptions) -> Result<Self, Self::Error> { + let mut options = ConfigOptions::default(); + options.extensions.insert(ffi_options.extensions); + + for kv_tuple in ffi_options.base_options.iter() { + options.set(kv_tuple.0.as_str(), kv_tuple.1.as_str())?; + } + + Ok(options) + } +} + +pub trait ExtensionOptionsFFIProvider { + fn ffi_extension<C: ConfigExtension + Clone + Default>(&self) -> Option<C>; +} + +impl ExtensionOptionsFFIProvider for ConfigOptions { + fn ffi_extension<C: ConfigExtension + Clone + Default>(&self) -> Option<C> { + self.extensions + .get::<C>() + .map(|v| v.to_owned()) + .or_else(|| { + self.extensions + .get::<FFI_ExtensionOptions>() + .and_then(|ffi_ext| ffi_ext.to_extension().ok()) + }) Review Comment: `ffi_extension` returns `Some(C::default())` whenever a `FFI_ExtensionOptions` exists, even if there are no entries for `C::PREFIX` (because `to_extension` always returns `Ok(Default)`). This makes it impossible to distinguish “extension not present” from “present with defaults”, and can hide missing registration/config. Consider returning `None` when no matching keys were found (e.g. track whether any key matched in `to_extension`, or pre-check for any key starting with `"{C::PREFIX}."`). ########## datafusion/ffi/src/config/extension_options.rs: ########## @@ -0,0 +1,285 @@ +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. + +use std::any::Any; +use std::collections::HashMap; +use std::ffi::c_void; + +use abi_stable::StableAbi; +use abi_stable::std_types::{RResult, RStr, RString, RVec, Tuple2}; +use datafusion_common::config::{ConfigEntry, ConfigExtension, ExtensionOptions}; +use datafusion_common::{Result, exec_err}; + +use crate::df_result; + +/// A stable struct for sharing [`ExtensionOptions`] across FFI boundaries. +/// +/// Unlike other FFI structs in this crate, we do not construct a foreign +/// variant of this object. This is due to the typical method for interacting +/// with extension options is by creating a local struct of your concrete type. +/// To support this methodology use the `to_extension` method instead. +/// +/// When using [`FFI_ExtensionOptions`] with multiple extensions, all extension +/// values are stored on a single [`FFI_ExtensionOptions`] object. The keys +/// are stored with the full path prefix to avoid overwriting values when using +/// multiple extensions. +#[repr(C)] +#[derive(Debug, StableAbi)] +pub struct FFI_ExtensionOptions { + /// Return a deep clone of this [`ExtensionOptions`] + pub cloned: unsafe extern "C" fn(&Self) -> FFI_ExtensionOptions, + + /// Set the given `key`, `value` pair + pub set: + unsafe extern "C" fn(&mut Self, key: RStr, value: RStr) -> RResult<(), RString>, + + /// Returns the [`ConfigEntry`] stored in this [`ExtensionOptions`] + pub entries: unsafe extern "C" fn(&Self) -> RVec<Tuple2<RString, RString>>, + + /// Release the memory of the private data when it is no longer being used. + pub release: unsafe extern "C" fn(&mut Self), + + /// Internal data. This is only to be accessed by the provider of the options. + pub private_data: *mut c_void, +} + +unsafe impl Send for FFI_ExtensionOptions {} +unsafe impl Sync for FFI_ExtensionOptions {} + +pub struct ExtensionOptionsPrivateData { + pub options: HashMap<String, String>, +} + +impl FFI_ExtensionOptions { + #[inline] + fn inner_mut(&mut self) -> &mut HashMap<String, String> { + let private_data = self.private_data as *mut ExtensionOptionsPrivateData; + unsafe { &mut (*private_data).options } + } + + #[inline] + fn inner(&self) -> &HashMap<String, String> { + let private_data = self.private_data as *const ExtensionOptionsPrivateData; + unsafe { &(*private_data).options } + } +} + +unsafe extern "C" fn cloned_fn_wrapper( + options: &FFI_ExtensionOptions, +) -> FFI_ExtensionOptions { + options + .inner() + .iter() + .map(|(k, v)| (k.to_owned(), v.to_owned())) + .collect::<HashMap<String, String>>() + .into() +} + +unsafe extern "C" fn set_fn_wrapper( + options: &mut FFI_ExtensionOptions, + key: RStr, + value: RStr, +) -> RResult<(), RString> { + let _ = options.inner_mut().insert(key.into(), value.into()); + RResult::ROk(()) +} + +unsafe extern "C" fn entries_fn_wrapper( + options: &FFI_ExtensionOptions, +) -> RVec<Tuple2<RString, RString>> { + options + .inner() + .iter() + .map(|(key, value)| (key.to_owned().into(), value.to_owned().into()).into()) + .collect() +} + +unsafe extern "C" fn release_fn_wrapper(options: &mut FFI_ExtensionOptions) { + let private_data = unsafe { + Box::from_raw(options.private_data as *mut ExtensionOptionsPrivateData) + }; + drop(private_data); Review Comment: `release_fn_wrapper` frees `private_data` but does not null it out or guard against being called more than once. Most other FFI types in this crate `debug_assert!(!private_data.is_null())` and set the pointer to null after freeing to avoid double-free if `release` is invoked multiple times (directly or via `Drop`). Align this implementation with the existing pattern. ```suggestion debug_assert!(!options.private_data.is_null()); let private_data = Box::from_raw(options.private_data as *mut ExtensionOptionsPrivateData); drop(private_data); options.private_data = std::ptr::null_mut(); ``` ########## datafusion/ffi/src/session/config.rs: ########## @@ -39,7 +39,7 @@ use datafusion_execution::config::SessionConfig; pub struct FFI_SessionConfig { /// Return a hash map from key to value of the config options represented /// by string values. - pub config_options: unsafe extern "C" fn(config: &Self) -> RHashMap<RString, RString>, + pub config_options: FFI_ConfigOptions, Review Comment: The doc comment and field comment still describe `config_options` as a string->string hashmap, but it is now an `FFI_ConfigOptions`. This is also an ABI-breaking change to the `FFI_SessionConfig` layout for any foreign library built against an older `datafusion-ffi` minor/patch version. Consider keeping the existing map-based accessor for compatibility and adding a new optional accessor/field (e.g. via an abi_stable prefix type or root-module function) for the richer `FFI_ConfigOptions` representation. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
