timsaucer opened a new issue, #18671:
URL: https://github.com/apache/datafusion/issues/18671

   ### Is your feature request related to a problem or challenge?
   
   In the FFI work we frequently use protobuf for serialization and 
deserialization. This has been a great advantage in exposing these functions and 
reduces the amount of duplicated code. The problem is that the de/serialize 
functions require either a `FunctionRegistry` or a `TaskContext`, depending on 
whether you are working with logical or physical expressions. Right now the 
implementation creates a default `SessionContext` before making the de/serialize 
calls.
   
   The problem with this is that if a user has registered a custom function and 
used it in an expression passed to any of the FFI calls, de/serialization will 
fail, because the default `SessionContext` does not know about that function.
   
   ### Describe the solution you'd like
   
   I think we should make several changes to improve this, and I have a working 
branch, tested against `datafusion-python`, that implements most of them. I will 
put up a series of PRs to address them.
   
   - Add a `TaskContextProvider` trait that we can hold a weak reference to. 
This lets us obtain the current `TaskContext` during de/serialization, at a 
point *after* registration.
   - Add FFI versions of the logical and physical extension codecs. I haven't 
done this one yet, but will address it soon.
   - Implement `FFI_Session`
   - Remove the `datafusion` core crate from the `datafusion-ffi` dependencies. 
This has the nice side benefit of cutting the library size of some providers by 
half or more.
   - Add a method to detect when a foreign FFI struct actually comes from the 
local library. When it does, convert it back to the underlying data structure 
instead of keeping the FFI wrapper.
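   The last item can be illustrated with a minimal, std-only sketch. It assumes 
each compiled library exposes a marker function that returns the address of a 
library-local static, so two sides of an FFI boundary can compare markers. The 
names `MyProvider`, `FFI_MyProvider`, `Provider`, `local_library_marker` and 
`release_provider` are hypothetical and are not the `datafusion-ffi` API. When the 
markers match, the wrapper is unwrapped back into the original `Arc` instead of 
routing calls through FFI function pointers.

```rust
use std::ffi::c_void;
use std::mem::ManuallyDrop;
use std::sync::Arc;

// Hypothetical marker: each compiled library has its own copy of this static,
// so its address identifies the library.
static LIBRARY_MARKER: u8 = 0;

extern "C" fn local_library_marker() -> usize {
    &LIBRARY_MARKER as *const u8 as usize
}

/// Hypothetical Rust type that is shared across the FFI boundary.
#[derive(Debug)]
pub struct MyProvider {
    pub name: String,
}

extern "C" fn release_provider(data: *const c_void) {
    // Reclaims the Arc created in `FFI_MyProvider::new` and drops it.
    unsafe { drop(Arc::from_raw(data as *const MyProvider)) };
}

/// Hypothetical FFI-safe wrapper (real wrappers also carry method pointers).
#[repr(C)]
#[allow(non_camel_case_types)]
pub struct FFI_MyProvider {
    library_marker: extern "C" fn() -> usize,
    release: extern "C" fn(*const c_void),
    private_data: *const c_void, // an Arc<MyProvider> owned by the producing library
}

impl FFI_MyProvider {
    pub fn new(provider: Arc<MyProvider>) -> Self {
        Self {
            library_marker: local_library_marker,
            release: release_provider,
            private_data: Arc::into_raw(provider) as *const c_void,
        }
    }

    /// True when the struct was produced by the library we are running in.
    fn is_local(&self) -> bool {
        (self.library_marker)() == local_library_marker()
    }
}

impl Drop for FFI_MyProvider {
    fn drop(&mut self) {
        // Release goes through the producer's function pointer.
        (self.release)(self.private_data);
    }
}

/// Either the native type (local fast path) or the FFI wrapper (truly foreign).
pub enum Provider {
    Local(Arc<MyProvider>),
    Foreign(FFI_MyProvider),
}

impl From<FFI_MyProvider> for Provider {
    fn from(ffi: FFI_MyProvider) -> Self {
        if ffi.is_local() {
            // Skip the wrapper's Drop and take ownership of its Arc directly.
            // Sound only because this library created `private_data` itself.
            let ffi = ManuallyDrop::new(ffi);
            let inner = unsafe { Arc::from_raw(ffi.private_data as *const MyProvider) };
            Provider::Local(inner)
        } else {
            Provider::Foreign(ffi)
        }
    }
}

fn main() {
    let provider = Arc::new(MyProvider { name: "example".to_string() });
    let ffi = FFI_MyProvider::new(Arc::clone(&provider));
    match Provider::from(ffi) {
        Provider::Local(inner) => {
            // Same allocation as the original: no FFI indirection remains.
            assert!(Arc::ptr_eq(&inner, &provider));
            println!("local provider: {}", inner.name);
        }
        Provider::Foreign(_) => unreachable!("created in this library"),
    }
    // The wrapper's reference was handed back, not leaked.
    assert_eq!(Arc::strong_count(&provider), 1);
}
```

   Comparing an address-derived marker rather than, say, a type name is one way 
to distinguish two separately compiled copies of the same crate, since each 
dynamic library gets its own instance of the static.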
   
   ### Describe alternatives you've considered
   
   We could instead pass a `TaskContext` directly and carry it around in the FFI 
structs. The major problem is that it would only reflect what was registered 
when that task context was created, so anything registered afterwards would be 
missing. I haven't been able to come up with a better alternative.
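   To make the trade-off concrete, here is a minimal, std-only sketch contrasting 
a snapshotted task context with a weak reference to a provider that is resolved 
at deserialization time. `TaskContext`, `Session`, `Codec`, `register_udf` and 
`deserialize_fn` are hypothetical stand-ins, and the trait only mirrors the shape 
of the proposed `TaskContextProvider`; none of this is DataFusion's actual API.

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock, Weak};

/// Hypothetical stand-in for a task context: only the registered function names.
#[derive(Clone, Debug, Default)]
pub struct TaskContext {
    functions: HashSet<String>,
}

/// Hypothetical trait shaped like the proposed `TaskContextProvider`.
pub trait TaskContextProvider: Send + Sync {
    fn task_ctx(&self) -> Arc<TaskContext>;
}

/// Hypothetical session whose registrations change over time.
#[derive(Default)]
pub struct Session {
    state: RwLock<TaskContext>,
}

impl Session {
    pub fn register_udf(&self, name: &str) {
        self.state.write().unwrap().functions.insert(name.to_string());
    }
}

impl TaskContextProvider for Session {
    /// Builds a context from the registrations that exist *right now*.
    fn task_ctx(&self) -> Arc<TaskContext> {
        Arc::new(self.state.read().unwrap().clone())
    }
}

/// Hypothetical codec: holds only a weak reference, so it never keeps the session alive.
pub struct Codec {
    provider: Weak<dyn TaskContextProvider>,
}

impl Codec {
    /// Resolves a function name against the context current at call time.
    pub fn deserialize_fn(&self, name: &str) -> Result<String, String> {
        let provider = self.provider.upgrade().ok_or("session has been dropped")?;
        let ctx = provider.task_ctx();
        if ctx.functions.contains(name) {
            Ok(name.to_string())
        } else {
            Err(format!("function '{name}' not found"))
        }
    }
}

fn main() {
    let session = Arc::new(Session::default());

    // Alternative: capture a task context when the FFI struct is created.
    let snapshot = session.task_ctx();

    // Proposal: keep a weak reference to the provider instead.
    let provider: Arc<dyn TaskContextProvider> = session.clone();
    let codec = Codec { provider: Arc::downgrade(&provider) };

    // The user registers a custom function after the FFI objects exist.
    session.register_udf("my_udf");

    assert!(!snapshot.functions.contains("my_udf")); // the snapshot is stale
    assert_eq!(codec.deserialize_fn("my_udf"), Ok("my_udf".to_string()));

    // Once every strong reference is gone, the weak reference cannot be upgraded.
    drop(provider);
    drop(session);
    assert!(codec.deserialize_fn("my_udf").is_err());
}
```

   Holding a `Weak` rather than an `Arc` avoids a reference cycle between the 
session and the FFI objects it hands out, at the cost of a fallible `upgrade()` 
on every de/serialization call.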
   
   ### Additional context
   
   This draft PR implements all of these features. I want to do some renaming 
and will break it into smaller pieces.
   
   https://github.com/apache/datafusion/pull/18568


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
