alamb opened a new issue, #18223:
URL: https://github.com/apache/datafusion/issues/18223

   ### Is your feature request related to a problem or challenge?
   
   This is part of implementing LogicalTypes / Extension Types in 
DataFusion, as described by @findepi 
   - https://github.com/apache/datafusion/issues/12644
   
   
[ExtensionType](https://docs.rs/arrow-schema/56.2.0/arrow_schema/extension/trait.ExtensionType.html)s
 are defined using the metadata on an arrow `Field` (not the DataType) and 
stored physically as one of the existing arrow types. This system has the nice 
benefit that extension types can be processed (passed through) by systems that 
don't support them as their underlying arrow type, and then additional 
semantics added by systems that do. 
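
   As a rough illustration of the paragraph above, the sketch below models a 
field with plain standard-library types. `ToyField` and `PhysicalType` are 
hypothetical stand-ins for arrow's `Field` and `DataType`, and `toy.variant` is 
a made-up extension name; the only detail taken from Arrow is the 
`ARROW:extension:name` metadata key. A consumer that ignores the metadata still 
sees an ordinary binary column.

```rust
use std::collections::HashMap;

/// Metadata key Arrow uses to record the extension type name on a field.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Hypothetical stand-in for an arrow `DataType` (physical storage only).
#[derive(Debug, Clone, PartialEq)]
pub enum PhysicalType {
    Binary,
    Utf8,
}

/// Hypothetical stand-in for an arrow `Field`: name, storage type, metadata.
#[derive(Debug, Clone)]
pub struct ToyField {
    pub name: String,
    pub data_type: PhysicalType,
    pub metadata: HashMap<String, String>,
}

impl ToyField {
    pub fn new(name: &str, data_type: PhysicalType) -> Self {
        Self { name: name.to_string(), data_type, metadata: HashMap::new() }
    }

    /// Tag the field as an extension type by adding metadata only;
    /// the physical type is left untouched.
    pub fn with_extension_name(mut self, ext: &str) -> Self {
        self.metadata.insert(EXTENSION_NAME_KEY.to_string(), ext.to_string());
        self
    }

    /// Extension-aware systems look the name up; others never call this.
    pub fn extension_name(&self) -> Option<&str> {
        self.metadata.get(EXTENSION_NAME_KEY).map(String::as_str)
    }
}
```

   Because the tag lives next to the storage type rather than inside it, code 
that only has the `PhysicalType` in hand cannot tell a tagged field from an 
untagged one.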
   
   As people continue using DataFusion to implement more sophisticated 
extension types such as geometry and geography (@paleolimbot) and Variant 
(@friendlymatthew), they are finding it is important to customize certain 
operations that are currently hardcoded in DataFusion based on physical type. 
   
   Some examples of operations where special semantics are sometimes needed for 
extension types:
   1. Printing / displaying values (e.g. printing Variant values in a JSON-like 
manner rather than their raw bytes)
   2. Casting values to/from other types
   3. Comparing values (e.g. it is not correct to compare two variant values 
byte by byte)
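
   To make item 3 concrete, here is a minimal sketch using a made-up encoding 
(not the actual Variant format): one tag byte gives the width of a 
little-endian unsigned integer that follows. The same logical value can then 
have two different byte representations, so a byte-wise comparison reports them 
as unequal while a semantic comparison does not. `decode_toy` and `toy_eq` are 
hypothetical names.

```rust
/// Decode a toy value: byte 0 is the payload width (1, 2, 4 or 8),
/// followed by that many little-endian bytes. Returns None if malformed.
pub fn decode_toy(bytes: &[u8]) -> Option<u64> {
    let (&width, payload) = bytes.split_first()?;
    let width = width as usize;
    if !matches!(width, 1 | 2 | 4 | 8) || payload.len() != width {
        return None;
    }
    // Zero-extend the payload to 8 bytes before converting.
    let mut buf = [0u8; 8];
    buf[..width].copy_from_slice(payload);
    Some(u64::from_le_bytes(buf))
}

/// Semantic equality: compare decoded values, not raw bytes.
/// Returns None if either input is malformed.
pub fn toy_eq(a: &[u8], b: &[u8]) -> Option<bool> {
    Some(decode_toy(a)? == decode_toy(b)?)
}
```

   With this toy encoding, `[1, 7]` and `[2, 7, 0]` differ as byte strings but 
both decode to 7, which is the kind of mismatch a kernel operating purely on 
the physical type cannot see.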
   
   There are a few challenges now:
   1. Extension type information is carried on 
[`Field`](https://docs.rs/arrow-schema/56.2.0/arrow_schema/struct.Field.html) 
(rather than DataType), and the 
[Field](https://docs.rs/arrow-schema/56.2.0/arrow_schema/struct.Field.html) is 
not yet available everywhere (though @paleolimbot and others are working on 
this)
   2. Even once we have `Field` available everywhere, the callsites for many 
cast/print and binary operations call directly into the arrow kernels which 
have no way to customize behavior for extension types.
   
   
   ### Describe the solution you'd like
   
   I think we need some sort of DataFusion API for users of extension types to 
specify and customize their behavior.
   
   
   
   ### Describe alternatives you've considered
   
   
   One possibility is to add a `DFExtensionType` trait that extends the 
existing 
[`ExtensionType`](https://docs.rs/arrow-schema/56.2.0/arrow_schema/extension/trait.ExtensionType.html)
 trait, similar to 
[`DFSchema`](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html)
   
   
   Maybe something like
   ```rust
   use arrow_schema::{extension::ExtensionType, Field};
   use datafusion_common::Result;
   use datafusion_expr::ColumnarValue;

   /// DataFusion Extension Type support
   pub trait DFExtensionType: ExtensionType {
     /// Cast a column of this extension type to the target
     fn cast(&self, input: ColumnarValue, output_type: &Field) -> Result<ColumnarValue>;
     // .. other functions ...
   }
   ```
   
   We would also need some way to register these types dynamically with the 
SessionContext as well as pass along the registry into the places they are 
needed.
   
   ```rust
   let ctx = SessionContext::new();
   ctx.register_extension(Arc::new(DFVariantExtension));
   ...
   ```
   
   I am not quite sure if this is the right API; we would probably need to try 
it out.
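
   One way to picture the registration idea, sketched with standard-library 
types only. Everything here is hypothetical: `ToyExtensionType` stands in for 
the proposed `DFExtensionType`, `ExtensionRegistry` for whatever the 
`SessionContext` would hold, and `display` for one of the customizable 
operations. The point is the dispatch: look up the extension name taken from 
the field metadata, and fall back to the physical-type behavior when nothing is 
registered.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for the proposed `DFExtensionType` trait.
pub trait ToyExtensionType: Send + Sync {
    /// Extension name as it would appear in the field metadata.
    fn name(&self) -> &str;
    /// Custom display of one value stored as raw bytes.
    fn display(&self, value: &[u8]) -> String;
}

/// Example extension that shows its bytes as (lossy) UTF-8 text.
pub struct TextLike;

impl ToyExtensionType for TextLike {
    fn name(&self) -> &str {
        "toy.text"
    }
    fn display(&self, value: &[u8]) -> String {
        String::from_utf8_lossy(value).into_owned()
    }
}

/// Hypothetical registry, keyed by extension name.
#[derive(Default)]
pub struct ExtensionRegistry {
    types: HashMap<String, Arc<dyn ToyExtensionType>>,
}

impl ExtensionRegistry {
    pub fn register(&mut self, ext: Arc<dyn ToyExtensionType>) {
        self.types.insert(ext.name().to_string(), ext);
    }

    /// Use the registered behavior if the field carries a known extension
    /// name; otherwise fall back to a physical-type default (a byte dump).
    pub fn display(&self, extension_name: Option<&str>, value: &[u8]) -> String {
        match extension_name.and_then(|n| self.types.get(n)) {
            Some(ext) => ext.display(value),
            None => format!("{:?}", value),
        }
    }
}
```

   Keying on the extension name keeps unregistered types working exactly as 
they do today, which matches the pass-through property of extension types; 
whether a name-keyed map is the right shape for DataFusion is an open question.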
   
   
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

