tobixdev commented on issue #18223:
URL: https://github.com/apache/datafusion/issues/18223#issuecomment-3452344643

   > One possibility is to add a DFExtensionType trait, that extends the 
existing 
[ExtensionType](https://docs.rs/arrow-schema/56.2.0/arrow_schema/struct.Field.html#method.data_type)
 trait, similar to 
[DFSchema](https://docs.rs/datafusion/latest/datafusion/common/struct.DFSchema.html)
   
   We have one problem here that `ExtensionType` is not dyn-compatible due to 
the use of an associated constant and an associated type. I believe we need 
dyn-compatibility as our registry will look something like `Arc<dyn 
ExtensionTypeThingey>`.
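
   To make the dyn-compatibility point concrete, here is a minimal sketch. `StaticExtensionType` is only a simplified stand-in for the shape of arrow's `ExtensionType` (an associated constant plus an associated type), and `DynExtensionType`, `Uuid`, and `build_registry` are hypothetical names invented for illustration, not existing arrow-rs or DataFusion APIs.

```rust
use std::collections::HashMap;
use std::fmt::Debug;
use std::sync::Arc;

/// Simplified stand-in (assumption) for a trait shaped like `ExtensionType`.
/// The associated const and associated type mean `dyn StaticExtensionType`
/// cannot be written.
pub trait StaticExtensionType {
    const NAME: &'static str;
    type Metadata;
    fn metadata(&self) -> &Self::Metadata;
}

/// Hypothetical dyn-compatible counterpart: only `&self` methods,
/// no associated consts or types.
pub trait DynExtensionType: Debug + Send + Sync {
    fn name(&self) -> &'static str;
    fn serialized_metadata(&self) -> Option<String>;
}

/// Bridge: every static extension type whose metadata can be rendered as a
/// string is usable through the dyn-compatible trait.
impl<T> DynExtensionType for T
where
    T: StaticExtensionType + Debug + Send + Sync,
    T::Metadata: ToString,
{
    fn name(&self) -> &'static str {
        T::NAME
    }

    fn serialized_metadata(&self) -> Option<String> {
        Some(self.metadata().to_string())
    }
}

/// Hypothetical extension type used only for this example.
#[derive(Debug)]
pub struct Uuid {
    pub version: String,
}

impl StaticExtensionType for Uuid {
    const NAME: &'static str = "example.uuid";
    type Metadata = String;

    fn metadata(&self) -> &String {
        &self.version
    }
}

/// A registry keyed by extension name, storing trait objects.
pub fn build_registry() -> HashMap<&'static str, Arc<dyn DynExtensionType>> {
    let mut registry: HashMap<&'static str, Arc<dyn DynExtensionType>> = HashMap::new();
    let uuid: Arc<dyn DynExtensionType> = Arc::new(Uuid { version: "v4".to_string() });
    registry.insert(uuid.name(), uuid);
    registry
}
```

   The blanket impl is just one way to bridge the two traits; a real design might prefer explicit impls per type.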
   
   I've experimented a bit more with this and ran into the problem that we have 
been discussing earlier with custom printing. If I want to define a custom 
string representation for a type, I need to have access to that in the printing 
logic. Currently, this is happening in the respective Debug/Display 
implementations and they do not have access to a registry. Therefore, from my 
perspective, there are only two sane options:
   1. Use some kind of pretty-printing visitor that has access to the registry
   2. Directly store the extension in the `Field` or a possible `DFField`.
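
   The two options can be sketched side by side. Everything here is hypothetical and simplified for illustration: `ExtensionFormatter`, `Registry`, `PlainField`, `WithRegistry`, and `RichField` are not existing arrow-rs or DataFusion types, and values are plain strings to keep the example small.

```rust
use std::collections::HashMap;
use std::fmt;
use std::sync::Arc;

/// Hypothetical dyn-compatible trait that knows how to print a value.
pub trait ExtensionFormatter: Send + Sync {
    fn format_value(&self, raw: &str) -> String;
}

/// Toy extension that prints values in upper case.
pub struct Upper;

impl ExtensionFormatter for Upper {
    fn format_value(&self, raw: &str) -> String {
        raw.to_uppercase()
    }
}

pub type Registry = HashMap<String, Arc<dyn ExtensionFormatter>>;

/// Stand-in for a field that only records the extension *name*.
pub struct PlainField {
    pub name: String,
    pub extension_name: Option<String>,
}

/// Option 1: a printing helper that borrows the registry, since a plain
/// `Display` impl on the field has no way to reach it.
pub struct WithRegistry<'a> {
    pub registry: &'a Registry,
    pub field: &'a PlainField,
    pub raw: &'a str,
}

impl fmt::Display for WithRegistry<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let formatter = self
            .field
            .extension_name
            .as_ref()
            .and_then(|name| self.registry.get(name));
        match formatter {
            Some(ext) => write!(f, "{}={}", self.field.name, ext.format_value(self.raw)),
            None => write!(f, "{}={}", self.field.name, self.raw),
        }
    }
}

/// Option 2: the field carries the extension itself, so no registry lookup
/// is needed at print time.
pub struct RichField {
    pub name: String,
    pub extension: Option<Arc<dyn ExtensionFormatter>>,
}

impl RichField {
    pub fn display_value(&self, raw: &str) -> String {
        match &self.extension {
            Some(ext) => format!("{}={}", self.name, ext.format_value(raw)),
            None => format!("{}={}", self.name, raw),
        }
    }
}
```

   In the first variant every call site that prints must thread the registry through; in the second the cost moves into whatever type carries the extension.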
   
   I think 2. would be the better approach but I may be mistaken. Here, a 
problem is that the "use `Field` instead of `DataType`" approach would likely 
not be enough if `Field` cannot provide access to a `dyn ExtensionTypeThingey`. 
This would require us to make another round of "replace `Field` with 
`PowerfulField`" which I think we want to avoid if possible.
   
   So, can we use `Field` as a carrier for our DataFusion extension type trait? 
Not in its current form, but I've created an example of how it could look: 
   
   
https://github.com/apache/arrow-rs/compare/main...tobixdev:arrow-rs:crazy-field-experiment?expand=1
   
   Code that uses `arrow-rs` can then provide its own enriched extension type 
traits without managing this itself. It has, of course, multiple drawbacks: 
   - Downcasting can create problems that a `DFDataType` enum would prevent.
   - I got an error with unwind safety in the tests due to the dynamic dispatch 
(see diff).
   - Less ergonomic and more complex, as another trait exists
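
   As one possible reading of the downcasting drawback, the sketch below contrasts a trait object that must be downcast via `std::any::Any` with a closed enum. All names (`CarriedExtension`, `Celsius`, `Opaque`, `SketchDataType`) are hypothetical and are not taken from the linked experiment.

```rust
use std::any::Any;

/// Hypothetical trait object that a field could carry.
pub trait CarriedExtension: Any + Send + Sync {
    fn name(&self) -> &'static str;
    fn as_any(&self) -> &dyn Any;
}

/// An extension the consumer knows about.
pub struct Celsius;

impl Celsius {
    pub fn pretty(&self, raw: &str) -> String {
        format!("{raw} °C")
    }
}

impl CarriedExtension for Celsius {
    fn name(&self) -> &'static str {
        "example.celsius"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// An extension the consumer has never heard of.
pub struct Opaque;

impl CarriedExtension for Opaque {
    fn name(&self) -> &'static str {
        "example.opaque"
    }
    fn as_any(&self) -> &dyn Any {
        self
    }
}

/// Downcasting compiles for any target type; a mismatch only shows up at
/// run time as `None`.
pub fn pretty_via_downcast(ext: &dyn CarriedExtension, raw: &str) -> Option<String> {
    ext.as_any().downcast_ref::<Celsius>().map(|c| c.pretty(raw))
}

/// With a closed enum the compiler forces every variant to be handled.
pub enum SketchDataType {
    Celsius,
    Opaque,
}

pub fn pretty_via_enum(ty: &SketchDataType, raw: &str) -> String {
    match ty {
        SketchDataType::Celsius => format!("{raw} °C"),
        SketchDataType::Opaque => raw.to_string(),
    }
}
```

   The trade-off is that the enum is closed to downstream crates, whereas the trait object stays open for user-defined types.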
   
   Any thoughts on that @alamb @paleolimbot ?
   
   For Option 2, the alternative would be to have a `DFField` and a 
`DFDataType`. I think this could also be fine if we use them in `DFSchema`.
   
   @paleolimbot Thanks for your input 👍! I am currently exploring Option 2, but 
if we choose to go with Option 1 (use registry / `TypeExtensions`) this could 
be an interesting approach!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

