alamb commented on issue #7923: URL: https://github.com/apache/arrow-datafusion/issues/7923#issuecomment-1779752813
Thank you for this @yukkit -- I think the high level idea would fit really nicely into the DataFusion story of extensibility.

I think the core challenge of implementing this feature is how to work it into the existing code. DataFusion uses `DataType` directly all over its code base, and I think it is close to infeasible now to try and change that.

One way to model user defined types in DataFusion would be as an arrow extension type (which would need upstream support as described in https://github.com/apache/arrow-rs/issues/4472). Then the DataFusion codebase could treat all user defined types as arrow extension types, using the `UserDefinedType` metadata to look up the various information needed for planning and execution. We would have to extend the various codepaths to know about and handle extension types.

There is also a somewhat related discussion on https://github.com/apache/arrow-datafusion/discussions/7421 about how `DataType` encodes both the physical encoding and the logical type.
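To make the extension type idea a bit more concrete, here is a minimal, std-only sketch of the lookup it implies. It is not DataFusion or arrow-rs code: `Field`, `StorageType`, `UserDefinedType`, `TypeRegistry`, and `extension_field` are all hypothetical stand-ins invented for illustration. The only part taken from Arrow itself is the pair of metadata keys (`ARROW:extension:name` and `ARROW:extension:metadata`) that the Arrow columnar format reserves for extension types.

```rust
use std::collections::HashMap;

// Field metadata keys reserved by the Arrow columnar format for extension types.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";
const EXTENSION_METADATA_KEY: &str = "ARROW:extension:metadata";

/// Hypothetical stand-in for the physical (storage) types of arrow's `DataType`.
#[derive(Debug, Clone, PartialEq)]
enum StorageType {
    Int64,
    Utf8,
    Binary,
}

/// Hypothetical stand-in for an arrow `Field`: a storage type plus string metadata.
#[derive(Debug, Clone)]
struct Field {
    name: String,
    storage_type: StorageType,
    metadata: HashMap<String, String>,
}

/// Hypothetical description of a user defined type (assumed shape, not a real API).
#[derive(Debug, Clone, PartialEq)]
struct UserDefinedType {
    name: String,
    storage_type: StorageType,
}

/// Hypothetical registry the planner could consult to resolve extension types.
#[derive(Default)]
struct TypeRegistry {
    types: HashMap<String, UserDefinedType>,
}

impl TypeRegistry {
    fn register(&mut self, udt: UserDefinedType) {
        self.types.insert(udt.name.clone(), udt);
    }

    /// Returns the registered type when the field carries an extension name
    /// that is known and whose storage type matches the field's storage type.
    fn resolve(&self, field: &Field) -> Option<&UserDefinedType> {
        let name = field.metadata.get(EXTENSION_NAME_KEY)?;
        let udt = self.types.get(name)?;
        if udt.storage_type == field.storage_type {
            Some(udt)
        } else {
            None
        }
    }
}

/// Builds a field annotated as an extension type via its metadata.
fn extension_field(field_name: &str, udt: &UserDefinedType) -> Field {
    let mut metadata = HashMap::new();
    metadata.insert(EXTENSION_NAME_KEY.to_string(), udt.name.clone());
    // Serialized type parameters would go here; left empty in this sketch.
    metadata.insert(EXTENSION_METADATA_KEY.to_string(), String::new());
    Field {
        name: field_name.to_string(),
        storage_type: udt.storage_type.clone(),
        metadata,
    }
}

fn main() {
    let geo = UserDefinedType {
        name: "example.geo_point".to_string(),
        storage_type: StorageType::Binary,
    };
    let mut registry = TypeRegistry::default();
    registry.register(geo.clone());

    let field = extension_field("location", &geo);
    match registry.resolve(&field) {
        Some(udt) => println!("{} resolves to user defined type {}", field.name, udt.name),
        None => println!("{} is a plain {:?} column", field.name, field.storage_type),
    }
}
```

The point of the sketch is that existing code keeps seeing the storage type, while codepaths that care about user defined types do an extra metadata lookup. Fields without the extension name key simply resolve to `None` and are handled as they are today.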
