paleolimbot opened a new issue, #20755: URL: https://github.com/apache/datafusion/issues/20755
### Is your feature request related to a problem or challenge?

After https://github.com/apache/datafusion/pull/18136 we can represent casts to an extension type in the logical plan and Substrait, and after https://github.com/apache/datafusion/pull/20676 we will be able to represent casts to an extension type in SQL. While these can now be intercepted by an optimizer rule or logical plan modification, they currently error if passed to the default planner:

https://github.com/apache/datafusion/blob/678d1ad7f4590e74e7bae0326292949617da0f57/datafusion/physical-expr/src/planner.rs#L291-L309

At this point we have access to the `ExecutionProps`, which has a reference to the `ConfigOptions`. Given some ability to plug in casting behaviour, we could resolve this expression to provide built-in support for things like casting UUID strings into a UUID value and vice versa.

Related to this is the desire to customize how a cast between two non-extension types happens (e.g., for a Spark-compatible cast, https://github.com/apache/datafusion/issues/11201).

### Describe the solution you'd like

I would personally like to resolve this using an extension type registry of some kind, where we add members to the trait proposed in https://github.com/apache/datafusion/pull/20312 for things like `can_cast_explicit()` and `cast_to_explicit()`. We could do something simpler (e.g., resolve a ScalarUDF), but I don't think this would scale to some of the other types of casts that happen internally (e.g., implicit casts as part of function argument inputs, https://github.com/apache/datafusion/issues/20748). It would also be nice to keep this as a "cast" so that physical optimizations that special-case casts still apply.

The extension registry approach won't handle anything about non-extension types.
We could separate the concept of resolving a cast from the extension name (e.g., `CastResolver::resolve_cast(from: &Field, to: &Field, options: _)`, where the `CastResolver` implementation is responsible for peeking at the extension names or not). I may also be missing some internal infrastructure for this that is already in place!

### Describe alternatives you've considered

A workaround is to just use a scalar function to do casting and avoid internal casts and cast expressions. This is roughly what we do in SedonaDB (e.g., we don't use a Signature and do our own type matching/internal casting).

### Additional context

For reference, DuckDB's implementation of a cast registry is here:

https://github.com/duckdb/duckdb/blob/dc11eadd8f0a7c600f0034810706605ebe10d5b9/src/include/duckdb/function/cast/default_casts.hpp

https://github.com/duckdb/duckdb/blob/dc11eadd8f0a7c600f0034810706605ebe10d5b9/src/function/cast/cast_function_set.cpp

My reading of this is that it's a series of default casts and a flat list of other cast overloads that are tried in reverse until one of them matches.
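
To make the `CastResolver` idea and the "tried in reverse until one matches" lookup order more concrete, here is a rough, dependency-free sketch. Everything in it is an assumption for illustration, not an existing DataFusion or DuckDB API: `Field` is a simplified stand-in for arrow's `Field`, `CastPlan`, `CastRegistry`, `UuidCastResolver`, and `DefaultCastResolver` are hypothetical names, and the `options` argument from the signature above is omitted for brevity. The only borrowed facts are Arrow's `ARROW:extension:name` metadata key and the canonical `arrow.uuid` extension name.

```rust
use std::collections::HashMap;

/// Stand-in for arrow's `Field` (assumption: reduced to a storage type name
/// plus metadata so the sketch has no external dependencies).
#[derive(Debug, Clone, PartialEq)]
struct Field {
    storage_type: String,
    metadata: HashMap<String, String>,
}

impl Field {
    fn new(storage_type: &str) -> Self {
        Self { storage_type: storage_type.to_string(), metadata: HashMap::new() }
    }

    /// Tags the field with an extension name using Arrow's metadata key.
    fn with_extension(mut self, name: &str) -> Self {
        self.metadata.insert("ARROW:extension:name".to_string(), name.to_string());
        self
    }

    fn extension_name(&self) -> Option<&str> {
        self.metadata.get("ARROW:extension:name").map(String::as_str)
    }
}

/// Hypothetical result of resolving a cast.
#[derive(Debug, Clone, PartialEq)]
enum CastPlan {
    /// Defer to the built-in cast for plain storage types.
    Builtin,
    /// Use a named custom cast implementation (name is illustrative only).
    Custom(String),
}

/// Hypothetical trait: each resolver decides for itself whether to look at
/// extension names, and returns `None` for casts it does not handle.
trait CastResolver {
    fn resolve_cast(&self, from: &Field, to: &Field) -> Option<CastPlan>;
}

/// Handles Utf8 <-> arrow.uuid only.
struct UuidCastResolver;

impl CastResolver for UuidCastResolver {
    fn resolve_cast(&self, from: &Field, to: &Field) -> Option<CastPlan> {
        let from_key = (from.extension_name(), from.storage_type.as_str());
        let to_key = (to.extension_name(), to.storage_type.as_str());
        match (from_key, to_key) {
            ((None, "Utf8"), (Some("arrow.uuid"), _)) => {
                Some(CastPlan::Custom("utf8_to_uuid".to_string()))
            }
            ((Some("arrow.uuid"), _), (None, "Utf8")) => {
                Some(CastPlan::Custom("uuid_to_utf8".to_string()))
            }
            _ => None,
        }
    }
}

/// Default behaviour: only casts between two non-extension types are handled.
struct DefaultCastResolver;

impl CastResolver for DefaultCastResolver {
    fn resolve_cast(&self, from: &Field, to: &Field) -> Option<CastPlan> {
        if from.extension_name().is_none() && to.extension_name().is_none() {
            Some(CastPlan::Builtin)
        } else {
            None
        }
    }
}

/// Flat list of resolvers; later registrations take priority over earlier ones.
struct CastRegistry {
    resolvers: Vec<Box<dyn CastResolver>>,
}

impl CastRegistry {
    fn new() -> Self {
        Self { resolvers: vec![Box::new(DefaultCastResolver)] }
    }

    fn register(&mut self, resolver: Box<dyn CastResolver>) {
        self.resolvers.push(resolver);
    }

    /// Tries resolvers in reverse registration order until one matches.
    fn resolve_cast(&self, from: &Field, to: &Field) -> Option<CastPlan> {
        self.resolvers.iter().rev().find_map(|r| r.resolve_cast(from, to))
    }
}

fn main() {
    let mut registry = CastRegistry::new();
    registry.register(Box::new(UuidCastResolver));

    let utf8 = Field::new("Utf8");
    let uuid = Field::new("FixedSizeBinary(16)").with_extension("arrow.uuid");
    println!("{:?}", registry.resolve_cast(&utf8, &uuid));
    println!("{:?}", registry.resolve_cast(&utf8, &Field::new("Int64")));
}
```

In this sketch a cast that no resolver claims comes back as `None`, which is where the planner would report an error. Because the default resolver sits at the bottom of the list, a resolver registered later could also override a cast between two non-extension types, which is the part the extension registry approach alone would not cover.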
