paleolimbot opened a new issue, #20755:
URL: https://github.com/apache/datafusion/issues/20755

   ### Is your feature request related to a problem or challenge?
   
   After https://github.com/apache/datafusion/pull/18136, we can represent casts 
to an extension type in the logical plan and in Substrait, and after 
https://github.com/apache/datafusion/pull/20676 we will also be able to represent 
them in SQL. While these casts can already be intercepted by an optimizer rule or 
by modifying the logical plan, they currently error if they reach the default 
planner.
   
   
https://github.com/apache/datafusion/blob/678d1ad7f4590e74e7bae0326292949617da0f57/datafusion/physical-expr/src/planner.rs#L291-L309
   
   At this point we have access to the `ExecutionProps`, which holds a reference 
to the `ConfigOptions`. Given some way to plug in casting behaviour, we could 
resolve this expression here and provide built-in support for things like 
casting UUID strings into UUID values and vice versa.
   
   Related to this is the desire to customize how a cast between two 
non-extension types happens (e.g., for a Spark compatible cast 
https://github.com/apache/datafusion/issues/11201).
   
   ### Describe the solution you'd like
   
   I would personally like to resolve this using an extension type registry of 
some kind, adding members such as `can_cast_explicit()` and 
`cast_to_explicit()` to the trait proposed in 
https://github.com/apache/datafusion/pull/20312. We could do something simpler 
(e.g., resolve a ScalarUDF), but I don't think that would scale to the other 
kinds of casts that happen internally (e.g., implicit casts of function 
arguments, https://github.com/apache/datafusion/issues/20748). It would also be 
nice to keep this as a "cast" so that physical optimizations that special-case 
casts continue to apply.
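   A minimal sketch of what such registry members might look like, assuming 
simplified stand-ins for Arrow's `DataType` and scalar values. The 
`ExtensionType` trait, `ExtensionRegistry`, `DataType`, and `Value` below are 
hypothetical and are not the API proposed in #20312. The example uses the 
canonical `arrow.uuid` extension (storage `FixedSizeBinary(16)`) and only shows 
the string-to-UUID direction:

```rust
use std::collections::HashMap;

/// Simplified stand-in for Arrow's `DataType` (hypothetical, illustration only).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    FixedSizeBinary(i32),
}

/// Simplified stand-in for a single scalar value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Utf8(String),
    FixedSizeBinary(Vec<u8>),
}

/// Hypothetical extension type trait carrying the proposed cast members.
pub trait ExtensionType {
    /// Extension name, i.e. the `ARROW:extension:name` metadata value.
    fn name(&self) -> &str;
    /// Physical storage type backing the extension type.
    fn storage_type(&self) -> DataType;
    /// Whether an explicit `CAST(x AS <this type>)` from `from` is supported.
    fn can_cast_explicit(&self, from: &DataType) -> bool;
    /// Cast one value into this type's storage representation.
    fn cast_to_explicit(&self, value: &Value) -> Result<Value, String>;
}

/// UUID extension backed by 16 bytes of fixed-size binary storage.
pub struct UuidType;

impl ExtensionType for UuidType {
    fn name(&self) -> &str {
        "arrow.uuid"
    }

    fn storage_type(&self) -> DataType {
        DataType::FixedSizeBinary(16)
    }

    fn can_cast_explicit(&self, from: &DataType) -> bool {
        *from == DataType::Utf8
    }

    fn cast_to_explicit(&self, value: &Value) -> Result<Value, String> {
        let Value::Utf8(s) = value else {
            return Err(format!("cannot cast {value:?} to {}", self.name()));
        };
        // Accept the hyphenated form by dropping '-' and parsing 32 hex digits.
        let hex: String = s.chars().filter(|c| *c != '-').collect();
        if hex.len() != 32 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
            return Err(format!("invalid UUID string: {s}"));
        }
        let bytes = (0..16)
            .map(|i| u8::from_str_radix(&hex[2 * i..2 * i + 2], 16))
            .collect::<Result<Vec<u8>, _>>()
            .map_err(|e| e.to_string())?;
        Ok(Value::FixedSizeBinary(bytes))
    }
}

/// Hypothetical registry of extension types keyed by extension name.
#[derive(Default)]
pub struct ExtensionRegistry {
    types: HashMap<String, Box<dyn ExtensionType>>,
}

impl ExtensionRegistry {
    pub fn register(&mut self, ext: Box<dyn ExtensionType>) {
        self.types.insert(ext.name().to_string(), ext);
    }

    pub fn get(&self, name: &str) -> Option<&dyn ExtensionType> {
        self.types.get(name).map(|ext| &**ext)
    }
}
```

   A planner holding such a registry could look up the target field's extension 
name, check `can_cast_explicit()` against the source type, and only then plan 
the cast, while still representing it as a cast expression.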
   
   The extension registry approach won't help with casts between non-extension 
types. Alternatively, we could decouple resolving a cast from the extension 
name (e.g., `CastResolver::resolve_cast(from: &Field, to: &Field, options: _)`, 
where each `CastResolver` implementation decides whether or not to inspect the 
extension names).
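   A rough sketch of the `CastResolver` idea, assuming a chain of resolvers that 
are consulted in order, with the built-in cast as the fallback for 
non-extension types. `Field`, `CastOptions` (standing in for the relevant part 
of `ConfigOptions`), `ResolvedCast`, `resolve`, and both resolvers are 
hypothetical simplifications; the Spark-compatible rule only marks where such a 
cast would plug in:

```rust
/// Simplified stand-in for Arrow's `DataType` (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum DataType {
    Utf8,
    Int32,
    FixedSizeBinary(i32),
}

/// Simplified stand-in for an Arrow `Field`: a type plus its extension name.
#[derive(Debug, Clone)]
pub struct Field {
    pub data_type: DataType,
    /// Value of the `ARROW:extension:name` metadata key, if present.
    pub extension_name: Option<String>,
}

/// Hypothetical stand-in for the cast-related part of `ConfigOptions`.
#[derive(Debug, Default)]
pub struct CastOptions {
    pub spark_compatible: bool,
}

/// What a resolver decided the planner should do (hypothetical).
#[derive(Debug, PartialEq)]
pub enum ResolvedCast {
    /// Use the built-in cast kernel.
    Default,
    /// Use a named custom cast implementation.
    Custom(&'static str),
}

/// Hypothetical resolver trait: `Some` handles the cast, `None` defers.
pub trait CastResolver {
    fn resolve_cast(&self, from: &Field, to: &Field, options: &CastOptions) -> Option<ResolvedCast>;
}

/// Peeks at extension names to handle Utf8 <-> `arrow.uuid`.
pub struct UuidResolver;

impl CastResolver for UuidResolver {
    fn resolve_cast(&self, from: &Field, to: &Field, _options: &CastOptions) -> Option<ResolvedCast> {
        let is_uuid = |f: &Field| f.extension_name.as_deref() == Some("arrow.uuid");
        match (is_uuid(from), is_uuid(to)) {
            (false, true) if from.data_type == DataType::Utf8 => Some(ResolvedCast::Custom("utf8_to_uuid")),
            (true, false) if to.data_type == DataType::Utf8 => Some(ResolvedCast::Custom("uuid_to_utf8")),
            _ => None,
        }
    }
}

/// Ignores extension types entirely; overrides one plain cast when enabled.
pub struct SparkResolver;

impl CastResolver for SparkResolver {
    fn resolve_cast(&self, from: &Field, to: &Field, options: &CastOptions) -> Option<ResolvedCast> {
        let plain = from.extension_name.is_none() && to.extension_name.is_none();
        let utf8_to_int = from.data_type == DataType::Utf8 && to.data_type == DataType::Int32;
        (options.spark_compatible && plain && utf8_to_int).then_some(ResolvedCast::Custom("spark_utf8_to_int32"))
    }
}

/// Ask each resolver in order; unresolved extension casts still error,
/// while plain casts fall back to the built-in cast.
pub fn resolve(
    resolvers: &[Box<dyn CastResolver>],
    from: &Field,
    to: &Field,
    options: &CastOptions,
) -> Result<ResolvedCast, String> {
    for resolver in resolvers {
        if let Some(cast) = resolver.resolve_cast(from, to, options) {
            return Ok(cast);
        }
    }
    if from.extension_name.is_some() || to.extension_name.is_some() {
        Err(format!("no cast from {from:?} to {to:?}"))
    } else {
        Ok(ResolvedCast::Default)
    }
}
```

   Because resolvers receive the options, the same hook could cover both 
extension-type casts and configurable behaviour for plain types.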
   
   I may also be missing some internal infrastructure for this that is already 
in place!
   
   ### Describe alternatives you've considered
   
   A workaround is to use a scalar function for casting and avoid internal casts 
and cast expressions altogether. This is roughly what we do in SedonaDB (e.g., 
we don't use a `Signature` and instead do our own type matching and internal 
casting).
   
   ### Additional context
   
   For reference, DuckDB's implementation of a cast registry is here:
   
   
https://github.com/duckdb/duckdb/blob/dc11eadd8f0a7c600f0034810706605ebe10d5b9/src/include/duckdb/function/cast/default_casts.hpp
   
   
https://github.com/duckdb/duckdb/blob/dc11eadd8f0a7c600f0034810706605ebe10d5b9/src/function/cast/cast_function_set.cpp
   
   My reading of this is that it's a set of default casts plus a flat list of 
other cast overloads that are tried in reverse order until one of them matches.
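   To make that reading concrete, here is a rough Rust translation loosely 
modeled on DuckDB's `CastFunctionSet`: bind functions are tried from most to 
least recently registered, and the default casts are consulted last. Types are 
simplified to type-name strings and values to `&str`, and `bind_uuid` is a 
hypothetical overload, so this illustrates the lookup order rather than DuckDB's 
actual signatures:

```rust
/// A cast on a single value; values are simplified to strings (hypothetical).
pub type CastFn = fn(&str) -> Result<String, String>;

/// A bind function: returns a cast if it handles `from` -> `to`.
pub type BindCast = fn(from: &str, to: &str) -> Option<CastFn>;

fn identity(value: &str) -> Result<String, String> {
    Ok(value.to_string())
}

/// Stand-in for the built-in default casts, consulted after all overloads.
fn default_bind(from: &str, to: &str) -> Option<CastFn> {
    (from == to).then_some(identity as CastFn)
}

/// Flat list of registered overloads (hypothetical translation).
#[derive(Default)]
pub struct CastFunctionSet {
    overloads: Vec<BindCast>,
}

impl CastFunctionSet {
    pub fn register(&mut self, bind: BindCast) {
        self.overloads.push(bind);
    }

    /// Try overloads newest-first, then fall back to the default casts.
    pub fn get_cast(&self, from: &str, to: &str) -> Option<CastFn> {
        self.overloads
            .iter()
            .rev()
            .find_map(|bind| bind(from, to))
            .or_else(|| default_bind(from, to))
    }
}

/// Hypothetical overload: VARCHAR -> UUID, normalizing the 36-character form.
fn bind_uuid(from: &str, to: &str) -> Option<CastFn> {
    fn cast(value: &str) -> Result<String, String> {
        if value.len() == 36 {
            Ok(value.to_lowercase())
        } else {
            Err(format!("invalid UUID: {value}"))
        }
    }
    (from == "VARCHAR" && to == "UUID").then_some(cast as CastFn)
}
```

   Trying overloads newest-first lets a later registration override an earlier 
one for the same pair of types without touching the defaults.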