Joris Van den Bossche created ARROW-17890:
---------------------------------------------

             Summary: [C++][Python] Allow an ExtensionType to register or 
implement custom casts
                 Key: ARROW-17890
                 URL: https://issues.apache.org/jira/browse/ARROW-17890
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
            Reporter: Joris Van den Bossche


With ARROW-14500 and ARROW-15545 (https://github.com/apache/arrow/pull/14106), 
casting "storage_type" -> "extension<storage_type>" is now allowed (the cast 
in the other direction already worked). 

Initially, that PR allowed a cast from "any" -> "extension<storage_type>", as 
long as the input type could be cast to the storage type (so deferring to the 
"any" -> "storage_type" cast). However, because whether a certain cast makes 
sense or not depends on the semantics of the extension type, the cast was 
restricted to inputs whose type exactly matches storage_type. 

One idea could be to still allow the other casts behind a cast option flag, 
like {{allow_non_storage_extension_casts}} (or a better name), so the user can 
explicitly opt in to casting to/from any type (as long as the cast from/to the 
storage type works).
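
As a rough illustration of that rule, here is a plain-Python toy model (not 
Arrow code). Types are just strings, {{can_cast_to_extension}} and 
{{storage_castable}} are hypothetical names, and the set of supported storage 
casts is made up for the example:

```python
# Toy model of the proposed rule; this is NOT the Arrow implementation.
# Types are plain strings, and `storage_castable` is a hypothetical
# stand-in for the existing "any" -> "storage_type" cast lookup.

def storage_castable(from_type: str, to_type: str) -> bool:
    # Assumption for the example: only int32 -> int64 widening exists.
    supported = {("int32", "int64")}
    return from_type == to_type or (from_type, to_type) in supported


def can_cast_to_extension(
    from_type: str,
    storage_type: str,
    allow_non_storage_extension_casts: bool = False,
) -> bool:
    # An exactly matching storage type is always allowed (current behaviour).
    if from_type == storage_type:
        return True
    # Anything else requires the user to opt in explicitly.
    if not allow_non_storage_extension_casts:
        return False
    # With the flag set, defer to the "any" -> "storage_type" cast.
    return storage_castable(from_type, storage_type)
```

With the flag left at its default, only the exact storage type is accepted; 
with it set, the decision is delegated to the storage cast.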

That could help the user, but for certain casts, the ExtensionType might 
also want to control _how_ such a cast is done. For example, for casting 
to/from string type (which would be useful for reading/writing CSV files, or 
for repr), you will typically want to do something different than casting your 
storage array to string. 
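
To make the string case concrete, take a hypothetical UUID extension type 
stored as 16 raw bytes (an assumed layout, chosen only for illustration). The 
snippet uses the Python standard library rather than Arrow, and the hex dump 
merely stands in for "whatever a storage-level view of the bytes gives you":

```python
import uuid

# 16 raw bytes, as a hypothetical UUID extension type might keep them
# in a fixed-size binary storage array (assumed layout).
storage_value = bytes(range(16))

# A storage-level view only sees opaque bytes; a hex dump is used here
# just to have something printable.
storage_view = storage_value.hex()

# What the extension type would want in a CSV file or a repr:
# the canonical hyphenated UUID text form.
semantic_view = str(uuid.UUID(bytes=storage_value))

print(storage_view)   # 000102030405060708090a0b0c0d0e0f
print(semantic_view)  # 00010203-0405-0607-0809-0a0b0c0d0e0f
```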

A more general solution could thus be to have a mechanism for the ExtensionType 
to implement a certain cast kernel itself, and register it with the C++ cast 
dispatch.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)