Hello all,

Recently, a PR to Arrow C++ [1] was opened to allow implicit casting from any extension type to its storage type in Acero. This raises questions about the validity of applying operations to an extension array's storage. For example, some extension type authors may intend a different ordering for arrays of their new type than the one that would be applied to the array's storage, or may not intend for the type to participate in arithmetic even though its storage could.
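
To make the ordering concern concrete, here is a minimal sketch in plain Python. It does not use any Arrow API; the "unsigned 32-bit value stored as int32" extension type and the helper names `to_storage` / `from_storage` are hypothetical, chosen only for illustration.

```python
import struct


def to_storage(value):
    """Reinterpret an unsigned 32-bit value as the signed int32 used for storage (hypothetical)."""
    return struct.unpack("<i", struct.pack("<I", value))[0]


def from_storage(stored):
    """Reinterpret a signed int32 storage value back into the unsigned logical value."""
    return struct.unpack("<I", struct.pack("<i", stored))[0]


# Logical values as the extension type author sees them.
logical = [1, 3_000_000_000, 2]

# What an implicit cast to storage would hand to a sort kernel.
storage = [to_storage(v) for v in logical]

# Sorting the storage and reading the result back as logical values.
sorted_by_storage = [from_storage(s) for s in sorted(storage)]

# The ordering the author presumably intended.
sorted_by_logical = sorted(logical)

print(sorted_by_storage)
print(sorted_by_logical)
```

In this sketch the largest logical value has a negative storage representation, so sorting by storage places it first, which is not the order the type's author would expect.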

Suggestions/observations from discussion on that PR included:

- Extension types could provide a general semantic description of storage type equivalence [2], so that a flag on the extension type enables ordering by storage but disables arithmetic on it.
- Compute functions or kernels could be augmented with a filter declaring which extension types are supported [3].
- Currently arrow-rs considers extension types metadata only [4], so all kernels treat extension arrays equivalently to their storage.
- Currently Arrow C++ only supports explicitly casting from an extension type to its storage (and from storage to ext), so any operation can be performed on an extension array's storage but it requires opting in.

Sincerely,
Ben Kietzman

[1] https://github.com/apache/arrow/pull/39200
[2] https://github.com/apache/arrow/pull/39200#issuecomment-1852307954
[3] https://github.com/apache/arrow/pull/39200#issuecomment-1852676161
[4] https://github.com/apache/arrow/pull/39200#issuecomment-1852881651