jorisvandenbossche opened a new issue, #33997: URL: https://github.com/apache/arrow/issues/33997
When wrapping a type (or array) in a pyarrow object, we need to define which Python class to use. Currently, for extension types, this logic lives here in `pyarrow_wrap_data_type`:

https://github.com/apache/arrow/blob/b413ac4f2b6911af5e8241803277caccc43aa3c4/python/pyarrow/public-api.pxi#L114-L120

So there are currently two options:

- The ExtensionType is implemented in Python, by subclassing `pyarrow.(Py)ExtensionType`, which links to the C++ `arrow::py::PyExtensionType` (a subclass of `arrow::ExtensionType`). In this case, we store the Python type instance on the C++ instance, and return it as the Python object in `pyarrow_wrap_data_type`.
- The ExtensionType is implemented in C++, in which case we currently always fall back to wrapping it in the `pyarrow.BaseExtensionType` base class (there is currently a bug in this, but that is getting fixed in [GH-33802](https://github.com/apache/arrow/pull/33802)).

However, that means that for extension types implemented in C++, there is currently no way to have a "richer" Python Type object (or Array object, since that is determined by the Type, and a BaseExtensionType will always use the base ExtensionArray), even though for an extension type you might want to add type-specific attributes or methods.

For canonical extension types that are implemented in Arrow C++ itself (for example, the currently discussed Tensor extension type in https://github.com/apache/arrow/pull/8510, or a previous effort to add a complex type as an extension type in https://github.com/apache/arrow/pull/10565), I think it would work today to create a custom subclass of `pyarrow.BaseExtensionType` for the specific canonical type, and then add a special case to `pyarrow_wrap_data_type` that checks the name of the extension type and, if it is a canonical one we implement ourselves, uses the Python subclass we provide.
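To make the wrapping logic described above concrete, here is a rough pure-Python sketch of the two existing cases plus the name-based special case for canonical types. It is only an illustration, not the actual Cython code in `public-api.pxi`: every class and function name, as well as the extension name `"example.tensor"`, is a hypothetical stand-in and not a pyarrow API.

```python
# Hypothetical stand-ins for illustration only; none of these are pyarrow APIs.

class CppExtensionType:
    """Stands in for a C++ arrow::ExtensionType instance."""

    def __init__(self, extension_name, py_instance=None):
        self.extension_name = extension_name
        # Only set for types defined in Python (the PyExtensionType case).
        self.py_instance = py_instance


class BaseExtensionTypeSketch:
    """Stands in for the generic pyarrow.BaseExtensionType wrapper."""

    def __init__(self, cpp_type):
        self.cpp_type = cpp_type


class TensorTypeSketch(BaseExtensionTypeSketch):
    """A 'richer' wrapper exposing a type-specific attribute."""

    @property
    def is_tensor(self):
        return True


# Name-based special cases for canonical types (the name is made up).
CANONICAL_WRAPPERS = {"example.tensor": TensorTypeSketch}


def wrap_extension_type(cpp_type):
    # Option 1: the type was defined in Python, so return the stored instance.
    if cpp_type.py_instance is not None:
        return cpp_type.py_instance
    # Option 2: the type was defined in C++. Use a dedicated subclass if the
    # name is a known canonical one, otherwise fall back to the base class.
    wrapper_cls = CANONICAL_WRAPPERS.get(
        cpp_type.extension_name, BaseExtensionTypeSketch
    )
    return wrapper_cls(cpp_type)
```

In this sketch the lookup table is fixed inside the library, so only types the library already knows by name get a dedicated Python class.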
But for extension types that are implemented in C++ externally (or for extension types that are implemented in Arrow C++, but for which we don't provide a custom python subclass), that doesn't work. I am wondering to what extent we want to allow "registering" a python class that should be used when wrapping a specific C++ extension type (and to what extent this would be useful for
