jorisvandenbossche opened a new issue, #33997:
URL: https://github.com/apache/arrow/issues/33997

   When wrapping a type (or array) in a pyarrow object, we need to define which 
Python class to use. Currently, for extension types, this logic lives here in 
`pyarrow_wrap_data_type`:
   
   
https://github.com/apache/arrow/blob/b413ac4f2b6911af5e8241803277caccc43aa3c4/python/pyarrow/public-api.pxi#L114-L120
   
   So there are currently two options:
   
   - The ExtensionType is implemented in Python by subclassing 
`pyarrow.(Py)ExtensionType`, which is backed by the C++ 
`arrow::py::PyExtensionType` (a subclass of `arrow::ExtensionType`). In this 
case, we store the Python type instance on the C++ instance and return it as 
the Python object in `pyarrow_wrap_data_type`.
   - The ExtensionType is implemented in C++, in which case we currently always 
fall back to wrapping it in the `pyarrow.BaseExtensionType` base class (there is 
currently a bug in this, but that is getting fixed in 
[GH-33802](https://github.com/apache/arrow/pull/33802)).
   
   However, that means that for extension types implemented in C++, there 
is currently no way to have a "richer" Python Type object (or Array object, 
since the Array class is determined by the Type, and a BaseExtensionType 
always uses the base ExtensionArray), even though for an extension type you 
might want to add type-specific attributes or methods. 
   
   For canonical extension types that are implemented in Arrow C++ itself (for 
example, the currently discussed Tensor extension type in 
https://github.com/apache/arrow/pull/8510, or a previous effort to add a complex 
type as an extension type in https://github.com/apache/arrow/pull/10565), I think 
it would work today to create a custom subclass of `pyarrow.BaseExtensionType` 
for the specific canonical type, and then add a special case to 
`pyarrow_wrap_data_type` that checks the name of the extension type and, if it 
is a canonical one we implement ourselves, uses the Python subclass we provide 
for it. 
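
   The wrapping logic described so far can be modelled in plain Python. This 
is only a sketch of the dispatch idea, not pyarrow's actual code: 
`CppExtensionType`, `TensorType`, `CANONICAL_WRAPPERS`, `wrap_extension_type` 
and the name `"example.tensor"` are all hypothetical stand-ins, and the real 
logic lives in Cython in `pyarrow_wrap_data_type`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CppExtensionType:
    """Hypothetical stand-in for a C++ arrow::ExtensionType instance."""
    extension_name: str
    # Only set for types defined in Python (the PyExtensionType case).
    py_instance: Optional[object] = None


class BaseExtensionType:
    """Stand-in for the generic wrapper class used as the fallback."""

    def __init__(self, cpp_type):
        self.cpp_type = cpp_type
        self.extension_name = cpp_type.extension_name


class TensorType(BaseExtensionType):
    """Hypothetical richer wrapper for one canonical extension type."""

    def describe(self):
        return f"tensor wrapper for {self.extension_name}"


# Assumed lookup table: extension name -> Python wrapper subclass.
CANONICAL_WRAPPERS = {"example.tensor": TensorType}


def wrap_extension_type(cpp_type):
    # Case 1: the type was defined in Python, so return the stored instance.
    if cpp_type.py_instance is not None:
        return cpp_type.py_instance
    # Case 2: C++-defined type; use a specific subclass if the name is known,
    # otherwise fall back to the generic base wrapper.
    wrapper_cls = CANONICAL_WRAPPERS.get(
        cpp_type.extension_name, BaseExtensionType
    )
    return wrapper_cls(cpp_type)
```

   With this shape, a known name yields the richer subclass, while any other 
C++-defined type still gets the plain base wrapper.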
   
   But for extension types that are implemented in C++ externally (or for 
extension types that are implemented in Arrow C++, but for which we don't 
provide a custom python subclass), that doesn't work. 
   I am wondering to what extent we want to allow "registering" a python class 
that should be used when wrapping a specific C++ extension type (and to what 
extent this would be useful for 


