jmdeschenes commented on pull request #10565: URL: https://github.com/apache/arrow/pull/10565#issuecomment-999017847
Hello,

There is an issue with this approach.

In array.pxi:

```cython
cdef class ExtensionArray(Array):
    """
    Concrete class for Arrow extension arrays.
    """

    @property
    def storage(self):
        cdef:
            CExtensionArray* ext_array = <CExtensionArray*>(self.ap)

        return pyarrow_wrap_array(ext_array.storage())

    #
    ## LINES SKIPPED
    #

    def to_numpy(self, **kwargs):
        """
        Convert extension array to a numpy ndarray.

        See Also
        --------
        Array.to_numpy
        """
        return self.storage.to_numpy(**kwargs)
```

In table.pxi:

```cython
    def to_numpy(self):
        """
        Return a NumPy copy of this array (experimental).

        Returns
        -------
        array : numpy.ndarray
        """
        cdef:
            PyObject* out
            PandasOptions c_options
            object values

        if self.type.id == _Type_EXTENSION:
            storage_array = chunked_array(
                [chunk.storage for chunk in self.iterchunks()],
                type=self.type.storage_type
```

Both of these "strip" the extension type before the data is handed to the C++ code. As a result, the C++ code never knows that it is dealing with an extension array.

If this behaviour is to be kept, `fixed_size_list` would need to convert into a proper 2D NumPy array. That could have several benefits, and it could be limited to primitive value types at first.

@jorisvandenbossche Do you think that would be acceptable? Otherwise, letting the C++ code handle the extension type could be another option.