Chang She created ARROW-18013:
---------------------------------

             Summary: Cannot concatenate extension arrays
                 Key: ARROW-18013
                 URL: https://issues.apache.org/jira/browse/ARROW-18013
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, Python
    Affects Versions: 9.0.0
            Reporter: Chang She

`pa.Table.take` and `pa.ChunkedArray.combine_chunks` raise an exception for extension arrays:

https://github.com/apache/arrow/blob/apache-arrow-9.0.0/cpp/src/arrow/array/concatenate.cc#L440

Quick example:

```
In [1]: import pyarrow as pa

In [2]: class LabelType(pa.ExtensionType):
   ...:
   ...:     def __init__(self):
   ...:         super(LabelType, self).__init__(pa.string(), "label")
   ...:
   ...:     def __arrow_ext_serialize__(self):
   ...:         return b""
   ...:
   ...:     @classmethod
   ...:     def __arrow_ext_deserialize__(cls, storage_type, serialized):
   ...:         return LabelType()
   ...:

In [3]: import numpy as np

In [4]: chunk1 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('a', 1000)))

In [5]: chunk2 = pa.ExtensionArray.from_storage(LabelType(), pa.array(np.repeat('b', 1000)))

In [6]: pa.chunked_array([chunk1, chunk2]).combine_chunks()
---------------------------------------------------------------------------
ArrowNotImplementedError                  Traceback (most recent call last)
Cell In [6], line 1
----> 1 pa.chunked_array([chunk1, chunk2]).combine_chunks()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/table.pxi:700, in pyarrow.lib.ChunkedArray.combine_chunks()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/array.pxi:2889, in pyarrow.lib.concat_arrays()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File ~/.venv/lance/lib/python3.10/site-packages/pyarrow/error.pxi:121, in pyarrow.lib.check_status()

ArrowNotImplementedError: concatenation of extension<label<LabelType>>
```

Would it be possible to concatenate the storage arrays and then "re-box" the result into the ExtensionType?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)