Joris Van den Bossche created ARROW-15643:
---------------------------------------------
             Summary: [C++] Kernel to select subset of fields of a StructArray
                 Key: ARROW-15643
                 URL: https://issues.apache.org/jira/browse/ARROW-15643
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche

Triggered by https://stackoverflow.com/questions/71035754/pyarrow-drop-a-column-in-a-nested-structure. I thought there was already an issue about this, but I can't immediately find one.

Assume you have a struct array with some fields:

{code}
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> arr = pa.StructArray.from_arrays([[1, 2, 3]]*3, names=['a', 'b', 'c'])
>>> arr.type
StructType(struct<a: int64, b: int64, c: int64>)
{code}

We have a kernel to select a single child field:

{code}
>>> pc.struct_field(arr, [0])
<pyarrow.lib.Int64Array object at 0x7ffa9e229940>
[
  1,
  2,
  3
]
{code}

But if you want to subset the StructArray to some of its fields, resulting in a new StructArray, that's not possible with {{struct_field}}, and doing this manually is a bit cumbersome:

{code}
>>> fields = ['a', 'c']
>>> arrays = [arr.field(n) for n in fields]
>>> arr_subset = pa.StructArray.from_arrays(arrays, names=fields)
>>> arr_subset.type
StructType(struct<a: int64, c: int64>)
{code}

(This is still OK for a single array, but with a ChunkedArray you have to repeat it for every chunk, which gets annoying.)