Joris Van den Bossche created ARROW-15643:
---------------------------------------------

             Summary: [C++] Kernel to select subset of fields of a StructArray
                 Key: ARROW-15643
                 URL: https://issues.apache.org/jira/browse/ARROW-15643
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Joris Van den Bossche


Triggered by 
https://stackoverflow.com/questions/71035754/pyarrow-drop-a-column-in-a-nested-structure.
I thought there was already an issue for this, but I couldn't find one.

Assume you have a struct array with some fields:

{code}
>>> import pyarrow as pa
>>> import pyarrow.compute as pc
>>> arr = pa.StructArray.from_arrays([[1, 2, 3]]*3, names=['a', 'b', 'c'])
>>> arr.type
StructType(struct<a: int64, b: int64, c: int64>)
{code}

We have a kernel to select a single child field:

{code}
>>> pc.struct_field(arr, [0])
<pyarrow.lib.Int64Array object at 0x7ffa9e229940>
[
  1,
  2,
  3
]
{code}

But if you want to subset the StructArray to some of its fields, resulting in a 
new StructArray, that's not possible with {{struct_field}}, and doing it 
manually is a bit cumbersome:

{code}
>>> fields = ['a', 'c']
>>> arrays = [arr.field(n) for n in fields]
>>> arr_subset = pa.StructArray.from_arrays(arrays, names=fields)
>>> arr_subset.type
StructType(struct<a: int64, c: int64>)
{code}

(This is still manageable, but with a ChunkedArray it gets tedious, since the StructArray has to be rebuilt chunk by chunk.)





--
This message was sent by Atlassian Jira
(v8.20.1#820001)
