[ https://issues.apache.org/jira/browse/ARROW-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147019#comment-17147019 ]
Wes McKinney commented on ARROW-6775:
-------------------------------------

#3 can be computed using a combination of the {{is_null}} function and Cast, so I think this is done:

{code}
In [1]: arr = pa.array([1, 2, 3, None, None, 6, 7, None])

In [2]: arr.is_null()
Out[2]:
<pyarrow.lib.BooleanArray object at 0x7f783a397e88>
[
  false,
  false,
  false,
  true,
  true,
  false,
  false,
  true
]

In [3]: arr.is_null().cast('uint8')
Out[3]:
<pyarrow.lib.UInt8Array object at 0x7f7839cb6f48>
[
  0,
  0,
  0,
  1,
  1,
  0,
  0,
  1
]
{code}

#1 and #4 will need patches. #1 is pretty easy, so I'll hopefully write a patch for that soon.

> [C++] [Python] Proposal for several Array utility functions
> -----------------------------------------------------------
>
> Key: ARROW-6775
> URL: https://issues.apache.org/jira/browse/ARROW-6775
> Project: Apache Arrow
> Issue Type: Wish
> Components: C++, Python
> Reporter: Zhuo Peng
> Assignee: Wes McKinney
> Priority: Minor
>
> Hi,
> We developed several utilities that compute / access certain properties of
> Arrays and wonder whether it makes sense to get them into upstream (into both
> the C++ API and pyarrow) and, assuming yes, where the best place to put them is.
> Maybe I have overlooked existing APIs that already do the same; if so,
> please point them out.
>
> 1/ ListLengthFromListArray(ListArray&)
> Returns the lengths of the lists in a ListArray, as an Int32Array (or Int64Array
> for large lists). For example:
> [[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned
> array can be converted to numpy)
>
> 2/ GetBinaryArrayTotalByteSize(BinaryArray&)
> Returns the total byte size of a BinaryArray (basically offset[len] - offset[0],
> since the offsets buffer of an array of len elements holds len + 1 entries).
> Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.
>
> 3/ GetArrayNullBitmapAsByteArray(Array&)
> Returns the array's null bitmap as a UInt8Array (which can be efficiently
> converted to a bool numpy array)
>
> 4/ GetFlattenedArrayParentIndices(ListArray&)
> Makes an int32 array of the same length as the flattened ListArray.
> returned_array[i] == j means the i-th element in the flattened ListArray came
> from the j-th list in the ListArray.
> For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
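The offsets arithmetic behind proposals #1, #2 and #4 can be sketched in plain Python. This is a minimal illustration, not pyarrow or Arrow C++ code: it assumes the array has already been decomposed into an offsets list (slot i covers child positions offsets[i] up to offsets[i + 1]) and a per-slot validity list, and the function names `list_lengths`, `binary_total_byte_size` and `parent_indices` are hypothetical.

```python
from typing import List

# Hypothetical helpers (not pyarrow API) mirroring the proposed utilities on
# Arrow's variable-size layout: slot i covers child positions
# offsets[i] .. offsets[i + 1] - 1, and validity[i] is False for a null slot.


def list_lengths(offsets: List[int], validity: List[bool]) -> List[int]:
    """#1: length of every list; null slots report 0 so the result has no nulls."""
    if len(offsets) != len(validity) + 1:
        raise ValueError("offsets must have one more entry than validity")
    return [
        offsets[i + 1] - offsets[i] if validity[i] else 0
        for i in range(len(validity))
    ]


def binary_total_byte_size(offsets: List[int]) -> int:
    """#2: bytes spanned by all values; also correct for sliced offsets."""
    return offsets[-1] - offsets[0]


def parent_indices(offsets: List[int], validity: List[bool]) -> List[int]:
    """#4: for each element of the flattened list array, the index of its list."""
    parents: List[int] = []
    for i, valid in enumerate(validity):
        if valid:  # skip any child values that sit under a null list slot
            parents.extend([i] * (offsets[i + 1] - offsets[i]))
    return parents


# [[1, 2, 3], [], None, [4, 5]] laid out as offsets plus validity.
example_offsets = [0, 3, 3, 3, 5]
example_validity = [True, True, False, True]
print(list_lengths(example_offsets, example_validity))    # [3, 0, 0, 2]
print(parent_indices(example_offsets, example_validity))  # [0, 0, 0, 3, 3]
# ["ab", "cde", None] as a binary array has offsets [0, 2, 5, 5].
print(binary_total_byte_size([0, 2, 5, 5]))               # 5
```

Reporting null lists as length 0 follows the reporter's preference for a numpy-convertible result. Recent pyarrow versions expose compute functions along these lines, `pyarrow.compute.list_value_length` and `pyarrow.compute.list_parent_indices`; the former returns null for null lists, so a `fill_null(0)` would be needed to get `[3, 0, 0]`.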