[ 
https://issues.apache.org/jira/browse/ARROW-6775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17147019#comment-17147019
 ] 

Wes McKinney commented on ARROW-6775:
-------------------------------------

#3 can be computed by combining the {{is_null}} function with a cast to 
{{uint8}}, so I think this one is done:

{code}
In [1]: arr = pa.array([1, 2, 3, None, None, 6, 7, None])

In [2]: arr.is_null()
Out[2]: 
<pyarrow.lib.BooleanArray object at 0x7f783a397e88>
[
  false,
  false,
  false,
  true,
  true,
  false,
  false,
  true
]

In [3]: arr.is_null().cast('uint8')
Out[3]: 
<pyarrow.lib.UInt8Array object at 0x7f7839cb6f48>
[
  0,
  0,
  0,
  1,
  1,
  0,
  0,
  1
]
{code}

#1 and #4 will need patches. #1 is pretty easy, so I'll hopefully write a patch 
for it soon.

> [C++] [Python] Proposal for several Array utility functions
> -----------------------------------------------------------
>
>                 Key: ARROW-6775
>                 URL: https://issues.apache.org/jira/browse/ARROW-6775
>             Project: Apache Arrow
>          Issue Type: Wish
>          Components: C++, Python
>            Reporter: Zhuo Peng
>            Assignee: Wes McKinney
>            Priority: Minor
>
> Hi,
> We developed several utilities that compute / access certain properties of 
> Arrays and wonder whether it makes sense to get them upstream (into both 
> the C++ API and pyarrow) and, if so, where the best place to put them would 
> be.
> Maybe I have overlooked existing APIs that already do the same; in that case 
> please point them out.
>  
> 1/ ListLengthFromListArray(ListArray&)
> Returns the lengths of the lists in a ListArray, as an Int32Array (or Int64Array for 
> large lists). For example:
> [[1, 2, 3], [], None] => [3, 0, 0] (or [3, 0, None], but we hope the returned 
> array can be converted to numpy)
>  
> 2/ GetBinaryArrayTotalByteSize(BinaryArray&)
> Returns the total byte size of a BinaryArray (basically offset[len - 1] - 
> offset[0]).
> Alternatively, a BinaryArray::Flatten() -> UInt8Array would work.
>  
> 3/ GetArrayNullBitmapAsByteArray(Array&)
> Returns the array's null bitmap as a UInt8Array (which can be efficiently 
> converted to a bool numpy array)
>  
> 4/ GetFlattenedArrayParentIndices(ListArray&)
> Makes an int32 array of the same length as the flattened ListArray. 
> returned_array[i] == j means the i-th element in the flattened ListArray came 
> from the j-th list in the ListArray.
> For example [[1,2,3], [], None, [4,5]] => [0, 0, 0, 3, 3]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
