[GitHub] [arrow] amol- commented on a change in pull request #10101: ARROW-9594: [Python] Preserve null indexes in DictionaryArray.to_numpy as it's done in DictionaryArray.to_pandas

GitBox Tue, 20 Apr 2021 06:37:31 -0700


amol- commented on a change in pull request #10101:
URL: https://github.com/apache/arrow/pull/10101#discussion_r616690521




##########
File path: python/pyarrow/array.pxi
##########
@@ -1170,7 +1170,9 @@ cdef class Array(_PandasConvertible):
         array = PyObject_to_object(out)
 
         if isinstance(array, dict):
+            missings = array["indices"] < 0
             array = np.take(array['dictionary'], array['indices'])
+            array[missings] = np.NaN

Review comment:
Added an optimization based on the `zero_copy_only` option (which doesn't 
allow nulls) and on `self.null_count`. Since the null count is cached ( 
https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/array/data.cc#L120-L121
 ), the check should usually add no overhead.
   
Also confirmed that `-1` is used to signal NULL values when converting to 
`numpy` arrays ( 
https://github.com/apache/arrow/blob/8e43f23dcc6a9e630516228f110c48b64d13cec6/cpp/src/arrow/python/arrow_to_pandas.cc#L1637
 ).
   
A further optimization would be to have `ConvertArrayToPandas` return the 
count of null values (it already invokes `IsValid` on them), but that 
requires a more widespread change, so I think it should be deferred until 
proven necessary.
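
For illustration only, here is a minimal NumPy-only sketch of the guard described above. It is not the code in the PR: `take_with_nulls` and its parameters are hypothetical names, `null_count` stands in for the cached `self.null_count`, and the dictionary is assumed to have a float (or object) dtype so that NaN can be stored in the result. It uses `np.nan` because the `np.NaN` alias was removed in NumPy 2.0.

```python
import numpy as np


def take_with_nulls(dictionary, indices, null_count=None, zero_copy_only=False):
    """Hypothetical helper: materialize dictionary values, NaN for null slots.

    Assumes null slots are encoded as index -1 and that `dictionary`
    has a float or object dtype (NaN cannot be stored in an int array).
    """
    dictionary = np.asarray(dictionary)
    indices = np.asarray(indices)

    # np.take accepts negative indices and wraps them, so a -1 index
    # silently picks the last dictionary entry unless it is masked.
    result = np.take(dictionary, indices)

    # Fast path: zero_copy_only does not allow nulls, and a known
    # null count of zero means there is nothing to mask.
    if zero_copy_only or null_count == 0:
        return result

    missing = indices < 0
    result[missing] = np.nan  # np.take returned a copy, safe to modify
    return result


values = take_with_nulls(np.array([1.5, 2.5, 3.5]), np.array([0, -1, 2, -1]))
print(values)
```

Without the mask, the two `-1` slots would come back as `3.5`, the last dictionary entry, which is the kind of silent corruption the change is meant to prevent.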




