[GitHub] [arrow] amol- commented on a change in pull request #10199: ARROW-12431: [C++] Mask is inverted when creating FixedSizeBinaryArray

GitBox Wed, 05 May 2021 01:21:28 -0700


amol- commented on a change in pull request #10199:
URL: https://github.com/apache/arrow/pull/10199#discussion_r626365942




##########
File path: cpp/src/arrow/python/numpy_to_arrow.cc
##########
@@ -594,7 +594,13 @@ Status NumPyConverter::Visit(const FixedSizeBinaryType& type) {
 
   if (mask_ != nullptr) {
     Ndarray1DIndexer<uint8_t> mask_values(mask_);
-    RETURN_NOT_OK(builder.AppendValues(data, length_, mask_values.data()));
+    std::unique_ptr<uint8_t[]> inverted_mask(new uint8_t[length_]);
+    for (int64_t i = 0; i < length_; ++i) {
+      inverted_mask[i] = !mask_values[i];
+    }

Review comment:
   I didn't benchmark the code, but note that we were already iterating over a mask of the same length, and we do the same thing for strings and variable-length binary:

https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/numpy_to_arrow.cc#L561-L567
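   For context, here is a minimal, self-contained sketch of the single pass the diff adds. It assumes NumPy-style mask semantics (nonzero means the slot is masked, i.e. null) and Arrow-style `valid_bytes` semantics (nonzero means the slot is valid), and, unlike `Ndarray1DIndexer`, it assumes a contiguous mask. `InvertMask` is a hypothetical name used only for illustration; it is not an Arrow function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical helper: converts a contiguous NumPy-style mask
// (nonzero = masked/null) into Arrow-style valid bytes (nonzero = valid).
// This is one linear pass over `length` bytes, the same amount of work
// as any other per-element walk over a mask of that length.
std::unique_ptr<uint8_t[]> InvertMask(const uint8_t* mask, int64_t length) {
  assert(length >= 0);
  auto valid_bytes = std::make_unique<uint8_t[]>(static_cast<size_t>(length));
  for (int64_t i = 0; i < length; ++i) {
    // !mask[i] yields 1 for unmasked (valid) entries and 0 for masked ones.
    valid_bytes[i] = static_cast<uint8_t>(!mask[i]);
  }
  return valid_bytes;
}
```

   The inverted buffer is what a builder expecting "1 = valid" bytes would consume; the cost is O(length), the same order as the existing per-element handling of masks for strings and variable-length binary.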
   
   I do see btw that ``ArrayData::Make`` is able to do a zero copy from numpy 
data under certain conditions, so it will surely be a lot faster if we end into 
that condition.  In the other case, it seems that we end into ``CopyStrided*`` 
which does iterate over the data like now and thus the performance benefit 
would probably be negligible if any.
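   To illustrate the distinction drawn above, here is a rough sketch under stated assumptions: fixed-width elements of `width` bytes, a positive byte `stride` between consecutive elements, and an owned output vector. `GatherFixedWidth` is a hypothetical helper, not Arrow's ``CopyStrided*``. When the stride equals the element width the data is contiguous; Arrow could reuse such a buffer without copying, whereas this sketch, which must return an owned copy, does one bulk `memcpy`. Any other stride forces a per-element loop, which is why skipping the mask loop buys little on that path.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: gathers `length` fixed-width values of `width` bytes
// from a buffer whose consecutive elements start `stride` bytes apart.
// Assumes a positive stride no smaller than the element width.
std::vector<uint8_t> GatherFixedWidth(const uint8_t* data, int64_t length,
                                      int64_t width, int64_t stride) {
  assert(length >= 0 && width > 0 && stride >= width);
  std::vector<uint8_t> out(static_cast<size_t>(length * width));
  if (length == 0) {
    return out;
  }
  if (stride == width) {
    // Contiguous case: a single bulk copy (a real zero-copy path would
    // instead wrap the existing buffer).
    std::memcpy(out.data(), data, out.size());
    return out;
  }
  // Strided case: every element has to be visited individually.
  for (int64_t i = 0; i < length; ++i) {
    std::memcpy(out.data() + i * width, data + i * stride,
                static_cast<size_t>(width));
  }
  return out;
}
```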




-- 
This is an automated message from the Apache Git Service.
