amol- commented on a change in pull request #10199:
URL: https://github.com/apache/arrow/pull/10199#discussion_r626365942
##########
File path: cpp/src/arrow/python/numpy_to_arrow.cc
##########
@@ -594,7 +594,13 @@ Status NumPyConverter::Visit(const FixedSizeBinaryType& type) {
if (mask_ != nullptr) {
Ndarray1DIndexer<uint8_t> mask_values(mask_);
- RETURN_NOT_OK(builder.AppendValues(data, length_, mask_values.data()));
+ std::unique_ptr<uint8_t[]> inverted_mask(new uint8_t[length_]);
+ for (int64_t i = 0; i < length_; ++i) {
+ inverted_mask[i] = !mask_values[i];
+ }
Review comment:
I didn't benchmark the code, but I noticed that we were iterating over a
mask of the same length, and this is the same thing we already do for
strings and varlen binary:
https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/numpy_to_arrow.cc#L561-L567
By the way, I do see that ``ArrayData::Make`` is able to do a zero copy from
NumPy data under certain conditions, so it will surely be a lot faster when we
hit that case. Otherwise, it seems that we end up in ``CopyStrided*``, which
iterates over the data just as this code does, so the performance benefit
would probably be negligible, if any.
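
For reference, the inversion step from the diff can be sketched in isolation as
below. This is only a minimal standalone sketch: ``InvertMask`` is a
hypothetical helper name, not an Arrow function, and it assumes the NumPy mask
uses non-zero for "masked" while the bytes handed to ``AppendValues`` use
non-zero for "valid".

```cpp
#include <cstdint>
#include <memory>

// Hypothetical helper, not part of Arrow: turn a NumPy-style mask
// (non-zero = masked/null) into a validity byte array (1 = valid, 0 = null).
std::unique_ptr<uint8_t[]> InvertMask(const uint8_t* mask, int64_t length) {
  std::unique_ptr<uint8_t[]> valid(new uint8_t[length]);
  // One pass over the mask, the same O(length) cost discussed above.
  for (int64_t i = 0; i < length; ++i) {
    valid[i] = mask[i] == 0 ? 1 : 0;
  }
  return valid;
}
```

The result's ``get()`` pointer would then be passed where the diff passes the
inverted mask. The extra cost is one allocation and one linear pass per array.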