[ https://issues.apache.org/jira/browse/ARROW-8553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17108159#comment-17108159 ]
Yibo Cai commented on ARROW-8553: --------------------------------- [~wesm], the aligned case is simple enough for compiler to auto vectorize the code. Did a quick test with below patch, no obvious performance diff found. {code:c} diff --git a/cpp/src/arrow/util/bit_util.cc b/cpp/src/arrow/util/bit_util.cc index 395801f5e..8beaf6cb8 100644 --- a/cpp/src/arrow/util/bit_util.cc +++ b/cpp/src/arrow/util/bit_util.cc @@ -261,7 +261,7 @@ template <template <typename> class BitOp> void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* right, int64_t right_offset, uint8_t* out, int64_t out_offset, int64_t length) { - BitOp<uint8_t> op; + BitOp<uint64_t> op; DCHECK_EQ(left_offset % 8, right_offset % 8); DCHECK_EQ(left_offset % 8, out_offset % 8); @@ -269,8 +269,11 @@ void AlignedBitmapOp(const uint8_t* left, int64_t left_offset, const uint8_t* ri left += left_offset / 8; right += right_offset / 8; out += out_offset / 8; - for (int64_t i = 0; i < nbytes; ++i) { - out[i] = op(left[i], right[i]); + uint64_t *out64 = (uint64_t*)out; + uint64_t *left64 = (uint64_t*)left; + uint64_t *right64 = (uint64_t*)right; + for (int64_t i = 0; i < nbytes/8; ++i) { + out64[i] = op(left64[i], right64[i]); } } {code} Benchmark before this patch (in uint8) {code:c} BenchmarkBitmapAnd/32768/0 4253 ns 4251 ns 164715 bytes_per_second=7.17813G/s BenchmarkBitmapAnd/131072/0 16767 ns 16760 ns 41875 bytes_per_second=7.28348G/s BenchmarkBitmapAnd/32768/0 4264 ns 4262 ns 165145 bytes_per_second=7.15959G/s BenchmarkBitmapAnd/131072/0 16702 ns 16695 ns 41952 bytes_per_second=7.31158G/s {code} Benchmark after this patch (in uint64) {code:c} BenchmarkBitmapAnd/32768/0 4133 ns 4131 ns 171808 bytes_per_second=7.38787G/s BenchmarkBitmapAnd/131072/0 17167 ns 17157 ns 40529 bytes_per_second=7.11491G/s BenchmarkBitmapAnd/32768/0 4103 ns 4101 ns 171883 bytes_per_second=7.44151G/s BenchmarkBitmapAnd/131072/0 17351 ns 17343 ns 43299 bytes_per_second=7.0385G/s {code} > [C++] Optimize unaligned bitmap operations > ------------------------------------------ > > Key: ARROW-8553 > URL: https://issues.apache.org/jira/browse/ARROW-8553 > Project: Apache Arrow > Issue Type: Improvement > Components: C++ > Affects Versions: 0.17.0 > Reporter: Antoine Pitrou > Assignee: Yibo Cai > Priority: Major > Labels: pull-request-available > Fix For: 1.0.0 > > Time Spent: 4h 40m > Remaining Estimate: 0h > > Currently, {{BitmapAnd}} uses a bit-by-bit loop for unaligned inputs. Using > {{Bitmap::VisitWords}} instead would probably yield a manyfold performance > increase. -- This message was sent by Atlassian Jira (v8.3.4#803005)