Jim Apple has posted comments on this change. Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD ......................................................................
Patch Set 1: (13 comments) http://gerrit.cloudera.org:8080/#/c/4813/1/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: Line 167 > One reason auto-vectorisation must fail here is that the two directories ma What's stringe is that without the pragma and without __restrict__, Matt Godbolt's compiler explorer still shows the loop being autovectorized. I tried loop unrolling, but it still didn't vectorize. PS1, Line 162: DCHECK > DCHECK_EQ Done Line 163: auto simd_in = reinterpret_cast<const double*>(in); > I think double*/const double* is more readable than auto here - I'm used to Done PS1, Line 166: int > Use size_t for consistency? I had to think about whether the implicit casts Done PS1, Line 168: _mm256_loadu_pd > We should be able to use the aligned versions here, right, since we allocat BloomFilters do, but TBloomFilters do not, and from using gdb I see that they are not 32-byte aligned. PS1, Line 181: should > It shouldn't actually with the code in it's current form since it has to as Let's continue conversation on overlapping at the other comment. Line 184: // TODO: Tune gcc flags to auto-vectorize. This might not be possible. > See my other comment - should be possible. Let's continue conversation on that comment. Line 186: // TODO: IMPALA-4312: Evaluate clang for builds other than ASAN. > Just remove this TODO? I don't think it's informative for people trying to Done Line 190: auto simd_in = reinterpret_cast<const __m128i*>(&in.directory[0]); > This code is simple enough that I'm fine with moving forward with it even i I will leave as-is. PS1, Line 191: auto > I think just repeating the types const__m128i* or __m128i* is more readable Done PS1, Line 192: auto > I'd prefer the int type was explicit here. Done PS1, Line 196: int > size_t for consistency? Done PS1, Line 199: _mm_storeu_si128 > We should be able to use the aligned versions here too. Let's continue this conversation above. FWIW, i also tried some very complex methods of using non-simd ops to move both pointers forward until at least one was aligned. The performance was sometimes worse, so I put that aside for another day. -- To view, visit http://gerrit.cloudera.org:8080/4813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes