Tim Armstrong has posted comments on this change. Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD ......................................................................
Patch Set 1: (4 comments) LGTM, just want to make sure the comment is a little clearer. http://gerrit.cloudera.org:8080/#/c/4813/1/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: Line 167 > What's stringe is that without the pragma and without __restrict__, Matt Go I experimented with some variants: https://godbolt.org/g/rFww4g .BloomFilterOrPointersUnrolledIvDep() and BloomFilterOrIvDepInt() emit exactly the code you want. There are a few things that make a difference: * Avoiding the indirection via the in/out pointers helps a lot. If we extract vector.data()/vector.size() from the loop body we can get the same effect. * Rather subtly, using a signed/unsigned loop counter makes a difference (I think with the signed type the compiler is allow to assume no overflow and this somehow helps). * The unrolling and #pragma ivdep aren't always necessary to emit the code, but it's a lot cleaner with them. I think gcc's vectoriser is smart enough to insert a check for whether the arrays overlap and two versions of the code for the overlapping/non-overlapping cases. The explicit SIMD version has the virtue of not being sensitive to all these things. PS1, Line 168: _mm256_loadu_pd > BloomFilters do, but TBloomFilters do not, and from using gdb I see that th Oh, duh. PS1, Line 199: _mm_storeu_si128 > Let's continue this conversation above. Yeah, I doubt there's much difference between the aligned/unaligned versions, just didn't want to use the unaligned ones if there was already an invariant that the memory was aligned. http://gerrit.cloudera.org:8080/#/c/4813/2/be/src/util/bloom-filter.cc File be/src/util/bloom-filter.cc: Line 181: // The trivial loop out[i] |= in[i] should auto-vectorize with gcc at -O3, but it is not Let's fix this comment so it's clear that the issue is that the code is not written in a way that is friendly to auto-vectorization. I've seen other cryptic comments about these kind of things ("compiler is confused by x") and they're hard to know what to do with (if not outright misleading). -- To view, visit http://gerrit.cloudera.org:8080/4813 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-HasComments: Yes