Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD
......................................................................


Patch Set 1:

(4 comments)

LGTM, just want to make sure the comment is a little clearer.

http://gerrit.cloudera.org:8080/#/c/4813/1/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 167
> What's stringe is that without the pragma and without __restrict__, Matt Go
I experimented with some variants: https://godbolt.org/g/rFww4g 
.BloomFilterOrPointersUnrolledIvDep() and BloomFilterOrIvDepInt() emit exactly 
the code you want.
There are a few things that make a difference:

* Avoiding the indirection via the in/out pointers helps a lot. If we extract 
vector.data()/vector.size() from the loop body we can get the same effect.
* Rather subtly, using a signed/unsigned loop counter makes a difference (I 
think with the signed type the compiler is allow to assume no overflow and this 
somehow helps).
* The unrolling and #pragma ivdep aren't always necessary to emit the code, but 
it's a lot cleaner with them.  I think gcc's vectoriser is smart enough to 
insert a check for whether the arrays overlap and two versions of the code for 
the overlapping/non-overlapping cases.

The explicit SIMD version has the virtue of not being sensitive to all these 
things.


PS1, Line 168: _mm256_loadu_pd
> BloomFilters do, but TBloomFilters do not, and from using gdb I see that th
Oh, duh.


PS1, Line 199: _mm_storeu_si128
> Let's continue this conversation above.
Yeah, I doubt there's much difference between the aligned/unaligned versions, 
just didn't want to use the unaligned ones if there was already an invariant 
that the memory was aligned.


http://gerrit.cloudera.org:8080/#/c/4813/2/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 181:   // The trivial loop out[i] |= in[i] should auto-vectorize with gcc 
at -O3, but it is not
Let's fix this comment so it's clear that the issue is that the code is not 
written in a way that is friendly to auto-vectorization.

I've seen other cryptic comments about these kind of things ("compiler is 
confused by x")  and they're hard to know what to do with (if not outright 
misleading).


-- 
To view, visit http://gerrit.cloudera.org:8080/4813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to