Jim Apple has posted comments on this change.

Change subject: IMPALA-4300: Speed up BloomFilter::Or with SIMD
......................................................................


Patch Set 1:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/4813/1/be/src/util/bloom-filter.cc
File be/src/util/bloom-filter.cc:

Line 167
> One reason auto-vectorisation must fail here is that the two directories ma
What's stringe is that without the pragma and without __restrict__, Matt 
Godbolt's compiler explorer still shows the loop being autovectorized.

I tried loop unrolling, but it still didn't vectorize.


PS1, Line 162: DCHECK
> DCHECK_EQ
Done


Line 163:   auto simd_in = reinterpret_cast<const double*>(in);
> I think double*/const double* is more readable than auto here - I'm used to
Done


PS1, Line 166: int
> Use size_t for consistency? I had to think about whether the implicit casts
Done


PS1, Line 168: _mm256_loadu_pd
> We should be able to use the aligned versions here, right, since we allocat
BloomFilters do, but TBloomFilters do not, and from using gdb I see that they 
are not 32-byte aligned.


PS1, Line 181: should
> It shouldn't actually with the code in it's current form since it has to as
Let's continue conversation on overlapping at the other comment.


Line 184:   // TODO: Tune gcc flags to auto-vectorize. This might not be 
possible.
> See my other comment - should be possible.
Let's continue conversation on that comment.


Line 186:   // TODO: IMPALA-4312: Evaluate clang for builds other than ASAN.
> Just remove this TODO? I don't think it's informative for people trying to 
Done


Line 190:     auto simd_in = reinterpret_cast<const __m128i*>(&in.directory[0]);
> This code is simple enough that I'm fine with moving forward with it even i
I will leave as-is.


PS1, Line 191: auto
> I think just repeating the types const__m128i* or __m128i* is more readable
Done


PS1, Line 192: auto
> I'd prefer the int type was explicit here.
Done


PS1, Line 196: int
> size_t for consistency?
Done


PS1, Line 199: _mm_storeu_si128
> We should be able to use the aligned versions here too.
Let's continue this conversation above.

FWIW, i also tried some very complex methods of using non-simd ops to move both 
pointers forward until at least one was aligned. The performance was sometimes 
worse, so I put that aside for another day.


-- 
To view, visit http://gerrit.cloudera.org:8080/4813
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I840799d9cfb81285c796e2abfe2029bb869b0f67
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Jim Apple <jbap...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to