On Sat, 6 Apr 2024 at 14:17, Nathan Bossart <nathandboss...@gmail.com> wrote:
>
> On Sat, Apr 06, 2024 at 12:08:14PM +1300, David Rowley wrote:
> > Won't Valgrind complain about this?
> >
> > +pg_popcount_avx512(const char *buf, int bytes)
> >
> > + buf = (const char *) TYPEALIGN_DOWN(sizeof(__m512i), buf);
> >
> > + val = _mm512_maskz_loadu_epi8(mask, (const __m512i *) buf);
>
> I haven't been able to generate any complaints, at least with some simple
> tests. But I see your point. If this did cause such complaints, ISTM we'd
> just want to add it to the suppression file. Otherwise, I think we'd have
> to go back to the non-maskz approach (which I really wanted to avoid
> because of the weird function overhead juggling) or find another way to do
> a partial load into an __m512i.

[1] seems to think it's ok. If this is true, then the following shouldn't
segfault. It seems to run without any issue, and if I change the mask to 1
it crashes, as you'd expect.

#include <immintrin.h>
#include <stdio.h>

int main(void)
{
    __m512i val;

    val = _mm512_maskz_loadu_epi8((__mmask64) 0, NULL);
    printf("%llu\n", _mm512_reduce_add_epi64(val));
    return 0;
}

gcc avx512.c -o avx512 -O0 -mavx512f -march=native

David

[1] https://stackoverflow.com/questions/54497141/when-using-a-mask-register-with-avx-512-load-and-stores-is-a-fault-raised-for-i
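
For anyone who wants to reason about the masked partial load without AVX-512 hardware, here is a rough scalar sketch in plain C. It is not code from the patch. The names VEC_BYTES, align_down64, head_mask and model_maskz_load are made up for illustration, and the way the mask is built is only an assumption about how one might cover the valid bytes after aligning the pointer down. The model also assumes what [1] describes: lanes whose mask bit is clear are zeroed and their memory is never touched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VEC_BYTES 64			/* stands in for sizeof(__m512i) */

/*
 * Hypothetical helper: round an address down to a 64-byte boundary, in the
 * spirit of TYPEALIGN_DOWN(sizeof(__m512i), buf).
 */
static uintptr_t
align_down64(uintptr_t addr)
{
	return addr & ~(uintptr_t) (VEC_BYTES - 1);
}

/*
 * Hypothetical helper: build a 64-bit lane mask whose set bits cover the
 * bytes [offset, offset + bytes) of one 64-byte block, clipped to the block.
 * "offset" is how far the original pointer sat past the aligned address.
 */
static uint64_t
head_mask(unsigned offset, size_t bytes)
{
	uint64_t	mask;

	assert(offset < VEC_BYTES);
	if (bytes == 0)
		return 0;

	/* drop the lanes that precede the start of the buffer */
	mask = ~UINT64_C(0) << offset;

	/* drop the lanes past the end, if the buffer ends inside this block */
	if (offset + bytes < VEC_BYTES)
		mask &= ~UINT64_C(0) >> (VEC_BYTES - (offset + bytes));

	return mask;
}

/*
 * Scalar model of a zero-masking byte load: a lane is read from memory only
 * when its mask bit is set; otherwise it is zeroed and base[i] is never
 * evaluated.
 */
static void
model_maskz_load(uint8_t out[VEC_BYTES], uint64_t mask, const uint8_t *base)
{
	for (int i = 0; i < VEC_BYTES; i++)
		out[i] = ((mask >> i) & 1) ? base[i] : 0;
}
```

With this model, calling model_maskz_load(out, 0, NULL) touches no memory at all, which mirrors the zero-mask NULL test above, while setting any mask bit would dereference the pointer.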