https://gcc.gnu.org/bugzilla/show_bug.cgi?id=73350

            Bug ID: 73350
           Summary: AVX512: GCC optimizes away rounding flags
           Product: gcc
           Version: 7.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: wen...@mitsuba-renderer.org
  Target Milestone: ---

The AVX512 instruction set introduced the ability to specify a rounding flag for almost every arithmetic operation that is subject to rounding. This is extremely useful because it eliminates the need to modify the MXCSR control register when using techniques such as interval arithmetic that need control over rounding.

Unfortunately, support for this is currently broken in GCC. Specifically, the optimizer does not seem to distinguish between otherwise identical operations that differ only in their rounding mode, and ends up merging them during common subexpression elimination.

Consider the simple program attached below, which computes "1 + pi" with -inf and +inf rounding modes and then prints the difference of these values. The expected output is:

$ g++ test.c -o test -mavx512f -O0 -fomit-frame-pointer && ./test
-4.76837e-07

At optimization level -O1, this currently stops working (tested with GCC trunk):

$ g++ test.c -o test -mavx512f -O1 -fomit-frame-pointer && ./test
0

Looking at the assembly, there are two surprising things. First, common subexpression elimination seems to have (partially) merged the two additions: the second add is still generated, but its result is never used, and the subtraction operates on xmm0 twice. Second, GCC fills a mask register with '-1' and then uses the masked versions of these operations instead of the unmasked versions, which behave as if the mask were '-1' by default.

_main:
        leaq    8(%rsp), %r10
        andq    $-64, %rsp
        pushq   -8(%r10)
        pushq   %rbp
        movq    %rsp, %rbp
        pushq   %r10
        subq    $40, %rsp
        movl    $-1, %eax
        kmovw   %eax, %k1
        vbroadcastss    LC0(%rip), %zmm1
        vbroadcastss    LC1(%rip), %zmm2
        vaddps  {rd-sae}, %zmm2, %zmm1, %zmm0{%k1}{z}   <------ Why use mask?
        vaddps  {ru-sae}, %zmm2, %zmm1, %zmm1{%k1}{z}
        vsubss  %xmm0, %xmm0, %xmm0                     <------ xmm0 ??????
        vcvtss2sd       %xmm0, %xmm0, %xmm0
        leaq    LC2(%rip), %rdi
        movl    $1, %eax
        call    _printf
        movl    $0, %eax
        addq    $40, %rsp
        popq    %r10
        popq    %rbp
        leaq    -8(%r10), %rsp
        ret

// ============== Program to reproduce ============

#include <stdio.h>
#include <math.h>
#include <immintrin.h>

int main(int argc, char *argv[]) {
    __m512 a = _mm512_set1_ps((float) M_PI);
    __m512 b = _mm512_set1_ps((float) 1.f);

    /* Same operands, different embedded rounding modes */
    __m512 result1 = _mm512_add_round_ps(a, b,
        (_MM_FROUND_TO_NEG_INF | _MM_FROUND_NO_EXC));
    __m512 result2 = _mm512_add_round_ps(a, b,
        (_MM_FROUND_TO_POS_INF | _MM_FROUND_NO_EXC));

    /* Round-down minus round-up: should be one negative ulp, not 0 */
    printf("%g\n", result1[0] - result2[0]);
    return 0;
}
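
For comparison, the MXCSR-based approach that the embedded rounding flags are meant to replace can be sketched as follows. This is only an illustration under stated assumptions, not part of the original report: the helper names add_with_mxcsr_rounding and rounding_gap are hypothetical, the code assumes an x86 target where scalar float arithmetic uses SSE (the default on x86-64), and it relies on volatile accesses to keep the compiler from folding the addition or moving it across the MXCSR updates. It needs no AVX512 hardware, so it can be used to cross-check the expected value of -4.76837e-07 (one unit in the last place of a float in [4, 8), i.e. 2^-21).

```c
#include <xmmintrin.h>  /* _MM_SET_ROUNDING_MODE, _MM_GET_ROUNDING_MODE */

/* Hypothetical helper: add two floats with the MXCSR rounding mode set to
 * `mode` (_MM_ROUND_DOWN, _MM_ROUND_UP, ...), then restore the old mode. */
static float add_with_mxcsr_rounding(float a, float b, unsigned int mode)
{
    /* volatile operands and result: the loads, the add and the store must
     * stay between the two MXCSR writes and cannot be constant-folded */
    volatile float va = a, vb = b;
    volatile float sum;
    unsigned int saved = _MM_GET_ROUNDING_MODE();

    _MM_SET_ROUNDING_MODE(mode);
    sum = va + vb;                  /* rounded according to `mode` */
    _MM_SET_ROUNDING_MODE(saved);   /* restore the caller's rounding mode */
    return sum;
}

/* Hypothetical helper: (a + b rounded down) - (a + b rounded up),
 * mirroring the expression printed by the reproducer. */
static float rounding_gap(float a, float b)
{
    return add_with_mxcsr_rounding(a, b, _MM_ROUND_DOWN)
         - add_with_mxcsr_rounding(a, b, _MM_ROUND_UP);
}
```

Calling rounding_gap(3.14159265f, 1.0f) should give the same nonzero difference that the reproducer prints at -O0, while an exactly representable sum such as rounding_gap(1.0f, 1.0f) gives 0. Each call pays for two MXCSR writes, which is the overhead the AVX512 embedded rounding flags avoid.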