https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118141
Bug ID: 118141
Summary: GCC miscompiles __builtin_convertvector() narrowing operation on amd64 above -O1
Product: gcc
Version: unknown
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
Assignee: unassigned at gcc dot gnu.org
Reporter: richard.yao at alumni dot stonybrook.edu
Target Milestone: ---
Here is a minimal program that was written to see what code the compiler would
generate to convert an AVX2 ymm register containing single-precision floating
point numbers into an xmm register containing bfloat16 floating point numbers,
under the assumption that no subnormal numbers are passed:
https://godbolt.org/z/xhvc557xv
GCC trunk gives the following output:
bfloat16 value 0: 0x0000
bfloat16 value 1: 0x0000
bfloat16 value 2: 0x0000
bfloat16 value 3: 0x0000
bfloat16 value 4: 0x0000
bfloat16 value 5: 0x0000
bfloat16 value 6: 0x0000
bfloat16 value 7: 0x0000
GCC 14.2 gives the following output:
bfloat16 value 0: 0x0000
bfloat16 value 1: 0x0000
bfloat16 value 2: 0x178b
bfloat16 value 3: 0x0000
bfloat16 value 4: 0xc02f
bfloat16 value 5: 0x0000
bfloat16 value 6: 0x0000
bfloat16 value 7: 0x0000
Both are wrong. Clang gives the following output, which is correct:
bfloat16 value 0: 0x3f80
bfloat16 value 1: 0x4000
bfloat16 value 2: 0x4040
bfloat16 value 3: 0x4080
bfloat16 value 4: 0x40a0
bfloat16 value 5: 0x40c0
bfloat16 value 6: 0x40f9
bfloat16 value 7: 0x4100
https://godbolt.org/z/769W8Pzxx
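For reference, Clang's values can be cross-checked with a scalar sketch that
simply keeps the upper 16 bits of each float's binary32 encoding (truncation,
no rounding). This is only an illustration, not the program from the godbolt
links: the function names are hypothetical, and the inputs are not stated in
this report, so any input used with it (for example 1.0f, which yields 0x3f80,
or 7.8f, which yields 0x40f9) is an assumption chosen to be consistent with
the Clang output above.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical scalar reference: keep the upper 16 bits of the IEEE-754
   binary32 encoding. This truncates instead of rounding and gives NaN
   and subnormal inputs no special treatment. */
uint16_t f32_to_bf16_trunc(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* type-pun without aliasing UB */
    return (uint16_t)(bits >> 16);
}

/* Print n converted values in the same format as the outputs above. */
void print_bf16_reference(const float *in, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("bfloat16 value %zu: 0x%04x\n", i,
               (unsigned)f32_to_bf16_trunc(in[i]));
}
```

Calling print_bf16_reference() on an array of eight floats gives a
compiler-independent baseline to compare the vectorized output against,
since it does not use __builtin_convertvector() at all.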
Interestingly, GCC does not miscompile it at -O1. I assume this is a
middle-end optimization issue, since the miscompilation also appears to occur
on arm64.