https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121688
Bug ID: 121688
Summary: F16C/AVX512F cvtph2ps and cvtps2ph not used on __builtin_convertvector
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: mkretz at gcc dot gnu.org
Target Milestone: ---
Target: x86-64-*-*, i686-*-*
Test case (https://compiler-explorer.com/z/Yz7hvxGd1):
using v4hf [[gnu::vector_size(8)]] = _Float16;
using v8hf [[gnu::vector_size(16)]] = _Float16;
using v16hf [[gnu::vector_size(32)]] = _Float16;
using v4sf [[gnu::vector_size(16)]] = float;
using v8sf [[gnu::vector_size(32)]] = float;
using v16sf [[gnu::vector_size(64)]] = float;
v4sf cvtph2ps(v4hf x)
{ return __builtin_convertvector(x, v4sf); }
v4hf cvtps2ph(v4sf x)
{ return __builtin_convertvector(x, v4hf); }
v8sf cvtph2ps(v8hf x)
{ return __builtin_convertvector(x, v8sf); }
v8hf cvtps2ph(v8sf x)
{ return __builtin_convertvector(x, v8hf); }
v16sf cvtph2ps(v16hf x)
{ return __builtin_convertvector(x, v16sf); }
v16hf cvtps2ph(v16sf x)
{ return __builtin_convertvector(x, v16hf); }
Compile with -O2 -march=x86-64-v4 (or -march=x86-64-v3).
Each of these functions should be translated to a single cvtph2ps/cvtps2ph
instruction + ret, similar to the output with '-mavx512fp16', except that the
trailing 'x' of the instruction names (vcvtph2psx/vcvtps2phx) needs to be
removed 😉.
(This seems to be a prerequisite for PR121587.)
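For comparison, the requested instructions can already be reached through the
F16C intrinsics from <immintrin.h>. The following is only a sketch of that
route for the 8-element case, not a proposed fix: the helper names
(half8_to_float8, float8_to_half8, cpu_has_f16c, roundtrip_ok) are made up for
illustration, the half values are passed as raw binary16 bit patterns rather
than _Float16, and the choice of _MM_FROUND_CUR_DIRECTION (round according to
MXCSR) is an assumption about which rounding the conversion should use.

```cpp
#include <immintrin.h>
#include <cpuid.h>
#include <cassert>
#include <cstdint>

// Hypothetical helpers, not part of the report. The target attribute enables
// AVX and F16C for these functions only, so no -mf16c/-march flag is required.
[[gnu::target("avx,f16c")]]
void half8_to_float8(const std::uint16_t* in, float* out)
{
  const __m128i h = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
  _mm256_storeu_ps(out, _mm256_cvtph_ps(h));  // vcvtph2ps ymm, xmm
}

[[gnu::target("avx,f16c")]]
void float8_to_half8(const float* in, std::uint16_t* out)
{
  const __m256 f = _mm256_loadu_ps(in);
  // Assumption: use the current MXCSR rounding mode.
  const __m128i h = _mm256_cvtps_ph(f, _MM_FROUND_CUR_DIRECTION);  // vcvtps2ph xmm, ymm, imm8
  _mm_storeu_si128(reinterpret_cast<__m128i*>(out), h);
}

// Runtime check: OS-enabled AVX state plus CPUID.1:ECX bit 29 (F16C).
bool cpu_has_f16c()
{
  unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
  if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
    return false;
  return __builtin_cpu_supports("avx") && (ecx & (1u << 29)) != 0;
}

// Converts eight binary16 bit patterns to float and back again.
// Returns true if every pattern survives the round trip unchanged
// (or if the CPU lacks F16C, in which case nothing is checked).
bool roundtrip_ok()
{
  if (!cpu_has_f16c())
    return true;
  const std::uint16_t in[8] = {0x0000, 0x3C00, 0xC000, 0x3555,
                               0x7BFF, 0x0001, 0x8000, 0x4248};
  float f[8];
  std::uint16_t out[8];
  half8_to_float8(in, f);
  assert(f[1] == 1.0f && f[2] == -2.0f);  // 0x3C00 is 1.0, 0xC000 is -2.0
  float8_to_half8(f, out);
  for (int i = 0; i < 8; ++i)
    if (in[i] != out[i])
      return false;
  return true;
}
```

Every binary16 value is exactly representable as a float, so the round trip is
expected to be bit-exact; the same check could be applied to the
__builtin_convertvector versions from the test case once they emit these
instructions.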