https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102812
Bug ID: 102812
Summary: Unoptimal (and wrong) code for _Float16 insert
Product: gcc
Version: 12.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: ubizjak at gmail dot com
Target Milestone: ---

The following code:

--cut here--
typedef _Float16 v8hf __attribute__((__vector_size__ (16)));

v8hf t (_Float16 a)
{
  return (v8hf){a, 0, 0, 0, 0, 0, 0, 0};
}
--cut here--

compiles with -msse4 to:

        pxor    %xmm15, %xmm15
        movaps  %xmm15, -56(%rsp)
        pextrw  $0, %xmm0, -56(%rsp)
        vmovdqa64       -56(%rsp), %xmm0

PBLENDW with a cleared %xmm15 would be much more optimal and would not
use memory.

Also, VMOVDQA64 is an AVX512F/AVX512VL instruction, not an SSE4 (or even
AVX) instruction, so it must not be emitted with -msse4.
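
To illustrate why a single blend would be enough here, the following is a minimal scalar sketch of what PBLENDW (the SSE4.1 blend instruction the report suggests) computes. It is only a model, not the code GCC generates or should generate: the type v8hi_model and the functions blendw_model and insert_low_lane are hypothetical names introduced for this example, and raw uint16_t lanes stand in for _Float16 bit patterns so that the sketch builds on any C compiler.

```c
#include <stdint.h>

/* Assumption: eight raw 16-bit lanes model one XMM register holding
   _Float16 bit patterns; this keeps the sketch portable. */
typedef struct { uint16_t lane[8]; } v8hi_model;

/* Scalar model of "PBLENDW imm8, src, dst": lane i of the result is
   taken from src when bit i of imm8 is set, otherwise from dst. */
static v8hi_model blendw_model(v8hi_model dst, v8hi_model src, uint8_t imm8)
{
    for (int i = 0; i < 8; i++) {
        if ((imm8 >> i) & 1)
            dst.lane[i] = src.lane[i];
    }
    return dst;
}

/* Hypothetical helper: builds {a, 0, 0, 0, 0, 0, 0, 0} from a register
   whose lane 0 holds a, without going through memory. The upper lanes
   of a_reg may contain anything; the blend discards them. */
static v8hi_model insert_low_lane(v8hi_model a_reg)
{
    v8hi_model zero = {{0}};                /* models pxor %xmm15, %xmm15 */
    return blendw_model(zero, a_reg, 0x01); /* models pblendw $1, ... */
}
```

With the argument already in lane 0 of %xmm0, the whole operation reduces to clearing a register and blending with an immediate of 1, which is why no stack slot is needed.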