https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110105

            Bug ID: 110105
           Summary: ARM GCC: underoptimization: expected vfma.f16, actual
                    vcvtb-vfma.f32-vcvtb
           Product: gcc
           Version: 13.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: pavel.morozkin at gmail dot com
  Target Milestone: ---

This code:
__fp16 mul(__fp16 x, __fp16 y, __fp16 z)
{
    return x * y + z;
}

compiled with:
gcc -O3 -mfpu=fp-armv8 -march=armv8.2-a+fp16

produces the following assembler code:
mul:
        vcvtb.f32.f16   s0, s0
        vcvtb.f32.f16   s1, s1
        vcvtb.f32.f16   s2, s2
        vfma.f32        s2, s0, s1
        vcvtb.f16.f32   s0, s2
        bx      lr

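The output converts each operand to f32 (vcvtb.f32.f16), performs the fused
multiply-add in single precision (vfma.f32), and converts the result back to
f16 (vcvtb.f16.f32), whereas a single vfma.f16 is expected.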
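
For comparison, a minimal sketch of the same expression written with _Float16
instead of __fp16. It is not part of the original test case. The GCC manual
describes __fp16 as a storage format whose values are promoted to float for
arithmetic, which is consistent with the conversions seen above, while
_Float16 is an arithmetic type. The names half_t and mul_f16 are hypothetical,
the float fallback exists only so the sketch compiles on targets without
_Float16, and whether GCC 13.1 emits a single vfma.f16 for this variant has
not been checked here.

```c
/* Hypothetical comparison case, not taken from the original report. */
#if defined(__FLT16_MANT_DIG__)
typedef _Float16 half_t;   /* half-precision arithmetic type, where supported */
#else
typedef float half_t;      /* assumption: fallback so the sketch still builds */
#endif

/* Same expression as mul() in the report, but on an arithmetic half type,
   so the operands are not first promoted from a storage-only __fp16. */
half_t mul_f16(half_t x, half_t y, half_t z)
{
    return x * y + z;
}
```

Building this with the same options as the report and comparing the output
against mul() would show whether the conversions come from the __fp16
promotion rule or from the FMA pattern itself.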
