[Bug target/114350] New: missing support for SVE widening floating point conversion

tnfchris at gcc dot gnu.org via Gcc-bugs Fri, 15 Mar 2024 00:57:03 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114350


            Bug ID: 114350
           Summary: missing support for SVE widening floating point
                    conversion
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: tnfchris at gcc dot gnu.org
  Target Milestone: ---
            Target: aarch64*

The following example:

#include <arm_sve.h>
svfloat64_t widening (svint32_t a)
{
    svbool_t pred = svptrue_b32 ();
    svint64_t cvt = svreinterpret_s64_s32 (a);
    svint64_t ext = svextw_s64_x (pred, cvt);
    svfloat64_t res = svcvt_f64_s64_x (pred, ext);
    return res;
}

compiled with -Ofast -march=armv9-a generates:

widening(__SVInt32_t):
        ptrue   p3.b, all
        sxtw    z0.d, p3/m, z0.d
        scvtf   z0.d, p3/m, z0.d
        ret

but SVE has widening and narrowing floating-point conversions, so the separate
sign extension is unnecessary and this should generate:

widening(__SVInt32_t):
        ptrue   p3.b, all
        scvtf   z0.d, p3/m, z0.s
        ret

The autovec equivalent is:


void f(int n, double *data) {
    for (int i=0;i<n;i++) {
        data[i] = i;
    }
}

which generates:

.L5:
        mov     z29.d, z31.d
        sxtw    z29.d, p7/m, z29.d
        scvtf   z29.d, p7/m, z29.d
        st1d    z29.d, p6, [x1, x2, lsl 3]
        add     z31.s, z31.s, z30.s
        incd    x2
        whilelo p6.d, w2, w0
        b.any   .L5

Note that the scalar code has a widening variant as well (which we do use), and
we account for it in vectorizer costing:

(double) i_14 1 times scalar_stmt costs 1 in epilogue
_4 1 times scalar_store costs 1 in epilogue

Adv. SIMD costing is right:

(double) i_14 2 times vec_promote_demote costs 4 in body

but SVE costing is wrong:

(double) i_14 2 times vector_stmt costs 2 in body

which makes SVE seem as expensive as Adv. SIMD.
Note that we can also use the widening instruction for Adv. SIMD on an
SVE-capable system.
