https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114350
Bug ID: 114350
Summary: missing support for SVE widening floating point conversion
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Keywords: missed-optimization
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: tnfchris at gcc dot gnu.org
Target Milestone: ---
Target: aarch64*

The following example:

  #include <arm_sve.h>

  svfloat64_t widening (svint32_t a)
  {
    svbool_t pred = svptrue_b32 ();
    svint64_t cvt = svreinterpret_s64_s32 (a);
    svint64_t ext = svextw_s64_x (pred, cvt);
    svfloat64_t res = svcvt_f64_s64_x (pred, ext);
    return res;
  }

compiled with -Ofast -march=armv9-a generates:

  widening(__SVInt32_t):
          ptrue   p3.b, all
          sxtw    z0.d, p3/m, z0.d
          scvtf   z0.d, p3/m, z0.d
          ret

but SVE has widening and narrowing floating-point conversions, so this should instead generate:

  widening(__SVInt32_t):
          ptrue   p3.b, all
          scvtf   z0.d, p3/m, z0.s
          ret

The autovectorized equivalent is:

  void f(int n, double *data)
  {
    for (int i = 0; i < n; i++)
      {
        data[i] = i;
      }
  }

which generates:

  .L5:
          mov     z29.d, z31.d
          sxtw    z29.d, p7/m, z29.d
          scvtf   z29.d, p7/m, z29.d
          st1d    z29.d, p6, [x1, x2, lsl 3]
          add     z31.s, z31.s, z30.s
          incd    x2
          whilelo p6.d, w2, w0
          b.any   .L5

Note that the scalar code has the widening variant as well (which we do use), and we account for it in the vectorizer costing:

  (double) i_14 1 times scalar_stmt costs 1 in epilogue
  _4 1 times scalar_store costs 1 in epilogue

The Adv. SIMD costing is right:

  (double) i_14 2 times vec_promote_demote costs 4 in body

but the SVE costing is wrong:

  (double) i_14 2 times vector_stmt costs 2 in body

which makes SVE look cheaper than Adv. SIMD. Note that we can also use the widening instruction for Adv. SIMD on an SVE-capable system.
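Why the single widening scvtf can replace the sxtw/scvtf pair can be sketched with a small, portable scalar model of one 64-bit lane. This is a hypothetical illustration, not GCC or ACLE code: the helper names are invented, the predicate is assumed all-true (as with "ptrue p3.b, all"), so predication is ignored, and the 32-bit value feeding each 64-bit lane is assumed to sit in the low half, which is what svextw_s64_x sign-extends in the example. Because every int32 value is exactly representable as a double, both paths should give bit-identical results whatever the high half holds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of ONE 64-bit lane of z0 (not GCC/ACLE code).
   Only the low 32 bits carry the value being converted; the high 32 bits
   hold unrelated data (the neighbouring .s element). */

/* Current output: "sxtw z0.d" then "scvtf z0.d, p3/m, z0.d".
   Sign-extend bit 31 of the low half to a full int64_t, then convert. */
double lane_sxtw_then_scvtf(uint64_t lane)
{
    int64_t low = (int64_t)(lane & 0xffffffffu);
    int64_t ext = (low & 0x80000000) ? low - 0x100000000 : low;
    return (double)ext;
}

/* Desired output: "scvtf z0.d, p3/m, z0.s" converts the low 32 bits of
   each 64-bit lane, read as a signed int32_t, directly to double. */
double lane_widening_scvtf(uint64_t lane)
{
    uint32_t bits = (uint32_t)lane;   /* keep the low half only */
    int32_t s;
    memcpy(&s, &bits, sizeof s);      /* int32_t is two's complement */
    return (double)s;
}

/* Check both sequences agree on lanes whose high halves hold garbage;
   returns the number of lanes checked. */
size_t check_lane_equivalence(void)
{
    static const uint64_t lanes[] = {
        0x0000000000000000u, 0x00000000ffffffffu, 0xdeadbeef80000000u,
        0x123456787fffffffu, 0xffffffff00000001u, 0x8000000000000000u,
    };
    size_t n = sizeof lanes / sizeof lanes[0];
    for (size_t i = 0; i < n; i++)
        assert(lane_sxtw_then_scvtf(lanes[i]) == lane_widening_scvtf(lanes[i]));
    return n;
}
```

The model reads the low half in two deliberately different ways (arithmetic sign extension versus a two's-complement reinterpretation) so that agreement is not trivially guaranteed by shared code.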