https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39821
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|tree-optimization           |target

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
The code generation for aarch64 looks fine:
dotproduct_order4:
.LFB1:
        .cfi_startproc
        ldr     q1, [x0]
        ldr     q2, [x1]
        smull   v0.2d, v2.2s, v1.2s
        smlal2  v0.2d, v2.4s, v1.4s
        addp    d0, v0.2d
        fmov    x0, d0
        ret

  vect__6.41_18 = MEM <vector(4) int> [(int32_t *)v1_2(D)];
  vect__10.44_13 = MEM <vector(4) int> [(int32_t *)v2_3(D)];
  vect_patt_25.45_8 = WIDEN_MULT_LO_EXPR <vect__10.44_13, vect__6.41_18>;
  vect_patt_25.45_4 = WIDEN_MULT_HI_EXPR <vect__10.44_13, vect__6.41_18>;
  vect_accum_14.46_31 = vect_patt_25.45_4 + vect_patt_25.45_8;
  _33 = .REDUC_PLUS (vect_accum_14.46_31); [tail call]

---- CUT ----

Even the gimple level for x86_64 looks ok:
  vect__6.41_18 = MEM <vector(4) int> [(int32_t *)v1_2(D)];
  vect__10.44_13 = MEM <vector(4) int> [(int32_t *)v2_3(D)];
  vect_patt_25.45_8 = WIDEN_MULT_LO_EXPR <vect__10.44_13, vect__6.41_18>;
  vect_patt_25.45_4 = WIDEN_MULT_HI_EXPR <vect__10.44_13, vect__6.41_18>;
  vect_accum_14.46_31 = vect_patt_25.45_4 + vect_patt_25.45_8;
  _33 = VEC_PERM_EXPR <vect_accum_14.46_31, { 0, 0 }, { 1, 2 }>;
  _34 = vect_accum_14.46_31 + _33;
  stmp_accum_14.47_35 = BIT_FIELD_REF <_34, 64, 0>;

But the expansion looks bad.
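
For readers following the x86_64 GIMPLE above, here is a rough C model of what those statements compute. The original testcase is not quoted in this comment, so the scalar function is an assumption inferred from the int32_t operands and the 64-bit accumulator in the dump; the names dotproduct_ref and dotproduct_model are hypothetical. The model also assumes a little-endian lane layout, where the "LO" half covers elements 0 and 1.

```c
#include <stdint.h>

/* Assumed scalar source: a 4-element dot product of int32_t inputs
   accumulated in 64 bits. Not taken from the bug report itself. */
int64_t dotproduct_ref(const int32_t *v1, const int32_t *v2)
{
    int64_t accum = 0;
    for (int i = 0; i < 4; i++)
        accum += (int64_t)v1[i] * v2[i];
    return accum;
}

/* Lane-by-lane model of the x86_64 GIMPLE sequence. */
int64_t dotproduct_model(const int32_t *v1, const int32_t *v2)
{
    int64_t lo[2], hi[2], acc[2];

    for (int i = 0; i < 2; i++) {
        lo[i] = (int64_t)v2[i] * v1[i];         /* WIDEN_MULT_LO_EXPR */
        hi[i] = (int64_t)v2[i + 2] * v1[i + 2]; /* WIDEN_MULT_HI_EXPR */
        acc[i] = hi[i] + lo[i];                 /* vect_accum_14.46_31 */
    }

    /* VEC_PERM_EXPR <acc, { 0, 0 }, { 1, 2 }>: index 1 selects acc[1],
       index 2 selects element 0 of the second (all-zero) vector. */
    int64_t perm[2] = { acc[1], 0 };

    /* _34 = acc + perm */
    int64_t sum[2] = { acc[0] + perm[0], acc[1] + perm[1] };

    /* BIT_FIELD_REF <_34, 64, 0>: the low 64-bit lane holds the total. */
    return sum[0];
}
```

The aarch64 variant differs only in the last step: .REDUC_PLUS adds the two lanes of the accumulator directly, which corresponds to the single addp in the assembly.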