https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82189

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
We do better now:

        ldp     s1, s3, [x1]
        dup     v0.4s, v0.s[0]
        ldr     s2, [x2, 4]
        ins     v1.s[1], v3.s[0]
        ld1     {v1.s}[2], [x2]
        ins     v1.s[3], v2.s[0]
        fdiv    v1.4s, v1.4s, v0.4s
        str     q1, [x0]


  _19 = {t_12(D), t_12(D), t_12(D), t_12(D)};
  _1 = *b_9(D);
  _3 = MEM[(float *)b_9(D) + 4B];
  _5 = *c_15(D);
  _7 = MEM[(float *)c_15(D) + 4B];
  _20 = {_1, _3, _5, _7};
  vect__2.3_18 = _20 / _19;
  MEM <vector(4) float> [(float *)a_11(D)] = vect__2.3_18;
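For readers without the original testcase, here is a hedged C reconstruction of a scalar function that would produce roughly the GIMPLE above. It is not the bug's actual attachment. The names a, b, c and t are taken from the SSA names a_11, b_9, c_15 and t_12, and the restrict qualifiers are an assumption: the dump shows no alias versioning, which suggests the loads were already known not to alias the store.

```c
/* Hypothetical reconstruction of the source behind the GIMPLE dump;
   names follow the SSA names, restrict is an assumption. */
void f(float *restrict a, float *restrict b, float *restrict c, float t)
{
    a[0] = b[0] / t;   /* _1 = *b_9                       */
    a[1] = b[1] / t;   /* _3 = MEM[(float *)b_9 + 4B]     */
    a[2] = c[0] / t;   /* _5 = *c_15                      */
    a[3] = c[1] / t;   /* _7 = MEM[(float *)c_15 + 4B]    */
}
```

The SLP vectorizer builds {_1, _3, _5, _7} from four scalar loads, divides it by the splat {t, t, t, t}, and stores the result as one vector(4) float.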

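But we still don't merge the loads: b[0]/b[1] and c[0]/c[1] are each fetched as separate 32-bit values and inserted lane by lane, instead of as one 64-bit load per adjacent pair.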
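To make "merging the loads" concrete, here is a source-level sketch of the pattern. It copies each adjacent pair of floats as one 8-byte chunk rather than as two 4-byte values. On AArch64 one would hope this becomes something like a single 64-bit load for the b pair and another for the c pair. That codegen is an assumption, not a claim about what GCC emits for this code. The function name f_merged is hypothetical, and its result matches the scalar form in the GIMPLE dump above.

```c
#include <string.h>

/* Hypothetical illustration: lanes 0-1 and 2-3 are each filled by one
   8-byte copy, modelling a merged 64-bit load per adjacent pair. */
void f_merged(float *restrict a, float *restrict b, float *restrict c, float t)
{
    float v[4];
    memcpy(&v[0], b, 2 * sizeof(float));  /* b[0], b[1] as one chunk */
    memcpy(&v[2], c, 2 * sizeof(float));  /* c[0], c[1] as one chunk */
    for (int i = 0; i < 4; i++)
        a[i] = v[i] / t;                  /* divide every lane by t */
}
```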