https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80846

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vmakarov at gcc dot gnu.org

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
Hmm, for some reason we end up spilling:

.L2:
        vmovdqa -48(%rbp), %ymm3
        addq    $32, %rdi
        vpaddd  -32(%rdi), %ymm3, %ymm2
        cmpq    %rdi, %rax
        vmovdqa %ymm2, -48(%rbp)
        jne     .L2
        vmovdqa -32(%rbp), %xmm5
        vpaddd  -48(%rbp), %xmm5, %xmm0
        vpsrldq $8, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vpsrldq $4, %xmm0, %xmm1
        vpaddd  %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, %eax
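The loop and epilogue above can be modeled in portable C. This is a sketch that assumes 32-bit lanes with wrap-around addition (as vpaddd does) and an input length that is a multiple of 8. The names `psrldq_dwords` and `sum_like_asm` are hypothetical. The model shows that the accumulator is the full 8-lane vector that the loop keeps at -48(%rbp). It also shows that vpsrldq by 8 and by 4 bytes moves whole dwords toward lane 0, so lane 0 ends up holding the total.

```c
#include <stdint.h>

/* Model of VPSRLDQ for byte counts that are multiples of 4: whole 32-bit
   lanes move toward lane 0 and zero lanes are shifted in at the top.  */
static void psrldq_dwords(uint32_t v[4], int bytes)
{
    int k = bytes / 4;
    for (int i = 0; i < 4; i++)
        v[i] = (i + k < 4) ? v[i + k] : 0;
}

/* Hypothetical C model of the assembly above; n must be a multiple of 8.  */
static uint32_t sum_like_asm(const uint32_t *a, int n)
{
    uint32_t acc[8] = {0};            /* the ymm accumulator spilled to -48(%rbp) */
    for (int i = 0; i < n; i += 8)    /* .L2: one 32-byte vpaddd per iteration */
        for (int l = 0; l < 8; l++)
            acc[l] += a[i + l];

    uint32_t x[4], t[4];
    for (int l = 0; l < 4; l++)       /* vpaddd: upper half (-32(%rbp)) + lower half */
        x[l] = acc[l + 4] + acc[l];

    for (int l = 0; l < 4; l++) t[l] = x[l];
    psrldq_dwords(t, 8);              /* vpsrldq $8, %xmm0, %xmm1 */
    for (int l = 0; l < 4; l++) x[l] += t[l];

    for (int l = 0; l < 4; l++) t[l] = x[l];
    psrldq_dwords(t, 4);              /* vpsrldq $4, %xmm0, %xmm1 */
    for (int l = 0; l < 4; l++) x[l] += t[l];

    return x[0];                      /* vmovd %xmm0, %eax */
}
```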

when expanding the epilogue

  _6 = BIT_FIELD_REF <vect_sum_11.6_14, 128, 0>;
  _5 = BIT_FIELD_REF <vect_sum_11.6_14, 128, 128>;
  _29 = _5 + _6;
  vect_sum_11.8_22 = VEC_PERM_EXPR <_29, { 0, 0, 0, 0 }, { 2, 3, 4, 5 }>;
  vect_sum_11.8_21 = vect_sum_11.8_22 + _29;
  vect_sum_11.8_20 = VEC_PERM_EXPR <vect_sum_11.8_21, { 0, 0, 0, 0 }, { 1, 2, 3, 4 }>;
  vect_sum_11.8_19 = vect_sum_11.8_20 + vect_sum_11.8_21;
  stmp_sum_11.7_18 = BIT_FIELD_REF <vect_sum_11.8_19, 32, 0>;
  return stmp_sum_11.7_18;
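The epilogue can be read statement by statement with a small C model of VEC_PERM_EXPR. This is a sketch assuming 4 x 32-bit lanes, where selector values 0..3 pick from the first operand and 4..7 from the second. It also assumes that BIT_FIELD_REF at bit offsets 0 and 128 names lanes 0..3 and 4..7 respectively. `vec_perm4` and `epilogue` are hypothetical names. Selector { 2, 3, 4, 5 } against a zero vector yields { x[2], x[3], 0, 0 }, the same lanes as the vpsrldq $8 in the assembly.

```c
#include <stdint.h>

/* Model of GIMPLE VEC_PERM_EXPR <v0, v1, sel> for 4 x 32-bit lanes:
   selector values 0..3 pick from v0, 4..7 pick from v1.  */
static void vec_perm4(const uint32_t v0[4], const uint32_t v1[4],
                      const int sel[4], uint32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        int s = sel[i] & 7;
        out[i] = s < 4 ? v0[s] : v1[s - 4];
    }
}

/* The GIMPLE epilogue above, one C statement group per GIMPLE statement.  */
static uint32_t epilogue(const uint32_t vect_sum[8])
{
    static const uint32_t zero[4] = {0, 0, 0, 0};
    static const int sel_a[4] = {2, 3, 4, 5};
    static const int sel_b[4] = {1, 2, 3, 4};
    uint32_t s29[4], s22[4], s21[4], s20[4], s19[4];

    for (int i = 0; i < 4; i++)          /* _29 = _5 + _6 (upper + lower half) */
        s29[i] = vect_sum[4 + i] + vect_sum[i];
    vec_perm4(s29, zero, sel_a, s22);    /* {_29[2], _29[3], 0, 0} */
    for (int i = 0; i < 4; i++)
        s21[i] = s22[i] + s29[i];
    vec_perm4(s21, zero, sel_b, s20);    /* {s21[1], s21[2], s21[3], 0} */
    for (int i = 0; i < 4; i++)
        s19[i] = s20[i] + s21[i];
    return s19[0];                       /* BIT_FIELD_REF <..., 32, 0> */
}
```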

as

;; _29 = _5 + _6;

(insn 17 16 18 (set (reg:OI 101)
        (subreg:OI (reg:V8SI 90 [ vect_sum_11.6 ]) 0)) -1
     (nil))

(insn 18 17 19 (set (reg:OI 102)
        (subreg:OI (reg:V8SI 90 [ vect_sum_11.6 ]) 0)) -1
     (nil))

(insn 19 18 0 (set (reg:V4SI 98 [ _29 ])
        (plus:V4SI (subreg:V4SI (reg:OI 101) 16)
            (subreg:V4SI (reg:OI 102) 0))) -1
     (nil))

before RA:

(insn 19 16 20 4 (set (reg:V4SI 98 [ _29 ])
        (plus:V4SI (subreg:V4SI (reg:V8SI 90 [ vect_sum_11.6 ]) 16)
            (subreg:V4SI (reg:V8SI 90 [ vect_sum_11.6 ]) 0))) 2990 {*addv4si3}
     (expr_list:REG_DEAD (reg:V8SI 90 [ vect_sum_11.6 ])
        (nil)))
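insn 19 reads both 128-bit halves of the 256-bit pseudo 90 through subregs at byte offsets 16 and 0. The C sketch below models what resolving those subregs through a stack slot amounts to: store all 32 bytes, then reload 16 bytes from the requested offset. The offsets match the assembly above, where the full value lives at -48(%rbp) and the upper half is reloaded from -32(%rbp). It is only a model of the semantics, not of what LRA actually does internally. `v8si`, `v4si`, `subreg_v4si_via_stack` and `insn19` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

typedef struct { uint32_t lane[8]; } v8si;   /* stand-in for a V8SImode value */
typedef struct { uint32_t lane[4]; } v4si;   /* stand-in for a V4SImode value */

/* (subreg:V4SI (reg:V8SI) OFFSET) resolved via memory: spill the whole
   32-byte value, then reload 16 bytes starting at byte OFFSET.  */
static v4si subreg_v4si_via_stack(v8si r, int byte_offset)
{
    unsigned char slot[32];                       /* like the -48(%rbp) slot */
    v4si out;
    memcpy(slot, &r, sizeof slot);                /* store the full ymm value */
    memcpy(&out, slot + byte_offset, sizeof out); /* reload one xmm half */
    return out;
}

/* insn 19: (plus:V4SI (subreg:V4SI r90 16) (subreg:V4SI r90 0)) */
static v4si insn19(v8si r90)
{
    v4si hi = subreg_v4si_via_stack(r90, 16);
    v4si lo = subreg_v4si_via_stack(r90, 0);
    v4si sum;
    for (int i = 0; i < 4; i++)
        sum.lane[i] = hi.lane[i] + lo.lane[i];
    return sum;
}
```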

So the issue is likely that we do not expose the split into separate
instructions and instead assume LRA can reload the above without a stack slot?

         Choosing alt 1 in insn 19:  (0) v  (1) v  (2) vm {*addv4si3}
            alt=0: Bad operand -- refuse
            alt=1: Bad operand -- refuse
          alt=2,overall=0,losers=0,rld_nregs=0

Not sure whether it would help to avoid combining the subregs into the plus in
the first place, or whether some reload hook could help here?

With the above issue the patch I have is likely not going to help ;)
