https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194

--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #11)
> Yeah, for inserts such tactic would be inappropriate due to bad store
> forwarding stalls anyway. As you've shown in earlier comments, inserts have
> a very nice generic way to expand them (that does not touch stack).
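
For reference, here is one stack-free way to do a variable-index insert, sketched with GNU C vector extensions. It broadcasts the index, compares it against the lane numbers to get an all-ones mask in the selected lane, and blends. The helper name insert_lane is hypothetical, and this is not claimed to be the exact sequence GCC expands inserts to.

```c
typedef int v4si __attribute__((vector_size(16)));

/* Hypothetical helper: return v with lane i replaced by x, without going
   through memory.  The vector compare yields -1 (all ones) in the lane
   equal to i and 0 elsewhere.  An out-of-range i matches no lane, so v is
   returned unchanged.  */
static v4si insert_lane (v4si v, int x, int i)
{
  const v4si lane_ids = { 0, 1, 2, 3 };
  v4si idx = { i, i, i, i };        /* broadcast the variable index */
  v4si val = { x, x, x, x };        /* broadcast the value to insert */
  v4si mask = lane_ids == idx;      /* -1 in the selected lane, else 0 */
  return (v & ~mask) | (val & mask);
}
```

For v = {10, 20, 30, 40}, insert_lane (v, 99, 2) yields {10, 20, 99, 40}.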

Unfortunately it doesn't work (the CSE of the spilled vector).  Patch:

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1eaa1da11b9..f7b1a92dd95 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6102,7 +6102,11 @@ discover_nonconstant_array_refs_r (tree * tp, int *walk_subtrees,
             || CONVERT_EXPR_P (t))
        t = TREE_OPERAND (t, 0);

-      if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
+      if ((TREE_CODE (t) == ARRAY_REF
+          && !(TREE_CODE (TREE_OPERAND (t, 0)) == VIEW_CONVERT_EXPR
+               && DECL_P (TREE_OPERAND (TREE_OPERAND (t, 0), 0))
+               && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (t, 0), 0)))))
+          || TREE_CODE (t) == ARRAY_RANGE_REF)
        {
          t = get_base_address (t);
          if (t && DECL_P (t)
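
To make the intent of the new condition explicit, here is a toy model in C. The enum, struct and function names are hypothetical stand-ins, not GCC internals. It applies the test as its indentation reads, with the VECTOR_TYPE_P check inside the negation, to a reference whose index is non-constant, and it ignores the get_base_address/DECL_P follow-up. An ARRAY_REF whose base is a VIEW_CONVERT_EXPR of a vector-typed decl is exempted. Everything else still leads to the base being forced into memory.

```c
#include <stdbool.h>

/* Hypothetical toy stand-ins for a few GCC tree codes.  */
enum toy_code { TOY_VAR_DECL, TOY_VIEW_CONVERT_EXPR, TOY_ARRAY_REF,
                TOY_ARRAY_RANGE_REF };

struct toy_tree
{
  enum toy_code code;
  bool vector_type;            /* models VECTOR_TYPE_P (TREE_TYPE (t)) */
  const struct toy_tree *op0;  /* models TREE_OPERAND (t, 0) */
};

/* Models the patched test for a reference T with a non-constant index.
   Returns true if the base should still be forced into memory.
   Refs and VIEW_CONVERT_EXPRs are assumed to have a non-NULL op0.  */
static bool toy_forces_base_to_memory (const struct toy_tree *t)
{
  if (t->code == TOY_ARRAY_RANGE_REF)
    return true;
  if (t->code != TOY_ARRAY_REF)
    return false;
  const struct toy_tree *base = t->op0;
  /* The new exemption: VIEW_CONVERT_EXPR <vector DECL>[i].  Checking the
     VIEW_CONVERT_EXPR first means base->op0 is only inspected when the
     base actually is an expression with an operand.  */
  bool vector_decl_view = base->code == TOY_VIEW_CONVERT_EXPR
                          && base->op0->code == TOY_VAR_DECL
                          && base->op0->vector_type;
  return !vector_decl_view;
}
```

The short-circuit order matters here. Testing VECTOR_TYPE_P on the operand of a plain decl base would look at an operand that a decl does not have.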


and for

typedef int v4si __attribute__((vector_size(16)));

int foo (v4si v, int i)
{
  v = v + v;
  return v[i] + v[2*i];
}

at -O2 we get

foo:
.LFB0:
        .cfi_startproc
        leal    (%rdi,%rdi), %edx
        paddd   %xmm0, %xmm0
        movslq  %edi, %rdi
        movslq  %edx, %rdx
        movaps  %xmm0, -24(%rsp)
        movaps  %xmm0, -40(%rsp)
        movl    -40(%rsp,%rdi,4), %eax
        addl    -24(%rsp,%rdx,4), %eax
        ret

We likely also don't get rid of the stack allocation.  Maybe that's due to
the way expand does the temporary spill without ending its lifetime, not
sure.  We're definitely not "remembering" the spill slot used for 'v' and
re-using it: 'v' is stored twice, to -40(%rsp) and -24(%rsp), once per
access.  There's no mechanism for re-using it, IIRC.
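
Here is the -O2 listing written back as C. The function names are hypothetical, and the sketch assumes 0 <= i <= 1 so that both lane indices i and 2*i are in range; GCC's documentation says out-of-bounds vector subscripts are undefined. Each variable-index read gets its own stack copy of the vector. Re-using the spill slot would need only one store.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* What the listing does: two stores of the same vector, one per
   variable-index read, each followed by a scalar load of one lane.  */
int foo_two_spills (v4si v, int i)
{
  v = v + v;
  int slot_a[4], slot_b[4];
  memcpy (slot_a, &v, sizeof v);   /* movaps %xmm0, -40(%rsp) */
  memcpy (slot_b, &v, sizeof v);   /* movaps %xmm0, -24(%rsp) */
  return slot_a[i] + slot_b[2 * i];
}

/* Same result with the spill slot remembered and re-used:
   one store, two loads.  */
int foo_one_spill (v4si v, int i)
{
  v = v + v;
  int slot[4];
  memcpy (slot, &v, sizeof v);
  return slot[i] + slot[2 * i];
}
```

For v = {1, 2, 3, 4} and i = 1, both return 4 + 6 = 10.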

At least we don't ICE for the specific case of vectors.  We're running into

        /* If we have either an offset, a BLKmode result, or a reference
           outside the underlying object, we must force it to memory.
           Such a case can occur in Ada if we have unchecked conversion
           of an expression from a scalar type to an aggregate type or
           for an ARRAY_RANGE_REF whose type is BLKmode, or if we were
           passed a partially uninitialized object or a view-conversion
           to a larger size.  */
        must_force_mem = (offset
                          || mode1 == BLKmode
                          || (mode == BLKmode
                              && !int_mode_for_size (bitsize, 1).exists ())
                          || maybe_gt (bitpos + bitsize,
                                       GET_MODE_BITSIZE (mode2)));

where 'offset' is a MULT_EXPR and we've so far expanded 'v' to
op0 = (reg/v:V4SI 88 [ v ])
and then

        /* Otherwise, if this is a constant or the object is not in memory
           and need be, put it there.  */
        else if (CONSTANT_P (op0) || (!MEM_P (op0) && must_force_mem))
          {
            memloc = assign_temp (TREE_TYPE (tem), 1, 1);
            emit_move_insn (memloc, op0);
            op0 = memloc;
            clear_mem_expr = true;
          }
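
Here is a simplified C model of the two excerpts above. struct toy_ref_info and the toy_* functions are hypothetical. poly_int comparisons are reduced to plain integer ones, and the branches preceding the 'else if' are ignored. For v[i], the variable index gives a non-NULL offset, so must_force_mem is true. Because op0 is a pseudo register rather than a MEM, it gets copied into a fresh assign_temp temporary. If this path runs once per variable-index access, two reads of 'v' would get two temporaries, which would be consistent with the two movaps stores above.

```c
#include <stdbool.h>

/* Hypothetical summary of the values expand has at this point.  */
struct toy_ref_info
{
  bool has_offset;          /* variable position part, e.g. MULT_EXPR */
  bool mode1_is_blk;        /* mode1 == BLKmode */
  bool mode_blk_no_int;     /* mode == BLKmode && no int mode of bitsize */
  long bitpos, bitsize;     /* constant position part, in bits */
  long container_bits;      /* GET_MODE_BITSIZE (mode2) */
  bool op0_is_constant;     /* CONSTANT_P (op0) */
  bool op0_is_mem;          /* MEM_P (op0) */
};

static bool toy_must_force_mem (const struct toy_ref_info *r)
{
  return r->has_offset
         || r->mode1_is_blk
         || r->mode_blk_no_int
         || r->bitpos + r->bitsize > r->container_bits;
}

/* True if op0 would be copied into a new stack temporary
   (the assign_temp + emit_move_insn branch).  */
static bool toy_spills_to_temp (const struct toy_ref_info *r)
{
  return r->op0_is_constant || (!r->op0_is_mem && toy_must_force_mem (r));
}
```

With a constant lane and no offset, the register operand is left alone. With a variable index, it is spilled.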
