https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97194
--- Comment #12 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Alexander Monakov from comment #11)
> Yeah, for inserts such tactic would be inappropriate due to bad store
> forwarding stalls anyway. As you've shown in earlier comments, inserts have
> a very nice generic way to expand them (that does not touch stack).

Unfortunately it doesn't work (the CSE).  Patch:

diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
index 1eaa1da11b9..f7b1a92dd95 100644
--- a/gcc/cfgexpand.c
+++ b/gcc/cfgexpand.c
@@ -6102,7 +6102,11 @@ discover_nonconstant_array_refs_r (tree * tp, int *walk_subtrees,
              || CONVERT_EXPR_P (t))
        t = TREE_OPERAND (t, 0);

-      if (TREE_CODE (t) == ARRAY_REF || TREE_CODE (t) == ARRAY_RANGE_REF)
+      if ((TREE_CODE (t) == ARRAY_REF
+          && !(TREE_CODE (TREE_OPERAND (t, 0)) == VIEW_CONVERT_EXPR
+               && DECL_P (TREE_OPERAND (TREE_OPERAND (t, 0), 0)))
+          && VECTOR_TYPE_P (TREE_TYPE (TREE_OPERAND (TREE_OPERAND (t, 0), 0))))
+         || TREE_CODE (t) == ARRAY_RANGE_REF)
        {
          t = get_base_address (t);
          if (t && DECL_P (t)

and for

typedef int v4si __attribute__((vector_size(16)));

int foo (v4si v, int i)
{
  v = v + v;
  return v[i] + v[2*i];
}

at -O2 we get

foo:
.LFB0:
        .cfi_startproc
        leal    (%rdi,%rdi), %edx
        paddd   %xmm0, %xmm0
        movslq  %edi, %rdi
        movslq  %edx, %rdx
        movaps  %xmm0, -24(%rsp)
        movaps  %xmm0, -40(%rsp)
        movl    -40(%rsp,%rdi,4), %eax
        addl    -24(%rsp,%rdx,4), %eax
        ret

We likely also do not get rid of the stack allocation.  Maybe it's due to
the way expand does the temporary spill, not ending its lifetime, not sure.
We're definitely not "remembering" the spill slot used for 'v' and do not
re-use it; there's no mechanism for that IIRC.

At least we don't ICE for the specific case of vectors.  We're running into

      /* If we have either an offset, a BLKmode result, or a reference
         outside the underlying object, we must force it to memory.
         Such a case can occur in Ada if we have unchecked conversion
         of an expression from a scalar type to an aggregate type or
         for an ARRAY_RANGE_REF whose type is BLKmode, or if we were
         passed a partially uninitialized object or a view-conversion
         to a larger size.  */
      must_force_mem = (offset
                        || mode1 == BLKmode
                        || (mode == BLKmode
                            && !int_mode_for_size (bitsize, 1).exists ())
                        || maybe_gt (bitpos + bitsize,
                                     GET_MODE_BITSIZE (mode2)));

where 'offset' is MULT_EXPR and we've so far expanded 'v' to

  op0 = (reg/v:V4SI 88 [ v ])

and then

          /* Otherwise, if this is a constant or the object is not in memory
             and need be, put it there.  */
          else if (CONSTANT_P (op0) || (!MEM_P (op0) && must_force_mem))
            {
              memloc = assign_temp (TREE_TYPE (tem), 1, 1);
              emit_move_insn (memloc, op0);
              op0 = memloc;
              clear_mem_expr = true;
            }
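
For illustration only, here is a source-level sketch of what a single, shared
spill of 'v' would amount to, next to the test case from this comment.  The
name foo_spilled and the explicit tmp array are hypothetical and not part of
the patch; the sketch only models the effect of one assign_temp/emit_move_insn
pair being reused for both extracts, assuming the GNU vector extension is
available.

```c
#include <string.h>

typedef int v4si __attribute__((vector_size(16)));

/* Test case from the comment: two variable-index extracts from the
   same vector value.  Only i == 0 or i == 1 keeps 2*i in bounds.  */
int foo (v4si v, int i)
{
  v = v + v;
  return v[i] + v[2*i];
}

/* Hypothetical model of the hoped-for expansion: 'v' is stored to one
   temporary and both element reads index that same temporary, instead
   of each extract forcing its own copy of 'v' to memory.  */
int foo_spilled (v4si v, int i)
{
  int tmp[4];

  v = v + v;
  memcpy (tmp, &v, sizeof tmp);   /* the single spill */
  return tmp[i] + tmp[2*i];       /* both loads share the slot */
}
```

Both functions compute the same value for in-range i; comparing their -O2
output is one way to see whether the duplicate movaps store is still emitted.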