On Thu, 13 Jan 2022, Jakub Jelinek wrote:
> On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote:
> > > + tree d = build_debug_expr_decl (type);
> > > + gdebug *g
> > > + = gimple_build_debug_bind (d, build2 (rcode, type,
> > > + new_lhs, arg),
> > > + stmt2);
> > > + gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> > > + replace_uses_by (lhs2, d);
> >
> > I wonder if you can leave a lhs2 = d; in the IL instead of using
> > replace_uses_by which will process imm uses and fold stmts while
> > we're going to do that anyway in the caller? That would IMHO
> > be better here.
>
> I'd need to emit them always for reversible ops and when the
> atomic call can't be last, regardless of whether it is needed or not,
> just so that next DCE would remove those up and emit those debug stmts,
> because otherwise that could result in -fcompare-debug failures
> (at least with -fno-tree-dce -fno-tree-whatever ...).
> And
> + tree narg = build_debug_expr_decl (type);
> + gdebug *g
> + = gimple_build_debug_bind (narg,
> + fold_convert (type, arg),
> + stmt2);
> isn't that much more code compared to
> gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg);
> Or would you like it to be emitted always, i.e.
> if (atomic_op != BIT_AND_EXPR
> && atomic_op != BIT_IOR_EXPR
> /* With -fnon-call-exceptions if we can't
> add stmts after the call easily. */
> && !stmt_ends_bb_p (stmt2))
> {
> tree type = TREE_TYPE (lhs2);
> if (TREE_CODE (arg) == INTEGER_CST)
> arg = fold_convert (type, arg);
> else if (!useless_type_conversion_p (type, TREE_TYPE (arg)))
> {
> tree narg = make_ssa_name (type);
> gimple *g = gimple_build_assign (narg, NOP_EXPR, arg);
> gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> arg = narg;
> }
> enum tree_code rcode;
> switch (atomic_op)
> {
> case PLUS_EXPR: rcode = MINUS_EXPR; break;
> case MINUS_EXPR: rcode = PLUS_EXPR; break;
> case BIT_XOR_EXPR: rcode = atomic_op; break;
> default: gcc_unreachable ();
> }
> tree d = build_debug_expr_decl (type);
> gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg);
> gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> lhs2 = NULL_TREE;
> }
> in between
> update_stmt (use_stmt);
> and
> imm_use_iterator iter;
> and then do the
> FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2)
> if (use_stmt != cast_stmt)
> with resetting only if (lhs2)
> and similarly release_ssa_name (lhs2) only if (lhs2)?
> I think the usual case is that we emit debug exprs right away,
> not emit something that we want to DCE.
>
> + if (atomic_op == BIT_AND_EXPR
> + || atomic_op == BIT_IOR_EXPR
> + /* Or with -fnon-call-exceptions if we can't
> + add debug stmts after the call. */
> + || stmt_ends_bb_p (stmt2))
>
>
> But now that you mention it, I think I don't handle right the
> case where lhs2 has no debug uses but there is a cast_stmt that has debug
> uses for its lhs. We'd need to add_debug_temp in that case too and
> add a debug temp.
I'm mostly concerned about the replace_uses_by use. forwprop
will go over newly emitted stmts and thus the hypothetical added
lhs2 = d;
record the copy and schedule the stmt for removal, substituting 'd'
in each use as it goes along the function and folding them. It's
a bit iffy (and maybe has unintended side-effects in odd cases)
to trample around and fold stuff behind that flows back.
I'd always vote to simplify the folding code so it's easier to
maintain and not micro-optimize there since it's not going to be
a hot part of the compiler.
Richard.