On Thu, 13 Jan 2022, Jakub Jelinek wrote: > On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote: > > > + tree d = build_debug_expr_decl (type); > > > + gdebug *g > > > + = gimple_build_debug_bind (d, build2 (rcode, type, > > > + new_lhs, arg), > > > + stmt2); > > > + gsi_insert_after (&gsi, g, GSI_NEW_STMT); > > > + replace_uses_by (lhs2, d); > > > > I wonder if you can leave a lhs2 = d; in the IL instead of using > > replace_uses_by which will process imm uses and fold stmts while > > we're going to do that anyway in the caller? That would IMHO > > be better here. > > I'd need to emit them always for reversible ops and when the > atomic call can't be last, regardless of whether it is needed or not, > just so that next DCE would remove those up and emit those debug stmts, > because otherwise that could result in -fcompare-debug failures > (at least with -fno-tree-dce -fno-tree-whatever ...). > And > + tree narg = build_debug_expr_decl (type); > + gdebug *g > + = gimple_build_debug_bind (narg, > + fold_convert (type, arg), > + stmt2); > isn't that much more code compared to > gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg); > Or would you like it to be emitted always, i.e. > if (atomic_op != BIT_AND_EXPR > && atomic_op != BIT_IOR_EXPR > /* With -fnon-call-exceptions if we can't > add stmts after the call easily. */ > && !stmt_ends_bb_p (stmt2)) > { > tree type = TREE_TYPE (lhs2); > if (TREE_CODE (arg) == INTEGER_CST) > arg = fold_convert (type, arg); > else if (!useless_type_conversion_p (type, TREE_TYPE (arg))) > { > tree narg = make_ssa_name (type); > gimple *g = gimple_build_assign (narg, NOP_EXPR, arg); > gsi_insert_after (&gsi, g, GSI_NEW_STMT); > arg = narg; > } > enum tree_code rcode; > switch (atomic_op) > { > case PLUS_EXPR: rcode = MINUS_EXPR; break; > case MINUS_EXPR: rcode = PLUS_EXPR; break; > case BIT_XOR_EXPR: rcode = atomic_op; break; > default: gcc_unreachable (); > } > tree d = build_debug_expr_decl (type); > gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg); > gsi_insert_after (&gsi, g, GSI_NEW_STMT); > lhs2 = NULL_TREE; > } > in between > update_stmt (use_stmt); > and > imm_use_iterator iter; > and then do the > FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2) > if (use_stmt != cast_stmt) > with resetting only if (lhs2) > and similarly release_ssa_name (lhs2) only if (lhs2)? > I think the usual case is that we emit debug exprs right away, > not emit something that we want to DCE. > > + if (atomic_op == BIT_AND_EXPR > + || atomic_op == BIT_IOR_EXPR > + /* Or with -fnon-call-exceptions if we can't > + add debug stmts after the call. */ > + || stmt_ends_bb_p (stmt2)) > > > But now that you mention it, I think I don't handle right the > case where lhs2 has no debug uses but there is a cast_stmt that has debug > uses for its lhs. We'd need to add_debug_temp in that case too and > add a debug temp.
I'm mostly concerned about the replace_uses_by use. forwprop will go over newly emitted stmts and thus the hypothetical added lhs2 = d; record the copy and schedule the stmt for removal, substituting 'd' in each use as it goes along the function and folding them. It's a bit iffy (and maybe has unintended side-effects in odd cases) to trample around and fold stuff behind that flows back. I'd always vote to simplify the folding code so it's easier to maintain and not micro-optimize there since it's not going to be a hot part of the compiler. Richard.