On Thu, 13 Jan 2022, Jakub Jelinek wrote:

> On Thu, Jan 13, 2022 at 02:49:47PM +0100, Richard Biener wrote:
> > > +           tree d = build_debug_expr_decl (type);
> > > +           gdebug *g
> > > +             = gimple_build_debug_bind (d, build2 (rcode, type,
> > > +                                                   new_lhs, arg),
> > > +                                        stmt2);
> > > +           gsi_insert_after (&gsi, g, GSI_NEW_STMT);
> > > +           replace_uses_by (lhs2, d);
> > 
> > I wonder if you can leave a lhs2 = d; in the IL instead of using
> > replace_uses_by which will process imm uses and fold stmts while
> > we're going to do that anyway in the caller?  That would IMHO
> > be better here.
> 
> I'd need to emit them always for reversible ops and when the
> atomic call can't be last, regardless of whether it is needed or not,
> just so that next DCE would remove those up and emit those debug stmts,
> because otherwise that could result in -fcompare-debug failures
> (at least with -fno-tree-dce -fno-tree-whatever ...).
> And
> +                     tree narg = build_debug_expr_decl (type);
> +                     gdebug *g
> +                       = gimple_build_debug_bind (narg,
> +                                                  fold_convert (type, arg),
> +                                                  stmt2);
> isn't that much more code compared to
>                     gimple *g = gimple_build_assign (lhs2, NOP_EXPR, arg);
> Or would you like it to be emitted always, i.e.
>             if (atomic_op != BIT_AND_EXPR
>                && atomic_op != BIT_IOR_EXPR
>                /* With -fnon-call-exceptions if we can't
>                   add stmts after the call easily.  */
>                && !stmt_ends_bb_p (stmt2))
>               {
>                 tree type = TREE_TYPE (lhs2);
>                 if (TREE_CODE (arg) == INTEGER_CST)
>                   arg = fold_convert (type, arg);
>                 else if (!useless_type_conversion_p (type, TREE_TYPE (arg)))
>                   {
>                     tree narg = make_ssa_name (type);
>                     gimple *g = gimple_build_assign (narg, NOP_EXPR, arg);
>                     gsi_insert_after (&gsi, g, GSI_NEW_STMT);
>                     arg = narg;
>                   }
>                 enum tree_code rcode;
>                 switch (atomic_op)
>                   {
>                   case PLUS_EXPR: rcode = MINUS_EXPR; break;
>                   case MINUS_EXPR: rcode = PLUS_EXPR; break;
>                   case BIT_XOR_EXPR: rcode = atomic_op; break;
>                   default: gcc_unreachable ();
>                   }
>                 tree d = build_debug_expr_decl (type);
>                 gimple *g = gimple_build_assign (lhs2, rcode, new_lhs, arg);
>                 gsi_insert_after (&gsi, g, GSI_NEW_STMT);
>                 lhs2 = NULL_TREE;
>               }
> in between
>             update_stmt (use_stmt);
> and
>             imm_use_iterator iter;
> and then do the
>              FOR_EACH_IMM_USE_STMT (use_stmt, iter, lhs2)
>                if (use_stmt != cast_stmt)
> with resetting only if (lhs2)
> and similarly release_ssa_name (lhs2) only if (lhs2)?
> I think the usual case is that we emit debug exprs right away,
> not emit something that we want to DCE.
> 
> +                   if (atomic_op == BIT_AND_EXPR
> +                       || atomic_op == BIT_IOR_EXPR
> +                       /* Or with -fnon-call-exceptions if we can't
> +                          add debug stmts after the call.  */
> +                       || stmt_ends_bb_p (stmt2))
> 
> 
> But now that you mention it, I think I don't handle right the
> case where lhs2 has no debug uses but there is a cast_stmt that has debug
> uses for its lhs.  We'd need to add_debug_temp in that case too and
> add a debug temp.

I'm mostly concerned about the replace_uses_by use.  forwprop
will go over newly emitted stmts and thus the hypothetical added

    lhs2 = d;

record the copy and schedule the stmt for removal, substituting 'd'
in each use as it goes along the function and folding them.  It's
a bit iffy (and maybe has unintended side-effects in odd cases)
to trample around and fold stuff behind that flows back.

I'd always vote to simplify the folding code so it's easier to
maintain and not micro-optimize there since it's not going to be
a hot part of the compiler.

Richard.

Reply via email to