Re: extend fwprop optimization

Steven Bosscher Mon, 11 Mar 2013 12:52:56 -0700

On Mon, Mar 11, 2013 at 6:52 AM, Wei Mi wrote:
> This is the fwprop extension patch which is put in order. Regression
> test and bootstrap pass. Please help to review its rationality. The
> following is a brief description what I have done in the patch.
>
> In order to make fwprop more effective in rtl optimization, we extend
> it to handle general expressions instead of the three cases listed in
> the head comment in fwprop.c. The major changes include a) We need to
> check propagation correctness for src exprs of def which contain mem
> references. Previous fwprop for the three cases above doesn't have the
> problem. b) We need a better cost model because the benefit is usually
> not so apparent as the three cases above.
>
> For a general fwprop problem, there are two possible sources where
> benefit comes from. The frist is the new use insn after propagation
> and simplification may have lower cost than itself before propagation,
> or propagation may create a new insn, that could be splitted or
> peephole optimized later and get a lower cost. The second is that if
> all the uses are replaced with the src of the def insn, the def insn
> could be deleted.
>
> So instead of check each def-use pair independently, we use DU chain
> to track all the uses for a def. For each def-use pair, we attempt the
> propagation, record the change candidate in changes[] array, but we
> wait to confirm the changes until all the pairs with the same def are
> iterated. The changes confirmation is done in the func
> confirm_change_group_by_cost. We only do this for fwprop. For
> fwprop_addr, the benefit of each change is ensured by
> propagation_rtx_1 using should_replace_address, so we just confirm all
> the changes without checking benefit again.


Hello Wei Mi,

So IIUC, in essence you are doing:

main:
  FOR_EACH_BB:
    FOR_BB_INSNS, non-debug insns only:
      for each df_ref DEF operand on insn:
        iterate_def_uses

iterate_def_uses:
  for each UD chain from DEF to USE(i):
    forward_propagate_into
  confirm changes by total benefit

I still like the idea, but there are also still a few "design issues"
to resolve.

Some of the same comments as before apply: Do you really, really,
reallyreally have to go so low-level as to insn splitting, peephole
optimizations, and even register allocation, to get the cost right?
That will almost certainly not be acceptable, and I for one would
oppose such a change. It's IMHO a violation of proper engineering when
your medium-to-high level code transformations have to do that. If you
have strong reasons for your approach, it'd be helpful if you can
explain them so that we can together look for a less intrusive
solution (e.g. splitting earlier, adjusting the cost model, etc.).

So things like:
> +  /* we may call peephole2_insns in fwprop phase to estimate how
> +     peephole will affect the cost of the insn transformed by fwprop.
> +     fwprop is done before ira phase. In that case, we simply return
> +     a new pseudo register.  */
> +  if (!strncmp (current_pass->name, "fwprop", 6))
> +    return gen_reg_rtx (mode);

and

> Index: config/i386/i386.c
> ===================================================================
> --- config/i386/i386.c        (revision 196270)
> +++ config/i386/i386.c        (working copy)
> @@ -15901,8 +15901,14 @@ ix86_expand_clear (rtx dest)
>  {
>    rtx tmp;
>
> -  /* We play register width games, which are only valid after reload.  */
> -  gcc_assert (reload_completed);
> +  /* We play register width games, which are only valid after reload.
> +     An exception: fwprop call peephole to estimate the change benefit,
> +     and peephole will call this func. That is before reload complete.
> +     It will not bring any problem because the peephole2_insns call is
> +     only used for cost estimation in fwprop, and its change will be
> +     abandoned immediately after the cost estimation.  */
> +  if (strncmp (current_pass->name, "fwprop", 6))
> +    gcc_assert (reload_completed);

are IMHO not OK.

Note that your patch is a bit difficult to read at some points because
you have included a bunch of non-changes (whitespaces fixes --
necessary cleanups but not relevant for your patch), see e.g. the
changed lines that contain "lra_in_progress". Also the changes like:
>  static bool
> -propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, int flags)
> +propagate_rtx_1 (rtx *px, rtx old_rtx, rtx new_rtx, bool speed)

which are quite distracting, making it harder to see what has *really* changed.

You should probably just a helper function apply_change_group_num()
and avoid all the apply_change_group use fixups.


In fwprop.c:
> +  /* DF_LR_RUN_DCE is used in peephole2_insns, which is called for cost
> +     estimation in estimate_split_and_peephole.  */
> +  df_set_flags (DF_LR_RUN_DCE);
>    df_md_add_problem ();
>    df_note_add_problem ();
> -  df_analyze ();
> +  df_chain_add_problem (DF_UD_CHAIN | DF_DU_CHAIN);
>    df_maybe_reorganize_use_refs (DF_REF_ORDER_BY_INSN_WITH_NOTES);
> +  df_analyze ();

you add DU and UD chains, and implicitly the RD problem, but you also
already have the MD problem. I think my reaching-defs patches for GCC
4.8 make the MD problem less necessary, but you certainly don't need
MD + RD + UD + DU.

You've noticed so yourself:
> +   We think the maintainance for use_def_ref vector is not necessary, so
> +   we remove update_df/update_uses/update_df_init/register_active_defs.  */

and it looks like you're simply avoiding the problem by queuing up
changes and commit them all at the end. I don't believe that will
work, you'll break the UD and DU chains and may end up with dangling
pointers to free'd or removed df_refs.


> +  /* see when the insn is not a set  */
> +  if (!set)
> +    return false;

fwprop.c was speciflcally developed to also handle multiple-set
instructions, like the bits of cse.c that it tried to replace. Your
patch should not change this.


> +static bool
> +mem_may_be_modified (rtx from, rtx to)

This has "potentially slow" written all over it :-) (You're punting on
any MEM for now, but someone at some point will find a reason to use
alias analysis, blowing up really bad test cases like PR39326...)


> +int
> +reg_mentioned_num (const_rtx reg, const_rtx in

Should use DF caches instead of deep-diving the pattern. Or if DF
cache updates are deferred, use for_each_rtx on the pattern.


> +/* Find whether the set define a return reg.  */
> +
> +static bool
> +def_return_reg (rtx set)
> +{
> +  edge eg;
> +  edge_iterator ei;
> +  rtx dest = SET_DEST (set);
> +
> +  if (!REG_P (dest))
> +    return false;

The return USE will also be a hard reg:
 +  if (!REG_P (dest) || ! HARD_REGISTER_P (dest))
 +    return false;


> +  FOR_EACH_EDGE (eg, ei, EXIT_BLOCK_PTR->preds)
> +    if (eg->flags & EDGE_FALLTHRU)
> +      {
> +     basic_block src_bb = eg->src;
> +     rtx last_insn, ret_reg;
> +     if (EDGE_COUNT (EXIT_BLOCK_PTR->preds) == 1

single_pred_p(), but why do FOR_EACH_EDGE and then chec that there is
only one pred to begin with?

> +         && NONJUMP_INSN_P ((last_insn = BB_END (src_bb)))
> +         && GET_CODE (PATTERN (last_insn)) == USE
> +         && GET_CODE ((ret_reg = XEXP (PATTERN (last_insn), 0))) == REG
> +         && REGNO (ret_reg) == REGNO (dest))
> +       return true;
> +      }
> +  return false;
> +}

Actually this whole change makes me nervous. I don't think you should
propagate into any USE at all, for return value or otherwise.


> +  if (def_insn)
> +  {
> +    rtx set = single_set (def_insn);
> +    if (set)
> +      def_insn_cost = set_src_cost (SET_SRC (set), speed)
> +                   + set_src_cost (SET_DEST (set), speed) + 1;
> +    else
> +      return false;
> +  }

As before: You'll have to deal with non-single_set insns also.


> +void
> +dump_cfg (FILE *file)
> +{

You'll find -fdump-rtl-fwprop-graph useful, as well as brief_dump_cfg.


> +  if (fwprop_addr)
> +     return confirm_change_group_by_cost (false,
> +                                       0,
> +                                       false);
> +  else
> +    {
> +      all_uses_replaced = (use_num == reg_replaced_num);
> +      return confirm_change_group_by_cost (all_uses_replaced,
> +                                        def_insn_cost,
> +                                        true);
> +    }


What happens if you propagate into an insn that uses the same register
twice? Will the DU chains still be valid (I don't think that's
guaranteed)?

Is the extra_benefit flag always applicable if all USEs of a DEF have
been propagated out? What if the DEF is in an insn that is inherently
necessary?

Have you measured what effect this pass has on combine?

Ciao!
Steven

Re: extend fwprop optimization

Reply via email to