Unfortunately, this was before my time, so I don't know.

That being said, thanks for tackling these issues that my work
triggered last release.  Much appreciated.

Aldy

On Tue, Aug 2, 2022 at 10:41 AM Richard Biener <rguent...@suse.de> wrote:
>
> I am trying to make sense of back_threader_profitability::profitable_path_p
> and the first thing I notice is that we do
>
>   /* Threading is profitable if the path duplicated is hot but also
>      in a case we separate cold path from hot path and permit optimization
>      of the hot path later.  Be on the agressive side here. In some testcases,
>      as in PR 78407 this leads to noticeable improvements.  */
>   if (m_speed_p
>       && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
>           || contains_hot_bb))
>     {
>       if (n_insns >= param_max_fsm_thread_path_insns)
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
>             fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>                      "the number of instructions on the path "
>                      "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
>           return false;
>         }
> ...
>     }
>   else if (!m_speed_p && n_insns > 1)
>     {
>       if (dump_file && (dump_flags & TDF_DETAILS))
>         fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
>                  "duplication of %i insns is needed and optimizing for 
> size.\n",
>                  n_insns);
>       return false;
>     }
> ...
>   return true;
>
> thus we apply the n_insns >= param_max_fsm_thread_path_insns only
> to "hot paths".  The comment above this isn't entirely clear whether
> this is by design ("Be on the aggressive side here ...") but I think
> this is a mistake.  In fact the "hot path" check seems entirely
> useless since if the path is not hot we simply continue threading it.
>
> I have my reservations about how we compute hot (contains_hot_bb
> in particular), but the following first refactors the above to apply
> the size constraints always and then _not_ threading if the path
> is not considered hot (but allow threading if n_insns <= 1 as with
> the !m_speed_p case).
>
> As for contains_hot_bb - it might be that this consciously captures
> the case where we separate a cold from a hot path even though the
> threaded path itself is cold.  Consider
>
>    A
>   / \ (unlikely)
>  B   C
>   \ /
>    D
>   / \
>  ..  abort()
>
> when we want to thread A -> B -> D -> abort () and A (or D)
> has a hot BB count then we have contains_hot_bb even though
> the counts on the path itself are small.  In fact when we
> thread the only relevant count for the resulting threaded
> path is the count of A with the A->C probability applied
> (that should also the count to subtract from the blocks
> we copied - sth missing for the backwards threader as well).
>
> So I'm wondering how the logic computing contains_hot_bb
> relates to the above comment before the costing block.
> Anyone remembers?
>
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>
>         * tree-ssa-threadbackwards.cc
>         (back_threader_profitability::profitable_path_p): Apply
>         size constraints to all paths.  Do not thread cold paths.
> ---
>  gcc/tree-ssa-threadbackward.cc | 53 +++++++++++++++++++++-------------
>  1 file changed, 33 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/tree-ssa-threadbackward.cc b/gcc/tree-ssa-threadbackward.cc
> index 0519f2a8c4b..a887568032b 100644
> --- a/gcc/tree-ssa-threadbackward.cc
> +++ b/gcc/tree-ssa-threadbackward.cc
> @@ -761,22 +761,43 @@ back_threader_profitability::profitable_path_p (const 
> vec<basic_block> &m_path,
>         *creates_irreducible_loop = true;
>      }
>
> -  /* Threading is profitable if the path duplicated is hot but also
> +  const int max_cold_insns = 1;
> +  if (!m_speed_p && n_insns > max_cold_insns)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +                "duplication of %i insns is needed and optimizing for 
> size.\n",
> +                n_insns);
> +      return false;
> +    }
> +  else if (n_insns >= param_max_fsm_thread_path_insns)
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +                "the number of instructions on the path "
> +                "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> +      return false;
> +    }
> +
> +  /* Threading is profitable if the path duplicated is small or hot but also
>       in a case we separate cold path from hot path and permit optimization
>       of the hot path later.  Be on the agressive side here. In some 
> testcases,
>       as in PR 78407 this leads to noticeable improvements.  */
> -  if (m_speed_p
> -      && ((taken_edge && optimize_edge_for_speed_p (taken_edge))
> -         || contains_hot_bb))
> +  if (!(n_insns <= max_cold_insns
> +       || contains_hot_bb
> +       || (taken_edge && optimize_edge_for_speed_p (taken_edge))))
> +    {
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> +                "path is not profitable to thread.\n");
> +      return false;
> +    }
> +
> +  /* If the path is not small to duplicate and either the entry or
> +     the final destination is probably never executed avoid separating
> +     the cold path since that can lead to spurious diagnostics.  */
> +  if (n_insns > max_cold_insns)
>      {
> -      if (n_insns >= param_max_fsm_thread_path_insns)
> -       {
> -         if (dump_file && (dump_flags & TDF_DETAILS))
> -           fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> -                    "the number of instructions on the path "
> -                    "exceeds PARAM_MAX_FSM_THREAD_PATH_INSNS.\n");
> -         return false;
> -       }
>        if (taken_edge && probably_never_executed_edge_p (cfun, taken_edge))
>         {
>           if (dump_file && (dump_flags & TDF_DETAILS))
> @@ -794,14 +815,6 @@ back_threader_profitability::profitable_path_p (const 
> vec<basic_block> &m_path,
>           return false;
>         }
>      }
> -  else if (!m_speed_p && n_insns > 1)
> -    {
> -      if (dump_file && (dump_flags & TDF_DETAILS))
> -       fprintf (dump_file, "  FAIL: Jump-thread path not considered: "
> -                "duplication of %i insns is needed and optimizing for 
> size.\n",
> -                n_insns);
> -      return false;
> -    }
>
>    /* We avoid creating irreducible inner loops unless we thread through
>       a multiway branch, in which case we have deemed it worth losing
> --
> 2.35.3
>

Reply via email to