On Tue, Apr 24, 2012 at 11:26 PM, Teresa Johnson <tejohn...@google.com> wrote:

>        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
>        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

You should add documentation for these new PARAMs to doc/invoke.texi.

I don't really like these new PARAMs: All other loop PARAMs are based
on the number of insns in a loop, or the maximum number of times a
transformation is applied. Your new
PARAM_MIN_ITER_UNROLL_WITH_BRANCHES is completely different, because
it is a number of iterations. This makes the PARAM value feel even
more arbitrary than all the other PARAMs to some extend already do...

(The only other PARAM like that is PARAM_ALIGN_LOOP_ITERATIONS, and
its default value also looks quite arbitrary...)


> Index: loop-unroll.c
> ===================================================================
> --- loop-unroll.c       (revision 186783)
> +++ loop-unroll.c       (working copy)
> @@ -152,6 +152,180 @@ static void combine_var_copies_in_loop_exit (struc
>                                             basic_block);
>  static rtx get_expansion (struct var_to_expand *);
>
> +/* Determine whether LOOP contains call.  */
> +static bool
> +loop_has_call(struct loop *loop)
> +{
> +  basic_block *body, bb;
> +  unsigned i;
> +  rtx insn;
> +
> +  body = get_loop_body (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      bb = body[i];
> +
> +      FOR_BB_INSNS (bb, insn)
> +        {
> +          if (CALL_P (insn))
> +            {
> +              free (body);
> +              return true;
> +            }
> +        }
> +    }
> +  free (body);
> +  return false;
> +}
> +
> +/* Determine whether LOOP contains floating-point computation.  */
> +static bool
> +loop_has_FP_comp(struct loop *loop)
> +{
> +  rtx set, dest;
> +  basic_block *body, bb;
> +  unsigned i;
> +  rtx insn;
> +
> +  body = get_loop_body (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    {
> +      bb = body[i];
> +
> +      FOR_BB_INSNS (bb, insn)
> +        {
> +          set = single_set (insn);
> +          if (!set)
> +            continue;
> +
> +          dest = SET_DEST (set);
> +          if (FLOAT_MODE_P (GET_MODE (dest)))
> +            {
> +              free (body);
> +              return true;

So you only detect single-set FP operations where some insns stores in
a float mode. It wouldn't be very difficult to just walk over all sets
and look for float modes. This is also necessary e.g. for x87 sincos,
as well as various insns on other machines. Your comments say you
don't want to apply the new heuristic to loops containing FP
operations because these loops usually benefit more from unrolling.
Therefore, you should IMHO look at non-single_set() insns also here,
to avoid applying the heuristics to loops containing non-single_set()
FP insns.


> +            }
> +        }
> +    }
> +  free (body);
> +  return false;
> +}

Nit: You are calling loop_has_call and loop_has_FP_comp() twice on
each loop (first for constant iterations and next for runtime
iterations), maybe you can fuse the functions and cache the results
(e.g. with two bitmaps, or put it in the loop description and retrieve
it with get_simple_loop_desc). Actually num_loop_branches()
could/should also be cached. I realize that the loop body walks are
probably not very expensive (and compile time probably isn't a concern
if you're using profile driven optimizations) but they do all add
up...


> +/* Compute the number of branches in LOOP, weighted by execution counts.  */
> +static float
> +compute_weighted_branches(struct loop *loop)

The floating point thing was already mentioned by Andrew. You can use
integer math instead (for examples, look for BB_FREQ_MAX e.g. in
average_num_loop_insns()).


> +  while (loop_outer(outer))
> +    {
> +      outer_desc = get_simple_loop_desc (outer);
> +
> +      if (outer_desc->const_iter)
> +        outer_niters *= outer_desc->niter;
> +      else if (outer->header->count)
> +        outer_niters *= expected_loop_iterations (outer);
> +
> +      weighted_outer_branches = compute_weighted_branches (outer);

Can you delay this computation of "weighted_outer_branches" call to ...

> +      /* Should have been checked by caller.  */
> +      gcc_assert(PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES) != -1);

Should never even happen. You have set the minimum acceptable value to
0. If you managed to test this code with
PARAM_MIN_ITER_UNROLL_WITH_BRANCHES==-1, I'd like to know how (if you
can do it from the command line, there is a bug in the handling of
acceptable PARAM values :-)


> +      /* If the outer loop has enough iterations to be considered hot, then
> +         we can stop our upwards loop tree traversal and examine the current
> +         outer loop.  */
> +      if (outer_niters >= PARAM_VALUE (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES))
> +        {
> +          /* Assume that any call will cause the branch budget to be 
> exceeded,
> +             and that we can't unroll the current loop without increasing
> +             mispredicts.  */
> +          if (loop_has_call(outer))
> +            return 0;
> +
> +          /* Otherwise, compute the maximum number of times current loop can 
> be
> +             unrolled without exceeding our branch budget. First we subtract
> +             off the outer loop's weighted branch count from the budget. Note
> +             that this includes the branches in the current loop. This yields
> +             the number of branches left in the budget for the unrolled 
> copies.
> +             We divide this by the number of branches in the current loop 
> that
> +             must be duplicated when we unroll, which is the total weighted
> +             number of branches minus the back-edge branch. This yields the
> +             number of new loop body copies that can be created by unrolling
> +             without exceeding the budget, to which we add 1 to get the 
> unroll
> +             factor.  */

... somewhere here, where weighted_outer_branches is used?

> +          return (PARAM_VALUE (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET) -
> +                  weighted_outer_branches)/(weighted_num_branches - 1) + 1;

You should guard against weighted branches==1.0.

Ciao!
Steven

Reply via email to