Hi!

On Mon, Jul 06, 2020 at 03:13:13PM +0800, guojiufu wrote:
> For very small loops (< 6 insns), it would be fine to unroll 4
> times to use cache line better.  Like below loops:
>  `while (i) a[--i] = NULL;   while (p < e)  *d++ = *p++;`

Yes, definitely.

> And for very complex loops which may cause negative impacts:
> branch-miss or cache-miss. Like below loop: there are calls,
> early exits and branches in loop.
> ```
>   for (int i = 0; i < n; i++) {
>               int e = a[I];
>              ....
>               if (function_call(e))  break;
>              ....
>   }
> ```
> 
> This patch enhances RTL unroll for small loops and prevent to
> unroll complex loops.

I am not happy about what is considered "a complex loop" here.

> +/* Count the number of call insns in LOOP.  */
> +static unsigned int
> +num_loop_calls (struct loop *loop)
> +{
> +  basic_block *bbs;
> +  rtx_insn *insn;
> +  unsigned int i;
> +  unsigned int call_ins_num = 0;
> +
> +  bbs = get_loop_body (loop);
> +  for (i = 0; i < loop->num_nodes; i++)
> +    FOR_BB_INSNS (bbs[i], insn)
> +      if (CALL_P (insn))
> +     call_ins_num++;
> +
> +  free (bbs);
> +
> +  return call_ins_num;
> +}

This function belongs in cfgloop.c really?  (Or cfgloopanal.c).  Next to
num_loop_branches ;-)

> +/* Return true if LOOP is too complex to be unrolled.  */
> +static bool
> +rs6000_complex_loop_p (struct loop *loop)
> +{
> +  unsigned call_num;
> +
> +  return loop->ninsns > 10
> +    && (call_num = num_loop_calls (loop)) > 0
> +    && (call_num + num_loop_branches (loop)) * 5 > loop->ninsns
> +    && !single_exit (loop);
> +}

Don't do initialisation in conditionals please (or in loop conditions),
like Will said already.

This calls only very specific loops "complex".  We need a better way
to decide this :-(

>  static unsigned
>  rs6000_loop_unroll_adjust (unsigned nunroll, struct loop *loop)
>  {
> -   if (unroll_only_small_loops)
> +  if (unroll_only_small_loops)
>      {
> -      /* TODO: This is hardcoded to 10 right now.  It can be refined, for
> -      example we may want to unroll very small loops more times (4 perhaps).
> -      We also should use a PARAM for this.  */
> +      if (loop->ninsns <= 6)
> +     return MIN (4, nunroll);
>        if (loop->ninsns <= 10)
>       return MIN (2, nunroll);
> -      else
> -     return 0;
> +
> +      return 0;
>      }

This part is fine.  It is independent of the rest AFAICS, so if you
agree, this part is preapproved for trunk.  Thanks!

(A PARAM would be nice, but too many of those isn't actually useful
either...  Next time, add one as soon as writing the code, at least it
is useful at that point in time, when you still need to experiment with
it :-) )

> +  if (rs6000_complex_loop_p (loop))
> +    return 0;
> +
>    return nunroll;
>  }

So, we use rs6000_complex_loop_p only to prevent all unrolling, never to
reduce the unrolling, and only in very specific cases.

Is there no middle road possible?  Say, don't unroll to more than 25
insns total (which is what the "only small loops" does, sort of -- it
also avoids unrolling 3x a bit, yes), and don't unroll to more than 2
calls, and not to more than 4 branches (I'm making up those numbers, of
course, and PARAMS would be helpful).  Some of this already does exist,
and might need retuning for us?


Segher

Reply via email to